I've been using Parallel.ForEach to do some time-consuming processing on collections of items. The processing is actually handled by an external command line tool and I cannot change that. However, it seems that the Parallel.ForEach will get "stuck" on a long running item from the collection. I've distilled the problem down and can show that Parallel.ForEach is, in fact, waiting for this long one to finish and not allowing any others through. I've written a console app to demonstrate the problem:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace testParallel
{
class Program
{
static int inloop = 0;
static int completed = 0;
static void Main(string[] args)
{
// initialize an array of integers to hold the wait duration (in milliseconds)
var items = Enumerable.Repeat(10, 1000).ToArray();
// set one of the items to 10 seconds
items[50] = 10000;
// Initialize our line for reporting status
Console.Write(0.ToString("000") + " Threads, " + 0.ToString("000") + " completed");
// Start the loop in a task (to avoid SO answers having to do with the Parallel.ForEach call, itself, not being parallel)
var t = Task.Factory.StartNew(() => Process(items));
// Wait for the operations to complete
t.Wait();
// Report finished
Console.WriteLine("\nDone!");
}
static void Process(int[] items)
{
// SpinWait (not sleep or yield or anything) for the specified duration
Parallel.ForEach(items, (msToWait) =>
{
// increment the counter for how many threads are in the loop right now
System.Threading.Interlocked.Increment(ref inloop);
// determine at what time we should stop spinning
var e = DateTime.Now + new TimeSpan(0, 0, 0, 0, msToWait);
// spin until the target time
while (DateTime.Now < e) /* no body -- just a hard loop */;
// count another completed
System.Threading.Interlocked.Increment(ref completed);
// we're done with this iteration
System.Threading.Interlocked.Decrement(ref inloop);
// report status
Console.Write("\r" + inloop.ToString("000") + " Threads, " + completed.ToString("000") + " completed");
});
}
}
}
Basically, I make an array of int to store the number of milliseconds a given operation takes. I set them all to 10 except for one, which I set to 10000 (so, 10 seconds). I kick off the Parallel.ForEach in a task and process each integer in a hard spin wait (so it shouldn't be yielding or sleeping or anything).
On each iteration, I report how many iterations are in the body of the loop right now, and how many iterations we have completed. Mostly, it goes along fine. However, toward the end (time-wise), it reports "001 Threads, 987 Completed".
My question is why doesn't it use 7 of the other cores to work on the remaining 13 "jobs"? This one long-running iteration should not keep it from processing other elements in the collection, right?
This example happens to be a fixed collection, but it could easily be set to be an enumerable. We wouldn't want to stop fetching the next item in the enumerable just because one was taking a long time.
I found the answer (or at least, an answer). It has to do with the chunk partitioning. The SO answer here got it for me. So basically, at the top of my "Process" function, if I change from this:
static void Process(int[] items)
{
Parallel.ForEach(items, (msToWait) => { ... });
}
to this
static void Process(int[] items)
{
var partitioner = Partitioner.Create(items, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, (msToWait) => { ... });
}
it grabs the work one at a time. For the more typical case of a parallel for each, where the body doesn't take more than a second, I can certainly see chunking the sets of work. In my use case, however, each body part can take anywhere from half a second to 5 hours. I certainly would not want a bunch of the 10-second variety elements to be blocked by one 5 hour element. So, in this case, the overhead of "one-at-a-time" is well worth it.
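For reference, here is a minimal sketch of that one-at-a-time setup wrapped in a reusable helper. The class and method names (OneAtATime.ForEach) are made up for illustration; Partitioner.Create and EnumerablePartitionerOptions.NoBuffering are the real API from the snippet above and live in System.Collections.Concurrent (.NET 4.5 and later).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the framework.
public static class OneAtATime
{
    // Hands items to the worker threads one at a time instead of in chunks,
    // so a slow item cannot hold back items that were buffered behind it.
    public static void ForEach<T>(IEnumerable<T> items, Action<T> body)
    {
        var partitioner = Partitioner.Create(items, EnumerablePartitionerOptions.NoBuffering);
        Parallel.ForEach(partitioner, body);
    }
}
```

With the test program from the question, this would be called as OneAtATime.ForEach(items, msToWait => { ... }) in place of the plain Parallel.ForEach call.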
When I run below code, Output is this:
When I run till 300, output is this:
When I run till 100, output is this:
Does this mean that both methods started almost at the same time?
If this is true, why do we need Parallel library if we can achieve parallelism by async-await?
using System;
using System.Threading.Tasks;
class Program
{
public static void PrintX()
{
for (int i = 0; i < 500; i++) { Console.Write("x"); }
}
public static void PrintY()
{
for (int i = 0; i < 500; i++) { Console.Write("y"); }
}
public async Task RunAsync()
{
var t1 = Task.Run(() => PrintY());
var t2 = Task.Run(() => PrintX());
await t1;
await t2;
}
static void Main(string[] args)
{
Task t = new Program().RunAsync();
t.Wait();
}
}
Ultimately you're at the mercy of the thread pool here. You have enqueued two items (Task.Run), and they will be picked up and serviced at some future time. When they start is non-deterministic, and will depend on how many available threads there are, and other factors.
They will start approximately at the same time, with no guarantees of anything (perhaps not even the order in which they start). The await will be triggered against their completion - so when you call await (or even whether you call await) won't impact them in any way. They might run in parallel, but most likely they individually run fast enough that whichever one gets started first will have completed before it tries starting the second. They might even end up running consecutively on the same thread (outputting the managed thread id would be a way to see this).
As for why we need Parallel: firstly, it pre-dates async/await by a long time; secondly it does a lot of things to allow larger scale parallelization - things like running a large sequence with concurrent processing including fixed maximum parallelization.
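To illustrate the "fixed maximum parallelization" point, here is a small sketch that caps a Parallel.ForEach with ParallelOptions.MaxDegreeOfParallelism and records the highest concurrency it actually saw. The class name BoundedParallelism and the 5 ms sleep standing in for real work are assumptions for the example.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the framework.
public static class BoundedParallelism
{
    // Processes itemCount dummy items with at most maxParallel bodies running
    // at once, and returns the highest concurrency that was observed.
    public static int Run(int itemCount, int maxParallel)
    {
        int current = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(Enumerable.Range(0, itemCount), options, _ =>
        {
            int now = Interlocked.Increment(ref current);

            // Raise the recorded peak with a compare-and-swap loop.
            int seen;
            do
            {
                seen = Volatile.Read(ref peak);
            } while (now > seen && Interlocked.CompareExchange(ref peak, now, seen) != seen);

            Thread.Sleep(5); // stand-in for real work
            Interlocked.Decrement(ref current);
        });

        return peak;
    }
}
```

Getting the same cap with bare Task.Run calls would need something extra, such as a SemaphoreSlim, which is part of what Parallel does for you.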
Just to show that it can be concurrent, here's the output from a real run where I added the Environment.CurrentManagedThreadId into the output:
main: 1
y: 3
x: 4
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
definitely concurrent, but: other runs can show very different outputs
Using a variable delay in Task.Delay randomly takes seconds instead of milliseconds when combined with an IO-like operation.
Code to reproduce:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApplication {
class Program {
static void Main(string[] args) {
Task[] wait = {
new delayTest().looper(5250, 20),
new delayTest().looper(3500, 30),
new delayTest().looper(2625, 40),
new delayTest().looper(2100, 50)
};
Task.WaitAll(wait);
Console.WriteLine("All Done");
Console.ReadLine();
}
}
class delayTest {
private Stopwatch sw = new Stopwatch();
public delayTest() {
sw.Start();
}
public async Task looper(int count, int delay) {
var start = sw.Elapsed;
Console.WriteLine("Start ({0}, {1})", count, delay);
for (int i = 0; i < count; i++) {
var before = sw.Elapsed;
var totalDelay = TimeSpan.FromMilliseconds(i * delay) + start;
double wait = (totalDelay - sw.Elapsed).TotalMilliseconds;
if (wait > 0) {
await Task.Delay((int)wait);
SpinWait.SpinUntil(() => false, 1);
}
var finalDelay = (sw.Elapsed - before).TotalMilliseconds;
if (finalDelay > 30 + delay) {
Console.WriteLine("Slow ({0}, {1}): {4} Expected {2:0.0}ms got {3:0.0}ms", count, delay, wait, finalDelay, i);
}
}
Console.WriteLine("Done ({0}, {1})", count, delay);
}
}
}
Also reported this on connect.
Leaving the old question below, for completeness.
I am running a task that reads from a network stream, then delays for 20ms, and reads again (doing 500 reads, this should take around 10 seconds). This works well when I only read with 1 task, but strange things happen when I have multiple tasks running, some with long (60 seconds) delay. My ms-delay tasks suddenly hang half way.
I am running the following code (simplified):
var sw = new Stopwatch();
sw.Start();
await Task.Delay(20); // actually delay is 10, 20, 30 or 40;
if (sw.Elapsed.TotalSeconds > 1) {
Console.WriteLine("Sleep: {0:0.00}s", sw.Elapsed.TotalSeconds);
}
This prints:
Sleep: 11.87s
(Actually it gives the 20ms delay 99% of the time, those are ignored).
This delay is almost 600 times longer than expected. The same delay happens on 3 separate threads at the same time, and they all continue again at the same time also.
The 60 second sleeping task wakes up as normal ~40 seconds after the short tasks finish.
Half the time this problem does not even happen. The other half, it has a consistent delay of 11.5-12 seconds. I would suspect a scheduling or thread-pool problem, but all threads should be free.
When I pause my program during the stuck phase, the main thread stacktrace stands on Task.WaitAll, 3 tasks are Scheduled on await Task.Delay(20) and one task is Scheduled on await Task.Delay(60000). Also there are 4 more tasks Awaiting those first 4 tasks, reporting things like '"Task 24" is waiting on this object: "Task 5313" (Owned by thread 0)'. All 4 tasks say the waiting task is owned by thread 0. There are also 4 ContinueWith tasks that I think I can ignore.
There are some other things going on, like a second console application that writes to the network stream, but one console application should not affect the other.
I am completely clueless on this one. What is going on?
Update:
Based on comments and questions:
When I run my program 4 times, 2-3 times it will hang for 10-15 seconds, 1-2 times it will operate as normal (and won't print "Sleep: {0:0.00}s").
Thread.Count indeed goes up, but this happens regardless of the hang. I just had a run where it did not hang, and Thread.Count started at 24, went up to 40 after 1 second, around 22 seconds the short tasks finished normally, and then Thread.Count went down to 22 slowly over the next 40 seconds.
Some more code, full code is found in the link below. Starting clients:
List<Task> tasks = new List<Task>();
private void makeClient(int delay, int startDelay) {
Task task = new ClientConnection(this, delay, startDelay).connectAsync();
task.ContinueWith(_ => {
lock (tasks) { tasks.Remove(task); }
});
lock (tasks) { tasks.Add(task); }
}
private void start() {
DateTime start = DateTime.Now;
Console.WriteLine("Starting clients...");
int[] iList = new[] {
0,1,1,2,
10, 20, 30, 40};
foreach (int delay in iList) {
makeClient(delay, 0); ;
}
makeClient(15, 40);
Console.WriteLine("Done making");
tasks.Add(displayThreads());
waitForTasks(tasks);
Console.WriteLine("All done.");
}
private static void waitForTasks(List<Task> tasks) {
Task[] waitFor;
lock (tasks) {
waitFor = tasks.ToArray();
}
Task.WaitAll(waitFor);
}
Also, I tried to replace the Delay(20) with await Task.Run(() => Thread.Sleep(20))
Thread.Count now goes from 29 to 43 and back down to 24, however among multiple runs it never hangs.
With or without ThreadPool.SetMinThreads(500, 500), using TaskExt.Delay by noserati it does not hang. (That said, even switching over 1 line of code sometimes stops it from hanging, only to randomly continue after I restart the project 4 times, but I've tried this 6 times in a row without any problems now).
I've tried everything above with and without ThreadPool.SetMinThreads so far, never made any difference.
Update2: CODE!
Without seeing more code, it's hard to make further guesses, but I'd like to summarize the comments, as it may help someone else in the future:
We've figured out that ThreadPool stuttering is not the issue here, as ThreadPool.SetMinThreads(500, 500) didn't help.
Is there any SynchronizationContext in place anywhere in your task workflow? Place Debug.Assert(SynchronizationContext.Current == null) everywhere to check for that. Use ConfigureAwait(false) with every await.
Is there any .Wait, .WaitOne, .WaitAll, WaitAny, .Result used anywhere in your code? Any lock () { ... } constructs? Monitor.Enter/Exit or any other blocking synchronization primitives?
Regarding this: I've already replaced Task.Delay(20) with Task.Yield(); Thread.Sleep(20) as a workaround, that works. But yeah, I continue to try to figure out what's going on here because the idea that Task.Delay(20) can shoot this far out of line makes it totally unusable.
This sounds worrying, indeed. It's very unlikely there's a bug in Task.Delay, but everything is possible. For the sake of experimenting, try replacing await Task.Delay(20) with await Task.Run(() => Thread.Sleep(20)), having ThreadPool.SetMinThreads(500, 500) still in-place.
I also have an experimental implementation of Delay which uses the unmanaged CreateTimerQueueTimer API (unlike Task.Delay, which uses System.Threading.Timer, which in turn uses the managed TimerQueue). It's available here as a gist. Feel free to try it as TaskExt.Delay instead of the standard Task.Delay. The timer callbacks are posted to ThreadPool, so ThreadPool.SetMinThreads(500, 500) still should be used for this experiment. I doubt it could make any difference, but I'd be interested to know.
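To compare the delay variants discussed here on equal terms, a small probe like the following could be used. It is only a sketch: DelayProbe and MeasureAsync are hypothetical names, and the delay to test is passed in as a delegate so that Task.Delay, Task.Run(() => Thread.Sleep(20)) or any other implementation can be swapped in.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for the experiment, not part of the framework.
public static class DelayProbe
{
    // Awaits the supplied delay once and returns how long it really took, in ms.
    // ConfigureAwait(false) keeps any SynchronizationContext out of the measurement.
    public static async Task<double> MeasureAsync(Func<Task> delay)
    {
        var sw = Stopwatch.StartNew();
        await delay().ConfigureAwait(false);
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

For example, await DelayProbe.MeasureAsync(() => Task.Delay(20)) and await DelayProbe.MeasureAsync(() => Task.Run(() => Thread.Sleep(20))) can be logged side by side inside the loop that shows the hang.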
I am using a Parallel.For loop to increase execution speed of a computation.
I would like to measure the approximate time left for the computation. Normally one simply has to measure the time it takes for each step and estimate the total time by multiplying the step time by the total number of steps.
e.g., if there are 100 steps and some step takes 5 seconds then one could expect that the total time would be about 500 seconds (one could average over several steps and continuously report to the user, which is what I want to do).
The only way I can think to do this is by using an outer for loop that essentially resorts back to the original way by splitting up the parallel.for interval and measuring each one.
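for (int i = 0; i < n; i += step)
Time(() => Parallel.For(i, Math.Min(i + step, n), body)); // Time and body are placeholders; the upper bound of Parallel.For is exclusive
This isn't a very good way in general because either a small number of very long steps or a large number of short steps causes problems with timing.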
Anyone have any ideas?
(Please realize I need a real time estimation of the time it is taking the parallel.for to complete and NOT the total time. I want to let the user know how much time is left in execution).
This method seems to be pretty effective. We can "linearize" the parallel for loop by simply having each parallel loop increment a counter:
Parallel.For(0, n, (i) => { Thread.Sleep(1000); Interlocked.Increment(ref cnt); });
(Note, thanks to Niclas, that ++ is not atomic and one must use lock or Interlocked.Increment)
Each iteration, running in parallel, increments cnt. The effect is that cnt increases monotonically to n, and (double)cnt / n is the fraction of the loop that is complete (cast first, because integer division would give 0 until the very end). Because Interlocked.Increment is atomic, there are no concurrency issues, and it is fast and accurate.
We can measure the percentage of completion of the parallel For loop at any time during the execution by simply computing cnt/n
The total computation time can be easily estimated by dividing the elapsed time since the start of the loop by the fraction of the loop that is complete. These two quantities should have approximately the same rates of change when each iteration takes approximately the same amount of time and is relatively well behaved (small fluctuations average out too).
Obviously the more unpredictable each task is, the more inaccurate the remaining computation time will be. This is to be expected and in general, there is no solution (which is why it's called an approximation). We can still get the elapsed computation time or percentage with complete accuracy.
The underlying assumption of any estimation of "time left" algorithms is each sub task takes approximately the same computation time (assuming one wants a linear result). For example, if we have a parallel approach where 99 tasks are very quick and 1 task is very slow, our estimation will be grossly inaccurate. Our counter will zip up to 99 pretty quick then sit on the last percentage until the slow task completes. We could linearly interpolate and do further estimation to get a smoother countdown but ultimately there is a breaking point.
The following code demonstrates how to measure the parallel for efficiently. Note the time at 100% is the true total execution time and can be used as a reference.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Diagnostics;
namespace ParallelForTiming
{
class Program
{
static void Main(string[] args)
{
var sw = new Stopwatch();
var pct = 0.000001;
var iter = 20;
var time = 20 * 1000 / iter;
var p = new ParallelOptions(); p.MaxDegreeOfParallelism = 4;
var Done = false;
Parallel.Invoke(() =>
{
sw.Start();
Parallel.For(0, iter, p, (i) => { Thread.Sleep(time); lock(p) { pct += 1 / (double)iter; }});
sw.Stop();
Done = true;
}, () =>
{
while (!Done)
{
Console.WriteLine(Math.Round(pct*100,2) + " : " + ((pct < 0.1) ? "oo" : (sw.ElapsedMilliseconds / pct /1000.0).ToString()));
Thread.Sleep(2000);
}
}
);
Console.WriteLine(Math.Round(pct * 100, 2) + " : " + sw.ElapsedMilliseconds / pct / 1000.0);
Console.ReadKey();
}
}
}
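The estimate itself (elapsed time divided by the completed fraction) can be pulled out into a small helper, which makes it easy to check the arithmetic separately from the threading. This is a sketch only; the names Eta, EstimateTotal and EstimateRemaining are made up for the example.

```csharp
using System;

// Hypothetical helper, not part of the framework.
public static class Eta
{
    // Estimated total run time: elapsed time divided by the completed fraction.
    public static TimeSpan EstimateTotal(TimeSpan elapsed, long done, long total)
    {
        if (done <= 0) return TimeSpan.MaxValue; // nothing finished yet, no estimate
        return TimeSpan.FromTicks((long)(elapsed.Ticks * ((double)total / done)));
    }

    // Estimated time still to go: estimated total minus what has already elapsed.
    public static TimeSpan EstimateRemaining(TimeSpan elapsed, long done, long total)
    {
        if (done <= 0) return TimeSpan.MaxValue;
        return EstimateTotal(elapsed, done, total) - elapsed;
    }
}
```

A monitoring thread would call it with the stopwatch's Elapsed value, the current value of the shared counter, and n. With 25 of 100 iterations done after 10 seconds, the estimated total is 40 seconds and the estimated remainder is 30 seconds.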
This is almost impossible to answer.
First of all, it's not clear what all the steps do. Some steps may be I/O-intensive, or computationally intensive.
Furthermore, Parallel.For is a request -- you are not sure that your code will actually run in parallel. It depends on circumstances (availability of threads and memory) whether the code will actually run in parallel. Then if you have parallel code that relies on I/O, one thread will block the others while waiting for the I/O to complete. And you don't know what other processes are doing either.
This is what makes predicting how long something will take extremely error-prone and, actually, an exercise in futility.
This problem is a tough one to answer. The timing problems you describe, with very long steps or a large number of very short steps, are likely a sign that your loop is working at the edges of what the parallel partitioner can handle.
Since the default partitioner is very dynamic and we know nothing about your actual problem, there is no good answer that allows you to solve the problem at hand while still reaping the benefits of parallel execution with dynamic load balancing.
If it is very important to achieve a reliable estimation of projected runtime, perhaps you could set up a custom partitioner and then leverage your knowledge about the partitioning to extrapolate timings from a few chunks on one thread.
Here's a possible solution to measure the average of all previously finished tasks. After each task finishes, an Action<T> is called where you could summarize all times and divide it by the total tasks finished. This is however just the current state and has no way to predict any future tasks / averages. (As others mentioned, this is quite difficult)
However: You'll have to measure if it fits for your problem because there is a possibility for lock contention on both the method level declared variables.
static void ComputeParallelForWithTLS()
{
var collection = new List<int>() { 1000, 2000, 3000, 4000 }; // values used as sleep parameter
var sync = new object();
TimeSpan averageTime = new TimeSpan();
int amountOfItemsDone = 0; // referenced by the TPL, increment it with lock / interlocked.increment
Parallel.For(0, collection.Count,
() => new TimeSpan(),
(i, loopState, tlData) =>
{
var sw = Stopwatch.StartNew();
DoWork(collection, i);
sw.Stop();
Interlocked.Increment(ref amountOfItemsDone); // count every finished item
return tlData + sw.Elapsed; // accumulate, because one worker task may process several items
},
threadLocalData => // Called once per worker task, after it has processed its last item
{
lock (sync)
{
averageTime += threadLocalData; // add this worker's accumulated time to the total
Console.WriteLine(averageTime.TotalMilliseconds / amountOfItemsDone + " ms.");
}
/* Prints the average over the work reported so far; it can read low while other
workers still hold unreported time, and is exact once the loop has finished.
For an estimation, multiply with the remaining items. */
});
}
static void DoWork(List<int> items, int current)
{
System.Threading.Thread.Sleep(items[current]);
}
I would propose having the method being executed at each step report when it is done. This is slightly tricky with thread safety of course, so that is something to remember when implementing. This will let you keep track of number of finished tasks out of the total, and also makes it (sort of) easy to know the time spent on each individual step, which is useful to remove outliers etc.
EDIT: Some code to demonstrate the idea
Parallel.For(startIdx, endIdx, idx => {
var sw = Stopwatch.StartNew();
DoCalculation(idx);
sw.Stop();
var dur = sw.Elapsed;
ReportFinished(idx, dur);
});
The key here is that ReportFinished will give you continuous information about the number of finished tasks and the duration of each of them. This enables you to make better guesses about how much time remains by doing statistics on this data.
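One possible shape for the receiving side of ReportFinished is sketched below. Everything here (the ProgressTracker class, its members, the choice of a plain lock) is an assumption for illustration, since the answer leaves the implementation open.

```csharp
using System;
using System.Diagnostics;

// Hypothetical tracker behind ReportFinished; not from the original answer.
public sealed class ProgressTracker
{
    private readonly object _gate = new object();
    private readonly Stopwatch _wallClock = Stopwatch.StartNew();
    private readonly int _total;
    private int _finished;
    private TimeSpan _sumOfDurations = TimeSpan.Zero;

    public ProgressTracker(int total) { _total = total; }

    // Called from the loop body; safe to call from many threads at once.
    public void ReportFinished(int idx, TimeSpan duration)
    {
        lock (_gate)
        {
            _finished++;
            _sumOfDurations += duration;
        }
    }

    public int Finished
    {
        get { lock (_gate) { return _finished; } }
    }

    // Mean duration of the steps reported so far.
    public TimeSpan AverageDuration
    {
        get
        {
            lock (_gate)
            {
                return _finished == 0
                    ? TimeSpan.Zero
                    : TimeSpan.FromTicks(_sumOfDurations.Ticks / _finished);
            }
        }
    }

    // Wall-clock time per finished step, multiplied by the steps still open.
    public TimeSpan EstimateRemaining()
    {
        lock (_gate)
        {
            if (_finished == 0) return TimeSpan.MaxValue; // no data yet
            double ticksPerStep = _wallClock.Elapsed.Ticks / (double)_finished;
            return TimeSpan.FromTicks((long)(ticksPerStep * (_total - _finished)));
        }
    }
}
```

The remaining-time estimate deliberately uses wall-clock time rather than the summed durations, because with several threads the durations overlap. The per-step durations are still useful for spotting outliers.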
Here I wrote a class that measures time and speed:
public static class Counter
{
private static long _seriesProcessedItems = 0;
private static long _totalProcessedItems = 0;
private static TimeSpan _totalTime = TimeSpan.Zero;
private static DateTime _operationStartTime;
private static object _lock = new object();
private static int _numberOfCurrentOperations = 0;
public static void StartAsyncOperation()
{
lock (_lock)
{
if (_numberOfCurrentOperations == 0)
{
_operationStartTime = DateTime.Now;
}
_numberOfCurrentOperations++;
}
}
public static void EndAsyncOperation(int itemsProcessed)
{
lock (_lock)
{
_numberOfCurrentOperations--;
if (_numberOfCurrentOperations < 0)
throw new InvalidOperationException("EndAsyncOperation without StartAsyncOperation");
_seriesProcessedItems +=itemsProcessed;
if (_numberOfCurrentOperations == 0)
{
_totalProcessedItems += _seriesProcessedItems;
_totalTime += DateTime.Now - _operationStartTime;
_seriesProcessedItems = 0;
}
}
}
public static double GetAvgSpeed()
{
if (_totalProcessedItems == 0) throw new InvalidOperationException("_totalProcessedItems is zero");
if (_totalTime == TimeSpan.Zero) throw new InvalidOperationException("_totalTime is zero");
return _totalProcessedItems / (double)_totalTime.TotalMilliseconds;
}
public static void Reset()
{
_totalProcessedItems = 0;
_totalTime = TimeSpan.Zero;
}
}
Example of usage and test:
static void Main(string[] args)
{
var st = Stopwatch.StartNew();
Parallel.For(0, 100, _ =>
{
Counter.StartAsyncOperation();
Thread.Sleep(100);
Counter.EndAsyncOperation(1);
});
st.Stop();
Console.WriteLine("Speed correct {0}", 100 / (double)st.ElapsedMilliseconds);
Console.WriteLine("Speed to test {0}", Counter.GetAvgSpeed());
}
I've got a problem. I'm writing a benchmark and I have a function that either finishes in 2 seconds or after ~5 minutes (depending on the input data). And I would like to stop that function if it runs for more than 3 seconds...
How can I do it?
Thanks a lot!
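Well..., I had the same question, and after reading all the answers here and the referenced blogs, I settled for this.
It lets me execute any block of code with a time limit. Note that on timeout the block keeps running in the background; the method only stops waiting for it. Declare the wrapper method: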
public static bool ExecuteWithTimeLimit(TimeSpan timeSpan, Action codeBlock)
{
try
{
Task task = Task.Factory.StartNew(() => codeBlock());
task.Wait(timeSpan);
return task.IsCompleted;
}
catch (AggregateException ae)
{
throw ae.InnerExceptions[0];
}
}
And use that to wrap any block of code like this
// code here
bool Completed = ExecuteWithTimeLimit(TimeSpan.FromMilliseconds(1000), () =>
{
//
// Write your time bounded code here
//
});
//More code
The best way would be for your function to check its execution time often enough that it can decide to stop if it takes too long.
If this is not the case, then run the function in a separate thread. In your main thread start a 3 second timer. When the timer elapses, kill the separate thread using Thread.Abort() (unless the function is already over, of course). See the sample code and the precautions for its use in the function docs. Note that Thread.Abort is only available on .NET Framework; on .NET Core and .NET 5+ it throws PlatformNotSupportedException.
The best way in C# to stop a function in the middle is the return keyword, but how do you know when to return once the function has been running for at least 3 seconds? The Stopwatch class from System.Diagnostics is the answer. A function that takes anywhere from 2 seconds to 5 minutes (depending on the input data) presumably uses many loops, and maybe even recursion, so my solution is this: on the first line of that function, create a Stopwatch instance and start it by calling Start(), and at the beginning of every loop add the following code:
if (stopwatch.ElapsedMilliseconds >= 3000) {
stopwatch.Stop();
// or
stopwatch.Reset();
return;
}
(Tip: type it once, copy it with Ctrl+C, and paste it with Ctrl+V wherever it is needed.) If the function uses recursion, then in order to save memory make the Stopwatch a global instance rather than creating a local one in every call, and start it at the beginning of the function only if it is not already running; the IsRunning property of the Stopwatch class tells you that. Then check whether the elapsed time is more than 3 seconds, and if so stop or reset the Stopwatch and use return to unwind the recursion. This is a good starting point if your function is slow mainly because of recursion rather than loops. That's it. As you can see, it is very simple; I tested this solution and it works. Try it yourself!
private static int LongRunningMethod()
{
var r = new Random();
var randomNumber = r.Next(1, 10);
var delayInMilliseconds = randomNumber * 1000;
Task.Delay(delayInMilliseconds).Wait();
return randomNumber;
}
And
var task = Task.Run(() =>
{
return LongRunningMethod();
});
bool isCompletedSuccessfully = task.Wait(TimeSpan.FromMilliseconds(3000));
if (isCompletedSuccessfully)
{
return task.Result;
}
else
{
throw new TimeoutException("The function has taken longer than the maximum time allowed.");
}
It works for me!
Source: https://jeremylindsayni.wordpress.com/2016/05/28/how-to-set-a-maximum-time-to-allow-a-c-function-to-run-for/
You can use the fork/join pattern, in the Task Parallel Library this is implemented with Task.WaitAll()
using System;
using System.Threading;
using System.Threading.Tasks;
void CutoffAfterThreeSeconds() {
// start function on separate thread
CancellationTokenSource cts = new CancellationTokenSource();
Task loop = Task.Factory.StartNew(() => Loop(cts.Token));
// wait for max 3 seconds
if(Task.WaitAll(new Task[]{loop}, 3000)){
// Loop finished within 3 seconds
} else {
// it did not finish within 3 seconds
cts.Cancel();
}
}
// this one takes forever
void Loop(CancellationToken ct) {
while (!ct.IsCancellationRequested) {
// your loop goes here
}
Console.WriteLine("Got Cancelled");
}
This will start the other task on a separate thread, and then wait up to 3000 milliseconds for it to finish. If it finishes within the timeout, WaitAll returns true, otherwise false, so you can use that to decide what to do next.
You can use a CancellationToken to communicate to the other thread that its result is no longer needed so it can stop gracefully.
Regards Gert-Jan
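A variation on the same idea, sketched under the assumption that .NET 4.5 or later is available: CancellationTokenSource.CancelAfter arms the timeout itself, so no separate timed wait is needed. The class and method names (Cutoff.RunWithTimeout) are hypothetical, and as above this only works if the work actually checks the token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the framework.
public static class Cutoff
{
    // Runs work with a token that is cancelled automatically after timeout.
    // Returns true if the work finished on its own, false if it was cut off.
    public static bool RunWithTimeout(Action<CancellationToken> work, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.CancelAfter(timeout);
            try
            {
                Task.Run(() => work(cts.Token), cts.Token).Wait();
                // The work may have noticed the cancellation and returned normally.
                return !cts.IsCancellationRequested;
            }
            catch (AggregateException ex)
            {
                // Swallow cancellation, rethrow anything else the work threw.
                ex.Handle(e => e is OperationCanceledException);
                return false;
            }
        }
    }
}
```

The work can either poll ct.IsCancellationRequested and return, as Loop does above, or call ct.ThrowIfCancellationRequested(); both cases end up returning false here.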
Run this function in thread and kill it after 3 seconds or check elapsed time inside this function(I think it's loop there).
Use an OS callback with a high-performance counter, then kill your thread, if it still exists.
It is possible to execute a function in a separate thread and limit its execution with Thread.Join(millisecondsTimeout):
using System.Threading;
Thread workThread = new Thread(DoFunc);
workThread.Start(param);
if (!workThread.Join(3000))
{
// DoFunc() is still running after 3 seconds; Join only stops waiting, it does not abort the thread
}
private void DoFunc(object param)
{
// do some long work
}
Since C# and the .NET Framework are not real-time environments, you can't guarantee even the 3 seconds count. Even if you were to get close to that, you would still have to call
if (timeSpan > TimeSpan.FromSeconds(3)) goto endIdentifier; before every other call in the method.
All this is just wrong so no, there is just no reliable way to do it from what I know.
Although you can try this solution
https://web.archive.org/web/20140222210133/http://kossovsky.net/index.php/2009/07/csharp-how-to-limit-method-execution-time
but I just wouldn't do such things in .net application.
I'm currently working on a project, where we have the challenge to process items in parallel. So far not a big deal ;)
Now to the problem. We have a list of IDs, where we periodically (every 2 seconds) want to call a stored procedure for each ID.
The 2 seconds need to be checked for each item individually, as items are added and removed during runtime.
In addition we want to configure the maximum degree of parallelism, as the DB should not be flooded with 300 threads concurrently.
An item which is being processed should not be rescheduled for processing until it has finished with the previous execution. Reason is that we want to prevent queueing up a lot of items, in case of delays on the DB.
Right now we are using a self-developed component, that has a main thread, which periodically checks what items need to scheduled for processing. Once it has the list, it's dropping those on a custom IOCP-based thread pool, and then uses waithandles to wait for the items being processed. Then the next iteration starts. IOCP because of the work-stealing it provides.
I would like to replace this custom implementation with a TPL/.NET 4 version, and I would like to know how you would solve it (ideally simple and nicely readable/maintainable).
I know about this article: http://msdn.microsoft.com/en-us/library/ee789351.aspx, but it's just limiting the amount of threads being used. Leaves work stealing, periodically executing the items ....
Ideally it will become a generic component that can be used for all the tasks that need to be done periodically for a list of items.
any input welcome,
tia
Martin
I don't think you actually need to get down and dirty with direct TPL Tasks for this. For starters I would set up a BlockingCollection around a ConcurrentQueue (the default) with no BoundedCapacity set on the BlockingCollection to store the IDs that need to be processed.
// Setup the blocking collection somewhere when your process starts up (OnStart for a Windows service)
BlockingCollection<string> idsToProcess = new BlockingCollection<string>();
From there I would just use Parallel::ForEach on the enumeration returned from BlockingCollection::GetConsumingEnumerable. In the ForEach call you will set up your ParallelOptions::MaxDegreeOfParallelism. Inside the body of the ForEach you will execute your stored procedure.
Now, once the stored procedure execution completes, you're saying you don't want to re-schedule the execution for at least two seconds. No problem, schedule a System.Threading.Timer with a callback which will simply add the ID back to the BlockingCollection in the supplied callback.
Parallel.ForEach(
idsToProcess.GetConsumingEnumerable(),
new ParallelOptions
{
MaxDegreeOfParallelism = 4 // read this from config
},
(id) =>
{
// ... execute sproc ...
// Need to declare/assign this before the delegate so that we can dispose of it inside
Timer timer = null;
timer = new Timer(
_ =>
{
// Add the id back to the collection so it will be processed again
idsToProcess.Add(id);
// Cleanup the timer
timer.Dispose();
},
null, // no state, the id we need is "captured" in the anonymous delegate
2000, // probably should read this from config
Timeout.Infinite);
});
Finally, when the process is shutting down you would call BlockingCollection::CompleteAdding so that the enumerable being processed will stop blocking and complete, and the Parallel::ForEach will exit. If this were a Windows service, for example, you would do this in OnStop.
// When ready to shutdown you just signal you're done adding
idsToProcess.CompleteAdding();
Update
You raised a valid concern in your comment that you might be processing a large number of IDs at any given point, and that a timer per ID would add too much overhead. I would absolutely agree with that. So if you are dealing with a large list of IDs concurrently, I would change from a timer-per-ID to another queue that holds the "sleeping" IDs and is monitored by a single short-interval timer instead. First you'll need a ConcurrentQueue onto which to place the IDs that are asleep:
ConcurrentQueue<Tuple<string, DateTime>> sleepingIds = new ConcurrentQueue<Tuple<string, DateTime>>();
Now, I'm using a two-part Tuple here for illustration purposes, but you may want to create a more strongly typed struct for it (or at least alias it with a using statement) for better readability. The tuple has the id and a DateTime which represents when it was put on the queue.
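As a rough illustration of the "more strongly typed struct" idea, something like the following could replace the Tuple. The names SleepingId, SleptAtUtc, HasRested and SleepingIdDemo are made up for this sketch and are not part of the code in this answer; treat it as one possible shape, not the definitive one.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical type that stands in for Tuple<string, DateTime>.
public struct SleepingId
{
    public readonly string Id;           // the id that was just processed
    public readonly DateTime SleptAtUtc; // when it was put on the sleeping queue

    public SleepingId(string id, DateTime sleptAtUtc)
    {
        Id = id;
        SleptAtUtc = sleptAtUtc;
    }

    // True once the entry has been asleep for at least restPeriod.
    public bool HasRested(DateTime utcNow, TimeSpan restPeriod)
    {
        return utcNow - SleptAtUtc >= restPeriod;
    }
}

public static class SleepingIdDemo
{
    public static void Main()
    {
        var queue = new ConcurrentQueue<SleepingId>();
        DateTime now = DateTime.UtcNow;

        // Pretend this id finished processing three seconds ago.
        queue.Enqueue(new SleepingId("42", now.AddSeconds(-3)));

        SleepingId entry;
        if (queue.TryPeek(out entry))
        {
            // prints "42 rested: True"
            Console.WriteLine("{0} rested: {1}", entry.Id, entry.HasRested(now, TimeSpan.FromSeconds(2)));
        }
    }
}
```

If a whole new type feels like too much, the lighter option mentioned above is an alias at the top of the file, for example `using SleepingEntry = System.Tuple<string, System.DateTime>;`.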
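Now you'll also want to set up the timer that will monitor this queue:
Timer wakeSleepingIdsTimer = new Timer(
_ =>
{
DateTime utcNow = DateTime.UtcNow;
Tuple<string, DateTime> entry;
// Move every entry that has been asleep for at least 2 seconds back to the processing queue.
// Entries are enqueued in roughly time order, so stop at the first one that is not ready yet.
while (sleepingIds.TryPeek(out entry) && (utcNow - entry.Item2).TotalSeconds >= 2)
{
// TryDequeue actually removes the entry (TakeWhile would only read it)
if (sleepingIds.TryDequeue(out entry))
{
// Add this id back to the processing queue
idsToProcess.Add(entry.Item1);
}
}
},
null, // no state
100, // first tick after 100ms (Timeout.Infinite here would keep the timer from ever starting)
100 // wake up every 100ms, probably should read this from config
);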
Then you would simply change the Parallel::ForEach to do the following instead of setting up a timer for each one:
(id) =>
{
// ... execute sproc ...
sleepingIds.Enqueue(Tuple.Create(id, DateTime.UtcNow));
}
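Putting the pieces of this second variant together, a minimal end-to-end sketch might look like the following. Several things here are assumptions that are not in the answer above: the names RescheduleSketch and Run, the Thread.Sleep stand-in for the stored procedure, the overlap guard on the timer callback, the catch for adds that arrive after shutdown, and the use of Partitioner.Create with EnumerablePartitionerOptions.NoBuffering, which needs .NET 4.5 or later. The last one is there because the default partitioner that Parallel.ForEach uses for an IEnumerable buffers items into chunks, which can delay items that come out of a blocking collection.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RescheduleSketch
{
    // Runs the loop for runMilliseconds, then shuts down.
    // Returns how many executions completed in total.
    public static int Run(string[] ids, int runMilliseconds)
    {
        TimeSpan restPeriod = TimeSpan.FromSeconds(2); // hypothetical, read from config
        var idsToProcess = new BlockingCollection<string>();
        var sleepingIds = new ConcurrentQueue<Tuple<string, DateTime>>();
        int processed = 0;
        int wakeRunning = 0; // 1 while a wake-up pass is in progress

        foreach (string initialId in ids) idsToProcess.Add(initialId);

        TimerCallback wake = _ =>
        {
            // Skip this tick if the previous pass has not finished yet.
            if (Interlocked.CompareExchange(ref wakeRunning, 1, 0) != 0) return;
            try
            {
                DateTime utcNow = DateTime.UtcNow;
                Tuple<string, DateTime> entry;
                while (sleepingIds.TryPeek(out entry) && utcNow - entry.Item2 >= restPeriod)
                {
                    if (sleepingIds.TryDequeue(out entry)) idsToProcess.Add(entry.Item1);
                }
            }
            catch (InvalidOperationException)
            {
                // CompleteAdding was already called; we are shutting down.
            }
            finally
            {
                Interlocked.Exchange(ref wakeRunning, 0);
            }
        };

        // NoBuffering hands items out one at a time instead of filling chunks.
        var source = Partitioner.Create(
            idsToProcess.GetConsumingEnumerable(),
            EnumerablePartitionerOptions.NoBuffering);

        Task loop = Task.Factory.StartNew(() =>
            Parallel.ForEach(
                source,
                new ParallelOptions { MaxDegreeOfParallelism = 4 },
                id =>
                {
                    Console.WriteLine("{0:HH:mm:ss.fff} processing {1}", DateTime.Now, id);
                    Thread.Sleep(200); // stand-in for the stored procedure call
                    Interlocked.Increment(ref processed);
                    sleepingIds.Enqueue(Tuple.Create(id, DateTime.UtcNow));
                }),
            TaskCreationOptions.LongRunning);

        using (new Timer(wake, null, 100, 100))
        {
            Thread.Sleep(runMilliseconds);
            idsToProcess.CompleteAdding(); // lets the consuming enumerable finish
            loop.Wait();
        }
        return processed;
    }

    public static void Main()
    {
        int total = Run(new[] { "A", "B", "C" }, 7000);
        Console.WriteLine("Executions completed: {0}", total);
    }
}
```

With these settings each ID should come around again a little over two seconds after its previous run finished. The exact timing depends on the 100ms timer interval and on how busy the thread pool is.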
This is pretty similar to the approach you said you already had in your question, but it does so with TPL tasks. A task just adds itself back to a list of things to schedule when it's done.
The use of locking on a plain list is fairly ugly in this example; you would probably want a better collection to hold the list of things to schedule.
// Fill the idsToSchedule
for (int id = 0; id < 5; id++)
{
idsToSchedule.Add(Tuple.Create(DateTime.MinValue, id));
}
// LongRunning will tell TPL to create a new thread to run this on
Task.Factory.StartNew(SchedulingLoop, TaskCreationOptions.LongRunning);
That starts up the SchedulingLoop, which actually checks whether it's been two seconds since something last ran:
// Tuple of the last time an id was processed and the id of the thing to schedule
static List<Tuple<DateTime, int>> idsToSchedule = new List<Tuple<DateTime, int>>();
static int currentlyProcessing = 0;
const int ProcessingLimit = 3;
// An event loop that performs the scheduling
public static void SchedulingLoop()
{
while (true)
{
lock (idsToSchedule)
{
DateTime currentTime = DateTime.Now;
for (int index = idsToSchedule.Count - 1; index >= 0; index--)
{
var scheduleItem = idsToSchedule[index];
var timeSincePreviousRun = (currentTime - scheduleItem.Item1).TotalSeconds;
// start it executing in a background task
if (timeSincePreviousRun > 2 && currentlyProcessing < ProcessingLimit)
{
Interlocked.Increment(ref currentlyProcessing);
Console.WriteLine("Scheduling {0} after {1} seconds", scheduleItem.Item2, timeSincePreviousRun);
// Schedule this task to be processed
Task.Factory.StartNew(() =>
{
Console.WriteLine("Executing {0}", scheduleItem.Item2);
// simulate the time taken to call this procedure
Thread.Sleep(new Random((int)DateTime.Now.Ticks).Next(0, 5000) + 500);
lock (idsToSchedule)
{
idsToSchedule.Add(Tuple.Create(DateTime.Now, scheduleItem.Item2));
}
Console.WriteLine("Done Executing {0}", scheduleItem.Item2);
Interlocked.Decrement(ref currentlyProcessing);
});
// remove this from the list of things to schedule
idsToSchedule.RemoveAt(index);
}
}
}
Thread.Sleep(100);
}
}