Parallelize tasks using polly

Let's say I have a list of objects myObjs of type List<Foo>.
I have a Polly policy:
var policy = Policy.Handle<Exception>().RetryForever();
I want to run a method on each object in parallel under this policy, and keep retrying each one for as long as it fails.
for (int i = 0; i < myObjs.Count; i++)
{
var obj = myObjs[i];
policy.Execute(() => Task.Factory.StartNew(() => obj.Do(), TaskCreationOptions.LongRunning));
}
Will this run in parallel and retry each obj? So if myObjs[5].Do() fails, will only that one get retried while the other objects are executed just once?
Also, am I supposed to use the ExecuteAsync() method, which accepts a Func<Task>, instead of the Execute(Action) one shown in the example? Do() is just a synchronous method being launched on a separate thread.
The actual code looks like this, where ForEach() is just a foreach wrapper:
_consumers.ForEach(c => policy.Execute(() => Task.Factory.StartNew(() => c.Consume(startFromBeg), TaskCreationOptions.LongRunning)));
EDIT:
I tried the code:
class Foo
{
private int _i;
public Foo(int i)
{
_i = i;
}
public void Do()
{
//var rnd = new Random();
if (_i==2)
{
Console.WriteLine("err"+_i);
throw new Exception();
}
Console.WriteLine(_i);
}
}
var policy = Policy.Handle<Exception>().Retry(3);
var foos=Enumerable.Range(0, 5).Select(x => new Foo(x)).ToList();
foos.ForEach(c => policy.Execute(() => Task.Factory.StartNew(() => c.Do(), TaskCreationOptions.LongRunning)));
but am getting result:
0 1 err2 3 4 5
I thought it would retry 2 a few more times, but it doesn't. Any idea why?

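Whatever owns the tasks must wait for them somehow. Otherwise, exceptions will be ignored and the code will end before the tasks actually complete. With policy.Execute(), the delegate only starts the task and returns it immediately, so the policy never observes the exception and has nothing to retry. So yes, you should probably be using policy.ExecuteAsync() with an async policy instead. It would look something like this:
// ExecuteAsync needs an async policy (RetryAsync / RetryForeverAsync)
var policy = Policy.Handle<Exception>().RetryAsync(3);
var tasks = myObjs
.Select(obj => policy.ExecuteAsync(() => Task.Factory.StartNew(() => obj.Do(), TaskCreationOptions.LongRunning)))
.ToList();
// sometime later
await Task.WhenAll(tasks);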
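To see why the task has to be awaited inside the retry logic, here is a minimal sketch that uses a hand-rolled retry helper in place of Polly. RetrySketch, RetryAsync and RunAllAsync are hypothetical names used only for this illustration; this is not how Polly is implemented, only the same idea in a few lines.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class RetrySketch
{
    // Hypothetical helper, not part of Polly: the task is awaited inside
    // the try block, so a failure is observed here and the action reruns.
    public static async Task RetryAsync(Func<Task> action, int maxRetries)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                await action();
                return;
            }
            catch (Exception) when (attempt < maxRetries)
            {
                // Swallow and loop again; once retries run out the
                // filter no longer matches and the exception propagates.
            }
        }
    }

    // Runs five items in parallel; item 2 fails twice before succeeding.
    // Returns how many times each item was invoked.
    public static async Task<int[]> RunAllAsync()
    {
        var calls = new int[5];
        var tasks = Enumerable.Range(0, 5).Select(i => RetryAsync(
            () => Task.Run(() =>
            {
                int n = Interlocked.Increment(ref calls[i]);
                if (i == 2 && n < 3) throw new InvalidOperationException("err" + i);
            }),
            maxRetries: 3)).ToList();

        await Task.WhenAll(tasks);
        return calls;
    }
}
```

Only the failing item is invoked more than once; the others run a single time, and all five run concurrently because the tasks are collected first and awaited together.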

Related

Parallel Task and Subtasks workflow

I'm new to C# threads and tasks and I'm trying to develop a workflow, but without success, probably because I'm mixing tasks with for iterations...
The point is:
I've got a bunch of lists, each containing some things to do, and I need them to run as much in parallel and with as little blocking as possible. As soon as each subBunchOfThingsTodo is done (meaning everything inside it has been done in parallel), it has to do some business (DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone()).
e.g:
bunchOfSubBunchsOfThingsTodo
subBunchOfThingsTodo
ThingToDo1
ThingToDo2
subBunchOfThingsTodo
ThingToDo1
ThingToDo2
ThingToDo3
subBunchOfThingsTodo
ThingToDo1
ThingToDo2...
This is how I'm trying, but unfortunately each iteration waits for the previous bunchOfThingsToDo to finish, and I need them to work in parallel.
The same happens to the things to do: each one waits for the previous one to start...
List<X> bunchOfSubBunchsOfThingsTodo = getBunchOfSubBunchsOfThingsTodo();
foreach (var subBunchOfThingsToDo in bunchOfSubBunchsOfThingsTodo)
{
int idSubBunchOfThingsToDo = subBunchOfThingsToDo.ThingsToDo.FirstOrDefault().IdSubBunchOfThingsToDo;
var parent = Task.Factory.StartNew(() =>
{
foreach (var thingToDo in subBunchOfThingsToDo.ThingsToDo)
{
var child = Task.Factory.StartNew(() =>
{
//Do some stuff with thingToDo... Here I call several business methods
});
}
});
parent.Wait();
DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone(idSubBunchOfThingsToDo);
}

You may want to try using Task.WhenAll and a bit of LINQ to generate a collection of hot tasks:
// async Task rather than async void, so callers can await it and observe exceptions
static async Task ProcessThingsToDo(IEnumerable<ThingToDo> bunchOfThingsToDo)
{
IEnumerable<Task> GetSubTasks(ThingToDo thing)
=> thing.SubBunchOfThingsToDo.Select( async subThing => await Task.Run(subThing));
var tasks = bunchOfThingsToDo
.Select(async thing => await Task.WhenAll(GetSubTasks(thing)));
await Task.WhenAll(tasks);
}
This way each subThingToDo runs on a separate task, and for each thingToDo you get a single Task composed of all its subtasks.
EDIT
ThingToDo is a rather simple class in this sample:
class ThingToDo
{
public IEnumerable<Action> SubBunchOfThingsToDo { get; }
}
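The per-group follow-up step from the question can be folded into the same pattern. The following is a minimal sketch under stated assumptions: GroupWorkflow, RunAsync and the afterGroup callback are hypothetical names, and each thing to do is modelled as a plain Action.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class GroupWorkflow
{
    // Starts every item of every group in parallel. afterGroup(index)
    // runs once all items of that particular group have finished,
    // without waiting for any other group.
    public static Task RunAsync(
        IReadOnlyList<IReadOnlyList<Action>> groups,
        Action<int> afterGroup)
    {
        var groupTasks = groups.Select(async (group, index) =>
        {
            // One task per item; await them all before the follow-up.
            await Task.WhenAll(group.Select(action => Task.Run(action)));
            afterGroup(index);
        });

        // Enumerating the query here is what starts all the groups.
        return Task.WhenAll(groupTasks);
    }
}
```

No group blocks another: the only wait is the final await on the returned task.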

With minimal changes to your code, you can try it this way:
var toWait = new List<Task>();
List<X> bunchOfSubBunchsOfThingsTodo = getBunchOfSubBunchsOfThingsTodo();
foreach (var subBunchOfThingsToDo in bunchOfSubBunchsOfThingsTodo)
{
int idSubBunchOfThingsToDo = subBunchOfThingsToDo.ThingsToDo.FirstOrDefault().IdSubBunchOfThingsToDo;
var parent = Task.Factory.StartNew(() =>
{
Parallel.ForEach(subBunchOfThingsToDo.ThingsToDo,
thingToDo =>
{
//Do some stuff with thingToDo... Here I call several business methods
});
});
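//parent.Wait();
// ContinueWith returns a task that is already scheduled; calling Start() on it would throw.
var handle = parent.ContinueWith((x) =>
{
DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone(idSubBunchOfThingsToDo);
});
toWait.Add(handle);
}
// Block until every group and its follow-up step has finished.
Task.WaitAll(toWait.ToArray());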
Thanks to the downvoters, who advised this 'good' solution:
var bunchOfSubBunchsOfThingsTodo = getBunchOfSubBunchsOfThingsTodo();
var toWait = bunchOfSubBunchsOfThingsTodo
.Select(subBunchOfThingsToDo =>
{
return Task.Run(() =>
{
int idSubBunchOfThingsToDo = subBunchOfThingsToDo.ThingsToDo.FirstOrDefault().IdSubBunchOfThingsToDo;
Parallel.ForEach(subBunchOfThingsToDo.ThingsToDo,
thingToDo =>
{
//Do some stuff with thingToDo... Here I call several business methods
});
DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone(idSubBunchOfThingsToDo);
});
});
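// ToArray() enumerates the lazy query (which starts the tasks); WaitAll then blocks until they finish.
Task.WaitAll(toWait.ToArray());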

Correct way to link Tasks together when return values are needed at different times

I hope this makes sense - Suppose I have the following code:
Task.Run(() =>
{
return Task.WhenAll
(
Task1,
Task2,
...
Taskn
)
.ContinueWith(tsks=>
{
TaskA (uses output from Tasks Task1 & Task2, say)
}
, ct)
.ContinueWith(res =>
{
TaskB (uses output from TaskA and Task3, say)
}
, ct);
});
So I want all my first N tasks to run concurrently (since we have no interdependencies), then only once they're all finished, to continue with a task that relies on their outputs (I get that for this, I can use the tsks.Result).
BUT THEN I want to continue with a task that relies on one of the first tasks and the result of TaskA.
I'm a bit lost on how to structure my code correctly so that I can access the results of my first set of tasks outside of the immediately following ContinueWith.
My one thought was to assign return value to them within my method - Something like:
... declare variables outside of Tasks ...
Task.Run(() =>
{
return Task.WhenAll
(
Task.Run(() => { var1 = Task1.Result; }, ct),
...
Task.Run(() => { varn = Taskn.Result; }, ct),
)
.ContinueWith(tsks=>
{
TaskA (uses output from Tasks var1 & varn, say)
}
, ct)
.ContinueWith(res =>
{
TaskB (uses output from TaskA and var3, say)
}
, ct);
});
But, even though this works for me, I have no doubt that that is doing it wrong.
What is the correct way? Should I have a state object that contains all the necessary variables and pass that throughout all my tasks? Is there a better way in total?
Please forgive my ignorance here - I'm just VERY new to concurrency programming.

Since Task1, Task2, ... , TaskN are in scope for the call of WhenAll, and because by the time ContinueWith passes control to your next task all the earlier tasks are guaranteed to finish, it is safe to use TaskX.Result inside the code implementing continuations:
.ContinueWith(tsks=>
{
var resTask1 = Task1.Result;
...
}
, ct)
You are guaranteed to get the result without blocking, because the task Task1 has finished running.
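The same dependency chain can also be written with async/await, which keeps the earlier task variables in scope for every later step. This is a minimal sketch, assuming each task produces an int; Pipeline, RunAsync and the step delegates are hypothetical stand-ins for Task1..Task3, TaskA and TaskB.

```csharp
using System;
using System.Threading.Tasks;

public static class Pipeline
{
    public static async Task<int> RunAsync(
        Func<int> step1, Func<int> step2, Func<int> step3)
    {
        // The three independent steps start concurrently.
        Task<int> t1 = Task.Run(step1);
        Task<int> t2 = Task.Run(step2);
        Task<int> t3 = Task.Run(step3);
        await Task.WhenAll(t1, t2, t3);

        // "TaskA": uses the results of the first two tasks. Reading
        // Result does not block, because both tasks have completed.
        int a = await Task.Run(() => t1.Result + t2.Result);

        // "TaskB": uses TaskA's output together with the third result.
        return await Task.Run(() => a * t3.Result);
    }
}
```

Because t1, t2 and t3 are ordinary locals, no shared state object or outer variables are needed to carry results from one stage to the next.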

Here is a way to do it with ConcurrentDictionary, which sounds like it might be applicable in your use case. Also, since you're new to concurrency, it shows you the Interlocked class as well:
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Executing...");
var numOfTasks = 50;
var tasks = new List<Task>();
for (int i = 0; i < numOfTasks; i++)
{
var iTask = Task.Run(() =>
{
var counter = Interlocked.Increment(ref _Counter);
Console.WriteLine(counter);
if (counter == numOfTasks - 1)
{
Console.WriteLine("Waiting {0} ms", 5000);
Task.Delay(5000).Wait(); // to simulate a longish running task
}
_State.AddOrUpdate(counter, "Updated Yo!", (k, v) =>
{
throw new InvalidOperationException("This shouldn't occur more than once.");
});
});
tasks.Add(iTask);
}
Task.WhenAll(tasks)
.ContinueWith(t =>
{
var longishState = _State[numOfTasks - 1];
Console.WriteLine(longishState);
Console.WriteLine("Complete. longishState: " + longishState);
});
Console.ReadKey();
}
static int _Counter = -1;
static ConcurrentDictionary<int, string> _State = new ConcurrentDictionary<int, string>();
}
You get output similar to this (though the Waiting line won't always be last before the continuation):

An elegant way to solve this is to use the Barrier class, like this:
var nrOfTasks = ... ;
ConcurrentDictionary<int, ResultType> Results = new ConcurrentDictionary<int, ResultType>();
var barrier = new Barrier(nrOfTasks, (b) =>
{
// here goes the work of TaskA
// and immediately after it
// here goes the work of TaskB, having the results of TaskA and any other task you might need
});
Task.Run(() => { Results[1] = Task1.Result; barrier.SignalAndWait(); }, ct),
...
Task.Run(() => { Results[nrOfTasks] = Taskn.Result; barrier.SignalAndWait(); }, ct)

Unit tests failing with Observable.FromAsync and Observable.Switch

I'm having trouble testing a class that makes use of Observable.FromAsync<T>() and Observable.Switch<T>(). It waits for a trigger observable to produce a value, then starts an async operation, and finally collects all operations' results in a single output sequence. The gist of it is something like:
var outputStream = triggerStream
.Select(_ => Observable
.FromAsync(token => taskProducer.DoSomethingAsync(token)))
.Switch();
I put up some sanity check tests with the bare minimum parts to understand what's going on, here's the test with results in comments:
class test_with_rx : nspec
{
void Given_async_task_and_switch()
{
Subject<Unit> triggerStream = null;
TaskCompletionSource<long> taskDriver = null;
ITestableObserver<long> testObserver = null;
IDisposable subscription = null;
before = () =>
{
TestScheduler scheduler = new TestScheduler();
testObserver = scheduler.CreateObserver<long>();
triggerStream = new Subject<Unit>();
taskDriver = new TaskCompletionSource<long>();
// build stream under test
IObservable<long> streamUnderTest = triggerStream
.Select(_ => Observable
.FromAsync(token => taskDriver.Task))
.Switch();
/* Also tried with this Switch() overload
IObservable<long> streamUnderTest = triggerStream
.Select(_ => taskDriver.Task)
.Switch(); */
subscription = streamUnderTest.Subscribe(testObserver);
};
context["Before trigger"] = () =>
{
it["Should not notify"] = () => testObserver.Messages.Count.Should().Be(0);
// PASSED
};
context["After trigger"] = () =>
{
before = () => triggerStream.OnNext(Unit.Default);
context["When task completes"] = () =>
{
long result = -1;
before = () =>
{
taskDriver.SetResult(result);
//taskDriver.Task.Wait(); // tried with this too
};
it["Should notify once"] = () => testObserver.Messages.Count.Should().Be(1);
// FAILED: expected 1, actual 0
it["Should notify task result"] = () => testObserver.Messages[0].Value.Value.Should().Be(result);
// FAILED: of course, index out of bound
};
};
after = () =>
{
taskDriver.TrySetCanceled();
taskDriver.Task.Dispose();
subscription.Dispose();
};
}
}
In other tests I've done with mocks too, I can see that the Func passed to FromAsync is actually invoked (e.g. taskProducer.DoSomethingAsync(token)), but then it looks like nothing more follows, and the output stream doesn't produce the value.
I also tried inserting some Task.Delay(x).Wait(), or some taskDriver.Task.Wait() before hitting expectations, but with no luck.
I read this SO thread and I'm aware of schedulers, but at a first look I thought I didn't need them, no ObserveOn() is being used. Was I wrong? What am I missing? TA
Just for completeness, testing framework is NSpec, assertion library is FluentAssertions.

What you're hitting is a case of testing Rx and TPL together.
An exhaustive explanation can be found here, but I'll try to give advice for your particular code.
Basically, your code is working fine, but your test is not.
Observable.FromAsync turns into a ContinueWith on the provided task, which is executed on the thread pool, hence asynchronously.
There are several ways to fix your test (from ugly to complex):
1. Sleep after setting the result (note that Wait doesn't help, because Wait doesn't wait for continuations):
taskDriver.SetResult(result);
Thread.Sleep(50);
2. Set the result before executing FromAsync (because FromAsync returns an immediate IObservable if the task is already finished, i.e. it skips ContinueWith):
taskDriver.SetResult(result);
triggerStream.OnNext(Unit.Default);
3. Replace FromAsync with a testable alternative, e.g.:
public static IObservable<T> ToObservable<T>(Task<T> task, TaskScheduler scheduler)
{
if (task.IsCompleted)
{
return task.ToObservable();
}
else
{
AsyncSubject<T> asyncSubject = new AsyncSubject<T>();
task.ContinueWith(t => task.ToObservable().Subscribe(asyncSubject), scheduler);
return asyncSubject.AsObservable<T>();
}
}
(using either a synchronous TaskScheduler, or a testable one)
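The point that waiting on a task does not wait for its continuations can be shown with the TPL alone, without Rx. The following is a minimal sketch; ContinuationDemo and its Run method are hypothetical names for illustration.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ContinuationDemo
{
    public static bool Run()
    {
        var tcs = new TaskCompletionSource<long>();
        long observed = 0;

        // The continuation is queued to the default scheduler, so it
        // runs some time after the antecedent task completes.
        Task continuation = tcs.Task.ContinueWith(t =>
        {
            Thread.Sleep(50); // simulate slow downstream work
            Volatile.Write(ref observed, t.Result);
        });

        tcs.SetResult(-1);

        // Returns as soon as the antecedent is complete; the side
        // effect of the continuation may not have happened yet.
        tcs.Task.Wait();

        // Waiting on the continuation itself is what guarantees that
        // its side effect is visible.
        continuation.Wait();
        return Volatile.Read(ref observed) == -1;
    }
}
```

A test can only rely on the side effect after the second Wait, which is why waiting on taskDriver.Task alone is not enough.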

Handle tasks which complete after Task.WhenAll().Wait() specified timeout

I am trying to use Task.WhenAll(tasks).Wait(timeout) to wait for tasks to complete and after that process task results.
Consider this example:
var tasks = new List<Task<Foo>>();
tasks.Add(Task.Run(() => GetData1()));
tasks.Add(Task.Run(() => GetData2()));
Task.WhenAll(tasks).Wait(TimeSpan.FromSeconds(5));
var completedTasks = tasks
.Where(t => t.Status == TaskStatus.RanToCompletion)
.Select(t => t.Result)
.ToList();
// Process completed tasks
// ...
private Foo GetData1()
{
Thread.Sleep(TimeSpan.FromSeconds(4));
return new Foo();
}
private Foo GetData2()
{
Thread.Sleep(TimeSpan.FromSeconds(10));
// How can I get the result of this task once it completes?
return new Foo();
}
It is possible that one of these tasks will not complete within the 5-second timeout.
Is it possible to somehow process the results of tasks that complete after the specified timeout? Maybe I am not using the right approach in this situation?
EDIT:
I am trying to get all task results that managed to complete within the specified timeout. There could be the following outcomes after Task.WhenAll(tasks).Wait(TimeSpan.FromSeconds(5)):
First task completes within 5 seconds.
Second task completes within 5 seconds.
Both tasks complete within 5 seconds.
None of the tasks complete within 5 seconds.
Is it possible to get the results of tasks that haven't completed within 5 seconds but have completed later, let's say after 10 seconds?

In the end, with the help of a user who has since removed their answer, I ended up with this solution:
private const int TimeoutInSeconds = 5;
private static void Main(string[] args)
{
var tasks = new List<Task>()
{
Task.Run( async() => await Task.Delay(30)),
Task.Run( async() => await Task.Delay(300)),
Task.Run( async() => await Task.Delay(6000)),
Task.Run( async() => await Task.Delay(8000))
};
Task.WhenAll(tasks).Wait(TimeSpan.FromSeconds(TimeoutInSeconds));
var completedTasks = tasks
.Where(t => t.Status == TaskStatus.RanToCompletion).ToList();
var incompleteTasks = tasks
.Where(t => t.Status != TaskStatus.RanToCompletion).ToList();
Task.WhenAll(incompleteTasks)
.ContinueWith(t => { ProcessDelayedTasks(incompleteTasks); });
ProcessCompletedTasks(completedTasks);
Console.ReadKey();
}
private static void ProcessCompletedTasks(IEnumerable<Task> completedTasks)
{
Console.WriteLine("Processing completed tasks...");
}
private static void ProcessDelayedTasks(IEnumerable<Task> delayedTasks)
{
Console.WriteLine("Processing delayed tasks...");
}

Instead of WhenAll().Wait(), you probably just want to do some sort of spin or sleep for 5 seconds and then query the list as you do above.
You should then be able to enumerate it again after a few more seconds to see what else has finished.
If performance is a concern, you may want some additional 'wrapping' to check whether all tasks have completed before the 5 seconds are up.

I think task items can be lost between
var completedTasks = tasks.Where(t => t.Status == TaskStatus.RanToCompletion).ToList();
and
var incompleteTasks = tasks.Where(t => t.Status != TaskStatus.RanToCompletion).ToList();
because some tasks may run to completion between the two statements and then appear in neither list.
As a workaround (not a correct fix, though) you could swap these lines. In that case some tasks may be present in both lists (completedTasks and incompleteTasks), but that may be better than losing them completely.
A unit test that compares the number of started tasks with the number of tasks in the completedTasks and incompleteTasks lists may also be useful.
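One way to avoid that race is to classify every task in a single pass, so each task lands in exactly one list. The following is a minimal sketch under stated assumptions: TimeoutSplit and SplitAsync are hypothetical names, and the timeout is implemented with Task.WhenAny and Task.Delay instead of a blocking Wait.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TimeoutSplit
{
    // Waits until all tasks finish or the timeout elapses, whichever
    // comes first, then splits the tasks into "on time" and "late".
    public static async Task<(List<Task<T>> OnTime, List<Task<T>> Late)>
        SplitAsync<T>(IReadOnlyList<Task<T>> tasks, TimeSpan timeout)
    {
        await Task.WhenAny(Task.WhenAll(tasks), Task.Delay(timeout));

        // ToLookup reads each task's status exactly once, so a task
        // that completes during the scan cannot be missed or counted
        // twice.
        var byStatus = tasks.ToLookup(
            t => t.Status == TaskStatus.RanToCompletion);

        return (byStatus[true].ToList(), byStatus[false].ToList());
    }
}
```

The late tasks keep running, so the caller can still await them afterwards, for example with Task.WhenAll on the Late list, and process their results when they arrive. Note that in this sketch faulted or cancelled tasks also end up in the Late list.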

Parallel ForEach wait 500 ms before spawning

I have this situation:
var tasks = new List<ITask> ...
Parallel.ForEach(tasks, currentTask => currentTask.Execute() );
Is it possible to instruct PLinq to wait for 500ms before the next thread is spawned?
System.Threading.Thread.Sleep(5000);

You are using Parallel.ForEach the wrong way. You should write a special enumerator that rate-limits itself to fetching data once every 500 ms.
I made some assumptions about how your DTO works, since you did not provide any details.
// Must return IEnumerable<T>: Parallel.ForEach does not accept an IEnumerator<T>.
private IEnumerable<SomeResource> GetRateLimitedResource()
{
SomeResource someResource = null;
do
{
someResource = _remoteProvider.GetData();
if(someResource != null)
{
yield return someResource;
Thread.Sleep(500);
}
} while (someResource != null);
}
Here is how your Parallel.ForEach call should look then:
Parallel.ForEach(GetRateLimitedResource(), SomeFunctionToProcessSomeResource);

There are already some good suggestions. I would agree with others that you are using PLINQ in a manner it wasn't meant to be used.
My suggestion would be to use System.Threading.Timer. This is probably better than writing a method that returns an IEnumerable<> that forces a half second delay, because you may not need to wait the full half second, depending on how much time has passed since your last API call.
With the timer, it will invoke a delegate that you've provided it at the interval you specify, so even if the first task isn't done, a half second later it will invoke your delegate on another thread, so there won't be any extra waiting.
From your example code, it sounds like you have a list of tasks, in this case, I would use System.Collections.Concurrent.ConcurrentQueue to keep track of the tasks. Once the queue is empty, turn off the timer.

You could use Enumerable.Aggregate instead.
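// Assumes tasks is a sequence of Task<T>. The lambda must not be async,
// otherwise ContinueWith would return Task<Task<T>> and Aggregate would not compile.
var task = tasks.Aggregate((t1, t2) =>
t1.ContinueWith(_ =>
{ Thread.Sleep(500); return t2.Result; }));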
If you don't want the tasks chained then there is also the overload to Select assuming the tasks are in order of delay.
var tasks = Enumerable
.Range(1, 10)
.Select(x => Task.Run(() => x * 2))
.Select((x, i) => Task.Delay(TimeSpan.FromMilliseconds(i * 500))
.ContinueWith(_ => x.Result));
foreach(var result in tasks.Select(x => x.Result))
{
Console.WriteLine(result);
}
Based on the comments, a better option would be to guard the resource instead of using a time delay.
static object Locker = new object();
static int GetResultFromResource(int arg)
{
lock(Locker)
{
Thread.Sleep(500);
return arg * 2;
}
}
var tasks = Enumerable
.Range(1, 10)
.Select(x => Task.Run(() => GetResultFromResource(x)));
foreach(var result in tasks.Select(x => x.Result))
{
Console.WriteLine(result);
}

In this case how about a Producer-Consumer pattern with a BlockingCollection<T>?
var tasks = new BlockingCollection<ITask>();
// add tasks, if this is an expensive process, put it out onto a Task
// tasks.Add(x);
// we're done producin' (allows GetConsumingEnumerable to finish)
tasks.CompleteAdding();
RunTasks(tasks);
With a single consumer thread:
static void RunTasks(BlockingCollection<ITask> tasks)
{
foreach (var task in tasks.GetConsumingEnumerable())
{
task.Execute();
// this may not be as accurate as you would like
Thread.Sleep(500);
}
}
If you have access to .NET 4.5, you can use Task.Delay:
static void RunTasks(BlockingCollection<ITask> tasks)
{
foreach (var task in tasks.GetConsumingEnumerable())
{
Task.Delay(500)
.ContinueWith(_ => task.Execute())
.Wait();
}
}
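If the goal is only to space out the starts while still letting the work overlap, the delay can be awaited between launches instead of sleeping inside the work. This is a minimal sketch, assuming each unit of work is a plain Action; StaggeredStart and RunAsync are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class StaggeredStart
{
    // Starts one action every `gap`, without waiting for earlier
    // actions to finish, then waits for all of them to complete.
    public static async Task RunAsync(IEnumerable<Action> work, TimeSpan gap)
    {
        var running = new List<Task>();
        foreach (var action in work)
        {
            // No delay before the very first item.
            if (running.Count > 0)
            {
                await Task.Delay(gap);
            }
            running.Add(Task.Run(action));
        }
        await Task.WhenAll(running);
    }
}
```

A call such as StaggeredStart.RunAsync(actions, TimeSpan.FromMilliseconds(500)) launches the items roughly half a second apart; unlike the single-consumer loop, a slow item does not hold up the ones behind it.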
