Parallel Task and Subtasks workflow - c#

I'm new to C# threads and tasks and I'm trying to develop a workflow but without success probably because I'm mixing tasks with for iterations...
The point is:
I've got a bunch of lists, and inside each one there are some things to do, and need to make them work as much parallel and less blocking possible, and as soon as each subBunchOfThingsTodo is done ( it means every thing to do inside it is done parallely) it has do some business(DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone()).
e.g:
bunchOfSubBunchsOfThingsTodo
subBunchOfThingsTodo
ThingToDo1
ThingToDo2
subBunchOfThingsTodo
ThingToDo1
ThingToDo2
ThingToDo3
subBunchOfThingsTodo
ThingToDo1
ThingToDo2...
This is how I'm trying but unfortunately each iteration waits the previous one bunchOfThingsToDo and I need them to work in parallel.
The same happens to the things to do , they wait the previous thing to start...
List<X> bunchOfSubBunchsOfThingsTodo = getBunchOfSubBunchsOfThingsTodo();
foreach (var subBunchOfThingsToDo in bunchOfSubBunchsOfThingsTodo)
{
int idSubBunchOfThingsToDo = subBunchOfThingsToDo.ThingsToDo.FirstOrDefault().IdSubBunchOfThingsToDo;
var parent = Task.Factory.StartNew(() =>
{
foreach (var thingToDo in subBunchOfThingsToDo.ThingsToDo)
{
var child = Task.Factory.StartNew(() =>
{
//Do some stuff with thingToDo... Here I call several business methods
});
}
});
parent.Wait();
DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone(idSubBunchOfThingsToDo);
}

You may want to try using Task.WhenAll and playing with linq to generate a collection of hot tasks:
static async void ProcessThingsToDo(IEnumerable<ThingToDo> bunchOfThingsToDo)
{
IEnumerable<Task> GetSubTasks(ThingToDo thing)
=> thing.SubBunchOfThingsToDo.Select( async subThing => await Task.Run(subThing));
var tasks = bunchOfThingsToDo
.Select(async thing => await Task.WhenAll(GetSubTasks(thing)));
await Task.WhenAll(tasks);
}
This way you are running each subThingToDo on a separate task and you get only one Task composed by all subtasks for each thingToDo
EDIT
ThingToDo is a rather simple class in this sample:
class ThingToDo
{
public IEnumerable<Action> SubBunchOfThingsToDo { get; }
}

With minimum changes of your code you can try this way:
var toWait = new List<Task>();
List<X> bunchOfSubBunchsOfThingsTodo = getBunchOfSubBunchsOfThingsTodo();
foreach (var subBunchOfThingsToDo in bunchOfSubBunchsOfThingsTodo)
{
int idSubBunchOfThingsToDo = subBunchOfThingsToDo.ThingsToDo.FirstOrDefault().IdSubBunchOfThingsToDo;
var parent = Task.Factory.StartNew(() =>
{
Parallel.ForEach(subBunchOfThingsToDo.ThingsToDo,
thingToDo =>
{
//Do some stuff with thingToDo... Here I call several business methods
});
});
//parent.Wait();
var handle = parent.ContinueWith((x) =>
{
DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone(idSubBunchOfThingsToDo);
})
.Start();
toWait.Add(handle);
}
Task.WhenAll(toWait);
Thanks to downvoters team, that advised 'good' solution:
var bunchOfSubBunchsOfThingsTodo = getBunchOfSubBunchsOfThingsTodo();
var toWait = bunchOfSubBunchsOfThingsTodo
.Select(subBunchOfThingsToDo =>
{
return Task.Run(() =>
{
int idSubBunchOfThingsToDo = subBunchOfThingsToDo.ThingsToDo.FirstOrDefault().IdSubBunchOfThingsToDo;
Parallel.ForEach(subBunchOfThingsToDo.ThingsToDo,
thingToDo =>
{
//Do some stuff with thingToDo... Here I call several business methods
});
DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone(idSubBunchOfThingsToDo);
});
});
Task.WhenAll(toWait);

Related

Running same task multiple times in parallel with EF Core

I have a Task that generates a PDF file for an order (it takes about 10 seconds to create one PDF):
public async Task GeneratePDF(Guid Id) {
var order = await
_context
.Orders
.Include(order => order.Customer)
... //a lot more Include and ThenInclude statements
.FirstOrDefaultAsync(order ==> order.Id == Id);
var document = ... //PDF generated here, takes about 10 seconds
order.PDF = document ;
_context.SaveChangesAsync();
}
I tried the following:
public async Task GenerateAllPDFs() {
var orderIds = await _context.Orders.Select(order=> order.Id).ToListAsync();
foreach (var id in orderIds)
{
_ = GeneratePDF(id).ContinueWith(t => Console.WriteLine(t.Exception), TaskContinuationOptions.OnlyOnFaulted);
}
}
this gives me the error:
System.ObjectDisposedException: Cannot access a disposed object. A common cause of this error is disposing a context that was resolved from dependency injection and then later trying to use the same context instance elsewhere in your application. This may occur if you are calling Dispose() on the context, or wrapping the context in a using statement. If you are using dependency injection, you should let the dependency injection container take care of disposing context instances.
If I change the task as follows...
public async Task GenerateAllPDFs() {
var orderIds = await _context.Orders.Select(order=> order.Id).ToListAsync();
foreach (var id in orderIds)
{
_ = await GeneratePDF(id);
}
}
...it runs the task for each order in series, taking ages to complete (I have a few thousands orders, taking about 10 seconds per order)...
How can I run this task in parallel for all orders in the context, so that the time it takes to complete is much less than sequential processing?
You can map your order IDs to tasks and await them all like:
public async Task GeneratePDF(Order order) {
var document = ... //PDF generated here, takes about 10 seconds
order.PDF = document ;
}
public async Task GenerateAllPDFs() {
var orderIds = await _context.Orders.ToListAsync();
var tasks = orderIds.Select((order) => GeneratePDF(order).ContinueWith(t => Console.WriteLine(t.Exception), TaskContinuationOptions.OnlyOnFaulted));
await Task.WhenAll(tasks);
await _context.SaveChangesAsync();
}
Here is my suggestion from the comment as an answer. I would split it in 3 parts:
1) get all orders,
2) then do a Parallel.Foreach to generate all documents in parallel. and assign each document to the proper order and in the end
3) do a single _context.SaveChangesAsync(); to make a bulk update on the data on the server
public async Task GenerateAllPDFs()
{
var allOrders = await _context.Orders.ToListAsync();
System.Threading.Tasks.Parallel.ForEach(allOrders, order =>
{
var document = ... //PDF generated here, takes about 10 seconds
order.PDF = document ;
});
await _context.SaveChangesAsync();
}
You need to implement parallel programing.
https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/task-based-asynchronous-programming
public class Example
{
public static void Main()
{
Task[] taskArray = new Task[10];
for (int i = 0; i < taskArray.Length; i++) {
taskArray[i] = Task.Factory.StartNew( (Object obj ) => {
CustomData data = obj as CustomData;
if (data == null)
return;
data.ThreadNum = Thread.CurrentThread.ManagedThreadId;
},
new CustomData() {Name = i, CreationTime = DateTime.Now.Ticks} );
}
Task.WaitAll(taskArray);
foreach (var task in taskArray) {
var data = task.AsyncState as CustomData;
if (data != null)
Console.WriteLine("Task #{0} created at {1}, ran on thread #{2}.",
data.Name, data.CreationTime, data.ThreadNum);
}
}
}
I think I will have to "duplicate" the GeneratePDF method to facilitate the batch processing by implementing the other answers , since I need this method also in non-batch mode...

Creating awaitable tasks using LINQ

I want to create a collection of awaitable tasks, so that I can start them together and asynchronously process the result from each one as they complete.
I have this code, and a compilation error:
> cannot assign void to an implicitly-typed variable
If I understand well, the tasks return by Select don't have a return type, even though the delegate passed returns ColetaIsisViewModel, I would think:
public MainViewModel()
{
Task.Run(LoadItems);
}
async Task LoadItems()
{
IEnumerable<Task> tasks = Directory.GetDirectories(somePath)
.Select(dir => new Task(() =>
new ItemViewModel(new ItemSerializer().Deserialize(dir))));
foreach (var task in tasks)
{
var result = await task; // <-- here I get the compilation error
DoSomething(result);
}
}
You shouldn't ever use the Task constructor.
Since you're calling synchronous code (Deserialize), you could use Task.Run:
async Task LoadItems()
{
var tasks = Directory.GetDirectories(somePath)
.Select(dir => Task.Run(() =>
new ItemViewModel(new ItemSerializer().Deserialize(dir))));
foreach (var task in tasks)
{
var result = await task;
DoSomething(result);
}
}
Alternatively, you could use Parallel or Parallel LINQ:
void LoadItems()
{
var vms = Directory.GetDirectories(somePath)
.AsParallel().Select(dir =>
new ItemViewModel(new ItemSerializer().Deserialize(dir)))
.ToList();
foreach (var vm in vms)
{
DoSomething(vm);
}
}
Or, if you make Deserialize a truly async method, then you can make it all asynchronous:
async Task LoadItems()
{
var tasks = Directory.GetDirectories(somePath)
.Select(async dir =>
new ItemViewModel(await new ItemSerializer().DeserializeAsync(dir))));
foreach (var task in tasks)
{
var result = await task;
DoSomething(result);
}
}
Also, I recommend that you do not use fire-and-forget in your constructor. There are better patterns for asynchronous constructors.
I know the question has been answered, but you can always do this too:
var serializer = new ItemSerializer();
var directories = Directory.GetDirectories(somePath);
foreach (string directory in directories)
{
await Task.Run(() => serializer.Deserialize(directory))
.ContinueWith(priorTask => DoSomething(priorTask.Result));
}
Notice I pulled out the serializer instantiation (assuming there are no side effects).

Parallelize tasks using polly

Let's say I have a list of objects myObjs of type List<Foo>.
I have a polly policy :
var policy = Policy.Handle<Exception>().RetryForever();
which i want to run methods on in parralel, but keep retrying each as they fail.
for (int i = 0; i < myObjs.Count; i++)
{
var obj = myObjs[i];
policy.Execute(() => Task.Factory.StartNew(() => obj.Do(), TaskCreationOptions.LongRunning));
}
Will this be called in parallel and retry each obj? So if myObjs[5].do() fails, will only that get retried while other objects just get executed once?
Also, am I supposed to use the ExecuteAsync() method which accepts Func<Task> instead of the Execute(Action) one as shown in the example? Do() is just a synchronous method, being launched in a separate thread.
Actual code looks like this where each() is just a foreach wrapper()
_consumers.ForEach(c => policy.Execute(() => Task.Factory.StartNew(() => c.Consume(startFromBeg), TaskCreationOptions.LongRunning)));
EDIT:
I tried the code:
class Foo
{
private int _i;
public Foo(int i)
{
_i = i;
}
public void Do()
{
//var rnd = new Random();
if (_i==2)
{
Console.WriteLine("err"+_i);
throw new Exception();
}
Console.WriteLine(_i);
}
}
var policy = Policy.Handle<Exception>().Retry(3);
var foos=Enumerable.Range(0, 5).Select(x => new Foo(x)).ToList();
foos.ForEach(c => policy.Execute(() => Task.Factory.StartNew(() => c.Do(), TaskCreationOptions.LongRunning)));
but am getting result:
0 1 err2 3 4 5
I thought it would retry 2 a few more times, but doesn't. Any idea why?
Whatever owns the tasks must wait for them somehow. Otherwise, exceptions will be ignored and the code will end before the tasks actually complete. So yes, you should probably be using policy.ExecuteAsync() instead. It would look something like this:
var tasks = myObjs
.Select(obj => Task.Factory.StartNew(() => obj.Do(), TaskCreationOptions.LongRunning))
.ToList();
// sometime later
await Task.WhenAll(tasks);

Nested asynchronous tasks

I would like to know if it's possible to improve this code for better performance. I'm new to the whole async thing on the serverside, so please bear with me here:
con.GetGame(id, game => {
foreach(Player p in game.Team1)
{
p.SomeExtraDetails = GetPlayerDetails(p.Id);
}
// I would like the player data to be set on all players
// before ending up here
});
private PlayerDetails GetPlayerDetails(double playerId)
{
var task = con.GetPlayer(playerId);
PlayerDetails ret = null;
Task continuation = task.ContinueWith(t =>
{
ret = t.Result;
});
continuation.Wait();
return ret;
}
If I got it right, continuation.Wait(); blocks the main thread.
Is there any way to make the tasks run simultaneously?
Ideally, you'd make these operations asynchronous all the way down:
private Task<PlayerDetails> GetPlayerDetailsAsync(double playerId)
{
return con.GetPlayer(playerId);
}
con.GetGame(id, game => {
var tasks = game.Team1
.Select(p => new { Player=p, Details=GetPlayerDetailsAsync(p.Id)})
.ToList(); // Force all tasks to start...
foreach(var t in tasks)
{
t.Player.SomeExtraDetails = await t.Details;
}
// all player data is now set on all players
});
If that isn't an option (ie: you're not using VS 2012), you could simplify your code to:
// This is a more efficient version of your existing code
private PlayerDetails GetPlayerDetails(double playerId)
{
var task = con.GetPlayer(playerId);
return task.Result;
}
con.GetGame(id, game => {
// This will run all at once, but block until they're done
Parallel.ForEach(game.Team1, p =>
{
p.SomeExtraDetails = GetPlayerDetails(p.Id);
});
});
consider using Parallel.ForEach in your GetGame page instead of Task.ContinueWith
Alternative solution without LINQ (although I like Reed Copsey's solution). However, beware that, as pointed out in the comments, this solution introduces an overhead by encapsulating the call to GetPlayerDetailsAsync() inside Tasks created by Task.Run().
Requires .NET 4.5 and C# 5.
con.GetGame(id, game => {
var tasks = new List<Task>();
foreach(Player p in game.Team1)
{
tasks.Add(Task.Run(async () => p.SomeExtraDetails = await GetPlayerDetailsAsync(p.Id)));
}
Task.WaitAll(tasks.ToArray());
});
private Task<PlayerDetails> GetPlayerDetailsAsync(double playerId)
{
return con.GetPlayerAsync(playerId);
});
Further, in order to catch up on the Task-based Asynchronous Pattern (TAP) with .NET 4.5 I highly recommend reading: Task-based Asynchronous Pattern - by Stephen Toub, Microsoft.

Proper way to chain Tasks

I want to chain Tasks, then start the chain in parallel.
This snippet is just to illustrate my question:
var taskOrig = new Task(() => { });
var task = taskOrig;
foreach (var msg in messages)
{
task=task.ContinueWith(t => Console.WriteLine(msg));
}
taskOrig.Start();
Everything works fine except a little perfectionist inside me doesn't like having empty method executed first () => { }.
Is there any way to avoid it?
I do understand It's barely affecting performance (unless you do it really often), but still. Performance matters in my case, so checking if task exists in every iteration is not the way to do it.
You could do this:
Task task = Task.FromResult<object>(null);
foreach (var msg in messages)
{
task = task.ContinueWith(t => Console.WriteLine(msg));
}
The previous solution won't work in 4.0. In 4.0 you'd need to do the following instead:
var tcs = new TaskCompletionSource<object>();
Task task = tcs.Task;
foreach (var msg in messages)
{
task = task.ContinueWith(t => Console.WriteLine(msg));
}
tcs.SetResult(null);
(You can move SetResult to before the foreach loop if you prefer.)
Technically it's not the same as the continuations will start executing while you're still adding more. That's unlikely to be a problem though.
Another option would be to use something like this:
public static Task ForEachAsync<T>(IEnumerable<T> items, Action<T> action)
{
return Task.Factory.StartNew(() =>
{
foreach (T item in items)
{
action(item);
}
});
}
An example usage would be:
ForEachAsync(messages, msg => Console.WriteLine(msg));
One way to do that, is to create task in loop if it is null, but the code you provide looks better for me:
Task task = null;
foreach (var msg in messages)
{
if (task == null)
task = new Task(() => Console.WriteLine(msg))
else
task = task.ContinueWith(t => Console.WriteLine(msg));
}
task.Start();
Perhaps this:
if(messages.Length > 0)
{
Task task = new Task(t => Console.WriteLine(messages[0]));
for(int i = 1; i < messages.Length; i++)
{
task = task.ContinueWith(t => Console.WriteLine(messages[i]));
}
task.Start();
}

Categories

Resources