I am trying to understand the best possible way of dealing with this problem, and also how to overcome the TaskCanceledException when cancelling a task is a genuine use case.
So, the scenario is: I have a huge list of names that I want to match against 3 different regexes. If we find a match with any one of the regexes, we add the name to a ConcurrentBag and stop matching it against the other regexes. So my basic algorithm for this is:
Iterate through all names in parallel
Start three different tasks with Regex.IsMatch
Each of these three tasks has a CancellationToken
As soon as there is a match, add the item to the bag and cancel the other tasks.
Wait for all tasks to finish.
My problem is with the last step: while we are waiting for the tasks to finish, WaitAll throws a TaskCanceledException. Can you please tell me what is wrong with my approach, and what would be a better way of dealing with such a scenario?
Also, I have to check for task != null, and again I don't understand why and when some of the tasks are being set to null.
Parallel.ForEach(accounts, p =>
{
    var can1 = new CancellationTokenSource();
    var can2 = new CancellationTokenSource();
    var can3 = new CancellationTokenSource();

    tasks.Add(Task.Run(() =>
    {
        if (reg1.IsMatch(p.DisplayName))
        {
            bag.Add(p);
            can2.Cancel();
            can3.Cancel();
        }
    }, can1.Token));

    tasks.Add(Task.Run(() =>
    {
        if (reg2.IsMatch(p.DisplayName))
        {
            bag.Add(p);
            can1.Cancel();
            can3.Cancel();
        }
    }, can2.Token));

    tasks.Add(Task.Run(() =>
    {
        if (reg3.IsMatch(p.DisplayName))
        {
            bag.Add(p);
            can1.Cancel();
            can2.Cancel();
        }
    }, can3.Token));
});
await Task.WhenAll(tasks.Where(t => t != null).ToArray());
Unless the regexes are really long-running, there is no need to run the 3 regexes in parallel, since your name list is already processed in parallel. So the best solution is to just delete the regex task code.
Simply call the regexes one after the other in sequential code and stop when a match is found. This is the fastest and easiest approach at the same time. For example:
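A minimal sketch of that sequential version, reusing the accounts, reg1-reg3 and bag variables from the question:
// The short-circuiting || stops at the first regex that matches,
// so later regexes are only tried when the earlier ones failed.
Parallel.ForEach(accounts, p =>
{
    if (reg1.IsMatch(p.DisplayName) ||
        reg2.IsMatch(p.DisplayName) ||
        reg3.IsMatch(p.DisplayName))
    {
        bag.Add(p);
    }
});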
If you want to keep this, then note that the TPL cannot cancel your task code for you. The token that you pass is only checked once, before your task code runs. .NET code is not interruptible, so cancellation cannot possibly work here.
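For illustration, cooperative cancellation would mean polling the token inside the task body, as in this sketch (reusing can1, reg1, p and bag from the question); note that a running Regex.IsMatch call still cannot be interrupted mid-match:
tasks.Add(Task.Run(() =>
{
    // The TPL checks the token once before the delegate starts;
    // after that, only explicit polling like this has any effect.
    can1.Token.ThrowIfCancellationRequested();
    if (reg1.IsMatch(p.DisplayName))
        bag.Add(p);
}, can1.Token));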
WhenAll throwing is indeed a problem that must be mitigated. You can say:
Task.WhenAll(myRegexTasks).ContinueWith(_ => { }).Wait();
This creates a proxy task whose sole purpose is to not throw even if the base task throws. Unfortunately, the .NET Framework still has no clean way to wait for a Task to complete without throwing.
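An alternative mitigation, if you are awaiting as in the question's last line, is simply to catch the cancellation exception; a sketch:
try
{
    await Task.WhenAll(tasks.Where(t => t != null).ToArray());
}
catch (OperationCanceledException)
{
    // TaskCanceledException derives from OperationCanceledException;
    // swallowing it here is deliberate, since cancellation is expected.
}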
I am playing with and learning async and parallel programming. I have a list of addresses and want to DNS-resolve them. I have made this function for that:
private static Task<string> ResolveAsync(string ipAddress)
{
return Task.Run(() => Dns.GetHostEntry(ipAddress).HostName);
}
Now, in the program I am resolving the addresses like this; the idea is to use parallel programming:
//getting orderedClientIps before
var taskArray = new List<Task>();
foreach (var orderedClientIp in orderedClientIps)
{
var task = new Task(async () =>
{
orderedClientIp.Address = await ResolveAsync(orderedClientIp.Ip);
});
taskArray.Add(task);
task.Start();
}
Task.WaitAll(taskArray.ToArray());
foreach (var orderedClientIp in orderedClientIps)
{
Console.WriteLine($"{(orderedClientIp.Ip)} ({orderedClientIp.Ip}) - {orderedClientIp.Count}");
}
So, here we wait for all the addresses to resolve, and then in a separate iteration print them.
What interests me is: what would be the difference if, instead of printing in a separate iteration, I did something like this:
foreach (var orderedClientIp in orderedClientIps)
{
var task = new Task(async () =>
{
orderedClientIp.Address = await ResolveAsync(orderedClientIp.Ip);
Console.WriteLine($"{(orderedClientIp.Ip)} ({orderedClientIp.Ip}) - {orderedClientIp.Count}");
});
taskArray.Add(task);
task.Start();
}
Task.WaitAll(taskArray.ToArray());
I have tried executing it: the second version writes to the console one by one, whereas the first one writes them all out after waiting for them.
I think that the first approach is parallel and better, but I am not quite sure of the differences. What, in the context of async and parallel programming, is different in the second approach? And does the second approach somehow violate the Task.WaitAll() line?
The difference in the output behaviour you see is simply related to the point in time where you write the output.
Second approach: "and it writes to console one by one"
That's because the code to write the output is called as soon as any task is done. That happens at different points in time, and thus you see them being output one by one.
First approach: "in the first instance writes them all out after waiting them."
That's because you do just that in your code: wait until all are done and then output sequentially what you have found.
The output behaviour alone cannot tell you which version runs things more in parallel.
In fact, for all practical purposes they are identical. The overhead of Console.WriteLine inside the task (compared to doing the actual DNS lookup) should be negligible.
It would be different for compute-intensive work, but then you should probably be using Parallel.ForEach anyway.
So where should you output then? It depends. If you need to show the information (here the DNS-lookup result) as soon as possible, then do it from inside the Task. If it can wait until all is done (which might take some time), then do it at the end.
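For instance, a hedged sketch of the "as soon as possible" variant, reusing orderedClientIps and ResolveAsync from the question (and assuming the same Ip/Address/Count members); it also drops the extra new Task(async ...) wrapper:
// Each task resolves and prints its own result; WhenAll only signals
// overall completion. Requires System.Linq.
var tasks = orderedClientIps.Select(async ip =>
{
    ip.Address = await ResolveAsync(ip.Ip);
    Console.WriteLine($"{ip.Address} ({ip.Ip}) - {ip.Count}");
}).ToList();
await Task.WhenAll(tasks);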
The write to the console is not asynchronous, because the console is not async by default. With the Console call you effectively synchronize your tasks. Maybe:
var task = Task.Run(async () =>
{
    orderedClientIp.Address = await ResolveAsync(orderedClientIp.Ip);
    return $"{orderedClientIp.Address} ({orderedClientIp.Ip}) - {orderedClientIp.Count}";
}).ContinueWith(previousTask => Console.WriteLine(previousTask.Result));
I have a class where each method executes asynchronously, i.e. returns a Task, but where each method should nevertheless wait for the completion of the preceding call.
Continuation, right?
Except that a task continuation takes a delegate (an Action) as a parameter, not another task.
I've tried different things, and the best I could do to make it work is the following (to me quite complex) code:
private Task QueueTask(Func<Task> futureTask)
{
var completionSource = new TaskCompletionSource<int>();
_lastTask.ContinueWith(async t =>
{
try
{
await futureTask();
completionSource.SetResult(0);
}
catch (Exception ex)
{
completionSource.SetException(ex);
}
});
_lastTask = completionSource.Task;
return _lastTask;
}
Here _lastTask is a private member of my class. Since all calls are coming from the UI thread, I just keep the last task and put continuation on it.
As I said I find this code quite convoluted. Do you have a better suggestion?
To me, it seems like you're asking the wrong question. A queue of tasks like this is an unusual requirement. We don't know anything about the actual problem you're trying to solve, so we can't suggest better approaches.
ContinueWith is intended for dynamic parallel processing, so it doesn't quite fit in with async code. However, you can use ContinueWith paired with Unwrap to sort-of emulate the behavior of await (if you ignore how await interacts with the current context).
So you can simplify your queue of tasks solution as such:
private Task QueueTask(Func<Task> futureTask)
{
_lastTask = _lastTask.ContinueWith(t => futureTask()).Unwrap();
return _lastTask;
}
However, there are probably better solutions. If the purpose of the queue is to provide exclusive access, a SemaphoreSlim would be more natural. If you actually do need a queue for some reason, consider using a Dataflow mesh.
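For example, a sketch of the SemaphoreSlim variant, keeping the question's QueueTask signature (note that SemaphoreSlim does not strictly guarantee FIFO ordering of waiters):
private readonly SemaphoreSlim _mutex = new SemaphoreSlim(1, 1);

private async Task QueueTask(Func<Task> futureTask)
{
    // Each call waits until the previous one releases the semaphore,
    // so the task bodies never overlap.
    await _mutex.WaitAsync();
    try
    {
        await futureTask();
    }
    finally
    {
        _mutex.Release();
    }
}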
I'm writing a WinForms application that uses the ReportViewer for the creation of multiple PDF files. These PDF files are divided into 4 main parts, each part responsible for the creation of a specific report. These processes create a minimum of 1 file, up to the number of users (currently 50).
The program already exists and uses these 4 methods sequentially. For extra performance as the number of users grows, I want to split these methods off from the main process into 4 separate threads.
While I'm new to multithreading in C#, I have read a number of articles on how to achieve this. The only thing I'm not sure of is which way to start. Some blog posts tell me to use a thread pool when there are more than 3 threads, while others tell me to use a BackgroundWorker in WinForms, so I'm not sure whether to use 4 separate threads, a thread pool, or multiple BackgroundWorkers (or would parallel programming be the best way?). Which option is best, and why?
At the end my main thread has to wait for all processes to end before continuing.
Can someone tell me what's the best solution to my problem?
* Extra information after edit *
Something I forgot to mention (after reading all your comments and possible solutions): the methods share one IEnumerable, used only for reading. After firing off the methods (which don't have to run sequentially), the methods trigger events to send status updates to the UI. I think triggering events is difficult, if not impossible, from separate threads, so there should be some kind of callback function to report status updates while running.
Some example in pseudocode:
main()
{
private List<customclass> lcc = importCustomClass()
export.CreatePDFKind1.create(lcc.First(), exportfolderpath, arg1)
export.CreatePDFKind2.create(lcc, exportfolderpath)
export.CreatePDFKind3.create(lcc.First(), exportfolderpath)
export.CreatePDFKind4.create(customclass2, exportfolderpath)
}
namespace export
{
class CreatePDFKind1
{
create(customclass cc, string folderpath)
{
do something;
reportstatus(listviewItem, status, message)
}
}
class CreatePDFKind2
{
create(IEnumerable<customclass> lcc, string folderpath)
{
foreach (var x in lcc)
{
do something;
reportstatus(listviewItem, status, message)
}
}
}
etc.......
}
From the very basic picture you have described, I would use the Task Parallel Library (TPL), shipped with .NET Framework 4.0+.
You talk about the 'best' option of using thread pools when spawning a medium-to-large number of threads. Despite this being correct [the most efficient way of managing the resources], the TPL does all of this for you, without you having to worry about a thing. The TPL also makes the use of multiple threads and waiting on their completion a doddle too...
To do what you require I would use the TPL and Continuations. A continuation not only allows you to create a flow of tasks but also handles your exceptions. This is a great introduction to the TPL. But to give you some idea...
You can start a TPL task using
Task task = Task.Factory.StartNew(() =>
{
// Do some work here...
});
Now, to start a second task when an antecedent task finishes (in error or successfully), you can use the ContinueWith method:
Task task1 = Task.Factory.StartNew(() => Console.WriteLine("Antecedent Task"));
Task task2 = task1.ContinueWith(antTask => Console.WriteLine("Continuation..."));
So as soon as task1 completes, fails, or is cancelled, task2 'fires up' and starts running. Note that if task1 had completed before reaching the second line of code, task2 would be scheduled to execute immediately. The antTask argument passed to the second lambda is a reference to the antecedent task. See this link for more detailed examples...
You can also pass results from the antecedent task to its continuations:
Task.Factory.StartNew<int>(() => 1)
.ContinueWith(antTask => antTask.Result * 4)
.ContinueWith(antTask => antTask.Result * 4)
.ContinueWith(antTask => Console.WriteLine(antTask.Result * 4)); // Prints 64.
Note. Be sure to read up on exception handling in the first link provided as this can lead a newcomer to TPL astray.
One last thing to look at, in particular for what you want, is child tasks. Child tasks are those created as AttachedToParent. In this case the continuation will not run until all child tasks have completed:
TaskCreationOptions atp = TaskCreationOptions.AttachedToParent;
Task.Factory.StartNew(() =>
{
    Task.Factory.StartNew(() => { SomeMethod(); }, atp);
    Task.Factory.StartNew(() => { SomeOtherMethod(); }, atp);
}).ContinueWith(cont => { Console.WriteLine("Finished!"); });
So in your case you would start your four tasks, then wait on their completion on the main thread.
I hope this helps.
Using a BackgroundWorker is helpful if you need to interact with the UI with respect to your background process. If you don't, then I wouldn't bother with it. You can just start 4 Task objects directly:
tasks.Add(Task.Factory.StartNew(() => DoStuff()));
tasks.Add(Task.Factory.StartNew(() => DoStuff2()));
tasks.Add(Task.Factory.StartNew(() => DoStuff3()));
If you do need to interact with the UI, possibly by updating it to reflect when the tasks are finished, then I would suggest starting one BackgroundWorker and then using tasks again to process each individual unit of work. Since there is some additional overhead in using a BackgroundWorker, I would avoid starting lots of them if you can avoid it.
BackgroundWorker bgw = new BackgroundWorker();
bgw.DoWork += (_, args) =>
{
List<Task> tasks = new List<Task>();
tasks.Add(Task.Factory.StartNew(() => DoStuff()));
tasks.Add(Task.Factory.StartNew(() => DoStuff2()));
tasks.Add(Task.Factory.StartNew(() => DoStuff3()));
Task.WaitAll(tasks.ToArray());
};
bgw.RunWorkerCompleted += (_, args) => updateUI();
bgw.RunWorkerAsync();
You could of course use just Task methods to do all of this, but I still find BackgroundWorkers a bit simpler to work with for the simpler cases. Using .NET 4.5 you could use Task.WhenAll to run a continuation on the UI thread when all 4 tasks finish, but doing that in 4.0 wouldn't be quite as simple.
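A sketch of that 4.5 approach, reusing the DoStuff methods and updateUI from above (TaskScheduler.FromCurrentSynchronizationContext must be captured on the UI thread):
var tasks = new List<Task>
{
    Task.Run(() => DoStuff()),
    Task.Run(() => DoStuff2()),
    Task.Run(() => DoStuff3()),
};

// Runs updateUI on the UI thread once all tasks finish,
// without blocking it in the meantime.
Task.WhenAll(tasks).ContinueWith(
    _ => updateUI(),
    TaskScheduler.FromCurrentSynchronizationContext());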
Without further information it's impossible to tell. The fact that they're in four separate methods doesn't make much of a difference if they're accessing the same resources (the PDF file, for example). If you're having trouble understanding what I mean, post some of the code for each method and I'll go into a little more detail.
Since the number of "parts" you have is fixed it won't make a big difference whether you use separate threads, background workers or use a thread pool. I'm not sure why people are recommending background workers. Most likely because it's a simpler approach to multithreading and more difficult to screw up.
So I was told recently that the way I was using .ContinueWith for Tasks was not the proper way to use it. I have yet to find evidence of this on the internet, so I will ask you guys and see what the answer is. Here is an example of how I use .ContinueWith:
public Task DoSomething()
{
return Task.Factory.StartNew(() =>
{
Console.WriteLine("Step 1");
})
.ContinueWith((prevTask) =>
{
Console.WriteLine("Step 2");
})
.ContinueWith((prevTask) =>
{
Console.WriteLine("Step 3");
});
}
Now I know this is a simple example and it will run very fast, but just assume each task does some longer operation. So, what I was told is that in the .ContinueWith you need to say prevTask.Wait(); otherwise you could do work before the previous task finishes. Is that even possible? I assumed my second and third tasks would only run once their previous task finished.
This is how I was told to write the code:
public Task DoSomething()
{
return Task.Factory.StartNew(() =>
{
Console.WriteLine("Step 1");
})
.ContinueWith((prevTask) =>
{
prevTask.Wait();
Console.WriteLine("Step 2");
})
.ContinueWith((prevTask) =>
{
prevTask.Wait();
Console.WriteLine("Step 3");
});
}
Ehhh.... I think some of the current answers are missing something: what happens with exceptions?
The only reason you would call Wait in a continuation would be to observe a potential exception from the antecedent in the continuation itself. The same observation would happen if you accessed Result in the case of a Task<T> and also if you manually accessed the Exception property.
Frankly, I wouldn't call Wait or access Result, because if there is an exception you'll pay the price of re-raising it, which is unnecessary overhead. Instead you can just check the IsFaulted property on the antecedent Task. Alternatively, you can create forked workflows by chaining on multiple sibling continuations that only fire based on either success or failure, with TaskContinuationOptions.OnlyOnRanToCompletion and TaskContinuationOptions.OnlyOnFaulted.
Now, it's not necessary to observe the exception of the antecedent in the continuation, but you may not want your workflow to move forward if, say, "Step 1" failed. In that case: specifying TaskContinuationOptions.NotOnFaulted to your ContinueWith calls would prevent the continuation logic from ever even firing.
Keep in mind that if your own continuations don't observe the exception, whoever is waiting on this overall workflow to complete is going to be the one to observe it. Either they're Waiting on the Task upstream or have tacked on their own continuation to know when it is complete. If it is the latter, their continuation would need to use the aforementioned observation logic.
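To make the forked-workflow idea concrete, a hedged sketch (DoStep1 is a hypothetical placeholder):
var step1 = Task.Run(() => DoStep1()); // DoStep1 is hypothetical

// Fires only if step1 ran to completion.
step1.ContinueWith(
    t => Console.WriteLine("Step 1 succeeded"),
    TaskContinuationOptions.OnlyOnRanToCompletion);

// Fires only if step1 faulted; reading t.Exception observes the exception.
step1.ContinueWith(
    t => Console.WriteLine("Step 1 failed: " + t.Exception.InnerException.Message),
    TaskContinuationOptions.OnlyOnFaulted);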
You are using it correctly.
Creates a continuation that executes asynchronously when the target
Task completes.
Source: Task.ContinueWith Method (Action&lt;Task&gt;) on MSDN
Having to call prevTask.Wait() in every Task.ContinueWith invocation seems like a weird way of repeating unnecessary logic, i.e. doing something to be "super duper sure" because you don't actually understand what a certain bit of code does. Like checking for a null just to throw an ArgumentNullException where it would've been thrown anyway.
So, no: whoever told you that is wrong and probably doesn't understand why Task.ContinueWith exists.
Who told you that?
Quoting MSDN:
Creates a continuation that executes asynchronously when the target
Task completes.
Also, what would be the purpose of ContinueWith if it didn't wait for the previous task to complete?
You can even test it by yourself:
Task.Factory.StartNew(() =>
{
Console.WriteLine("Step 1");
Thread.Sleep(2000);
})
.ContinueWith((prevTask) =>
{
Console.WriteLine("I waited step 1 to be completed!");
})
.ContinueWith((prevTask) =>
{
Console.WriteLine("Step 3");
});
From the MSDN on Task.ContinueWith:
The returned Task will not be scheduled for execution until the
current task has completed. If the criteria specified through the
continuationOptions parameter are not met, the continuation task will
be canceled instead of scheduled.
I think that the way you expect it to work in the first example is the correct way.
You might also want to consider using Task.Run instead of Task.Factory.StartNew.
Stephen Cleary's blog post and the Stephen Toub post that he references explain the differences. There is also a discussion in this answer.
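For reference, a possible rewrite of the question's chain in the Task.Run and await style (a sketch, not a drop-in replacement, since await also captures the current context):
public async Task DoSomething()
{
    // Each await completes before the next step starts, just like the
    // ContinueWith chain, and exceptions propagate naturally.
    await Task.Run(() => Console.WriteLine("Step 1"));
    await Task.Run(() => Console.WriteLine("Step 2"));
    await Task.Run(() => Console.WriteLine("Step 3"));
}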
By accessing Task.Result you are effectively doing the same thing as Task.Wait.
I will reiterate what many have said already: prevTask.Wait() is unnecessary.
For more examples, see Chaining Tasks using Continuation Tasks, yet another link from Microsoft with good examples.
I want to queue dependent tasks across several flows that need to be processed in order (within each flow). The flows can be processed in parallel.
To be specific, let's say I need two queues and I want the tasks in each queue to be processed in order. Here is sample pseudocode to illustrate the desired behavior:
Queue1_WorkItem wi1a=...;
enqueue wi1a;
... time passes ...
Queue1_WorkItem wi1b=...;
enqueue wi1b; // This must be processed after processing of item wi1a is complete
... time passes ...
Queue2_WorkItem wi2a=...;
enqueue wi2a; // This can be processed concurrently with the wi1a/wi1b
... time passes ...
Queue1_WorkItem wi1c=...;
enqueue wi1c; // This must be processed after processing of item wi1b is complete
Here is a diagram with arrows illustrating dependencies between work items:
The question is: how do I do this using C# 4.0/.NET 4.0? Right now I have two worker threads, one per queue, and I use a BlockingCollection<> for each queue. I would like to instead leverage the .NET thread pool and have worker threads process items concurrently (across flows), but serially within a flow. In other words, I would like to be able to indicate that, for example, wi1b depends on completion of wi1a, without having to track completion and remember wi1a when wi1b arrives. In other words, I just want to say: "I want to submit a work item for queue1, which is to be processed serially with other items I have already submitted for queue1, but possibly in parallel with work items submitted to other queues."
I hope this description made sense. If not please feel free to ask questions in the comments and I will update this question accordingly.
Thanks for reading.
Update:
To summarize "flawed" solutions so far, here are the solutions from the answers section that I cannot use and the reason(s) why I cannot use them:
TPL tasks require specifying the antecedent task for a ContinueWith(). I do not want to maintain knowledge of each queue's antecedent task when submitting a new task.
TDF ActionBlocks looked promising, but it would appear that items posted to an ActionBlock are processed in parallel. I need the items for a particular queue to be processed serially.
Update 2:
RE: ActionBlocks
It would appear that setting the MaxDegreeOfParallelism option to one prevents parallel processing of work items submitted to a single ActionBlock. Therefore it seems that having an ActionBlock per queue solves my problem, with the only disadvantage being that this requires the installation and deployment of the TDF library from Microsoft, and I was hoping for a pure .NET 4.0 solution. So far this is the candidate accepted answer, unless someone can figure out a way to do this with a pure .NET 4.0 solution that doesn't degenerate to a worker thread per queue (which I am already using).
I understand you have many queues and don't want to tie up threads. You could have an ActionBlock per queue. The ActionBlock automates most of what you need: It processes work items serially, and only starts a Task when work is pending. When no work is pending, no Task/Thread is blocked.
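A hypothetical sketch of that setup (WorkItem and Process are placeholders; this requires the TPL Dataflow library, and an ActionBlock's MaxDegreeOfParallelism defaults to 1):
// One ActionBlock per flow: each block processes its posted items
// serially, while different blocks run concurrently.
var queue1 = new ActionBlock<WorkItem>(wi => Process(wi));
var queue2 = new ActionBlock<WorkItem>(wi => Process(wi));

queue1.Post(wi1a); // wi1b will only run after wi1a completes
queue1.Post(wi1b);
queue2.Post(wi2a); // runs concurrently with queue1's items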
The best way is to use the Task Parallel Library (TPL) and Continuations, exactly as described in the earlier answer above (Task.Factory.StartNew, ContinueWith, and AttachedToParent child tasks), so that walkthrough is not repeated here.
Edit: Have you had a look at the concurrent collections, in particular BlockingCollection<T>? In your case you might use something like:
public class TaskQueue : IDisposable
{
    BlockingCollection<Action> taskX = new BlockingCollection<Action>();

    public TaskQueue(int taskCount)
    {
        // Create and start a new Task for each consumer.
        for (int i = 0; i < taskCount; i++)
            Task.Factory.StartNew(Consumer);
    }

    public void Dispose() { taskX.CompleteAdding(); }

    public void EnqueueTask(Action action) { taskX.Add(action); }

    void Consumer()
    {
        // This sequence that we are enumerating will BLOCK when no elements
        // are available and will end when CompleteAdding is called.
        foreach (Action action in taskX.GetConsumingEnumerable())
            action(); // Perform your task.
    }
}
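One caveat worth noting: with taskCount greater than one, the consumers compete for items and may run them in parallel, so per-flow ordering requires one TaskQueue with taskCount = 1 per flow. A hypothetical usage sketch (Process and the work items are placeholders):
var queue1 = new TaskQueue(1); // single consumer => serial processing
queue1.EnqueueTask(() => Process(wi1a));
queue1.EnqueueTask(() => Process(wi1b)); // runs only after wi1a

var queue2 = new TaskQueue(1); // a second flow, concurrent with queue1
queue2.EnqueueTask(() => Process(wi2a));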
A .NET 4.0 solution based on TPL is possible, while hiding away the fact that it needs to store the parent task somewhere. For example:
class QueuePool
{
    private readonly Task[] _queues;

    public QueuePool(int queueCount)
    { _queues = new Task[queueCount]; }

    public void Enqueue(int queueIndex, Action action)
    {
        lock (_queues)
        {
            // Start a new chain for an empty queue, otherwise chain
            // the new action onto the last task of that queue.
            var parent = _queues[queueIndex];
            if (parent == null)
                _queues[queueIndex] = Task.Factory.StartNew(action);
            else
                _queues[queueIndex] = parent.ContinueWith(_ => action());
        }
    }
}
This is using a single lock for all queues, to illustrate the idea. In production code, however, I would use a lock per queue to reduce contention.
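A hypothetical usage sketch (Process and the work items are placeholders):
var pool = new QueuePool(2);
pool.Enqueue(0, () => Process(wi1a));
pool.Enqueue(0, () => Process(wi1b)); // chained after wi1a via ContinueWith
pool.Enqueue(1, () => Process(wi2a)); // an independent chain, may run in parallel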
It looks like the design you already have is good and working. Your worker threads (one per queue) are long-running, so if you want to use Tasks instead, specify TaskCreationOptions.LongRunning so you get a dedicated worker thread.
But there isn't really a need to use the ThreadPool here. It doesn't offer many benefits for long-running work.