I'm writing a WinForms application that uses the ReportViewer control to create multiple PDF files. These PDF files are divided into 4 main parts, and each part is responsible for creating a specific report. Each of these processes creates anywhere from 1 file up to one file per user (currently 50 users).
The program already exists and calls these 4 methods sequentially. For extra performance as the number of users grows, I want to split these methods off from the main process into 4 separate threads.
While I'm new to multithreading in C#, I have read a number of articles on how to achieve this. The only thing I'm not sure of is where to start. Having read multiple blog posts, I'm not sure whether to use 4 separate threads, a thread pool, or multiple BackgroundWorkers (or would parallel programming be the best way?). Some blog posts say to use a thread pool when you need more than 3 threads, but others say to use a BackgroundWorker when you are in WinForms. Which option is best (and why)?
In the end my main thread has to wait for all processes to finish before continuing.
Can someone tell me what's the best solution to my problem?
* Extra information after edit *
Something I forgot to mention (after reading all your comments and possible solutions): the methods share one "IEnumerable" that is used only for reading. After the methods are fired (they don't have to run sequentially), they trigger events for sending status updates to the UI. I think triggering events is difficult, if not impossible, from separate threads, so there should be some kind of callback function to report status updates while running.
Some example in pseudocode:
main()
{
private List<customclass> lcc = importCustomClass()
export.CreatePDFKind1.create(lcc.First(), exportfolderpath, arg1)
export.CreatePDFKind2.create(lcc, exportfolderpath)
export.CreatePDFKind3.create(lcc.First(), exportfolderpath)
export.CreatePDFKind4.create(customclass2, exportfolderpath)
}
namespace export
{
class CreatePDFKind1
{
create(customclass cc, string folderpath)
{
do something;
reportstatus(listviewItem, status, message)
}
}
class CreatePDFKind2
{
create(IEnumerable<customclass> lcc, string folderpath)
{
foreach (var x in lcc)
{
do something;
reportstatus(listviewItem, status, message)
}
}
}
etc.......
}
From the very basic picture you have described, I would use the Task Parallel Library (TPL), shipped with .NET Framework 4.0+.
You talk about the 'best' option being a thread pool when spawning a medium-to-large number of threads. Despite this being correct [the most efficient way of managing the resources], the TPL does all of this for you, without you having to worry about a thing. The TPL also makes using multiple threads and waiting on their completion a doddle...
To do what you require I would use the TPL and Continuations. A continuation not only allows you to create a flow of tasks but also handles your exceptions. This is a great introduction to the TPL. But to give you some idea...
You can start a TPL task using
Task task = Task.Factory.StartNew(() =>
{
// Do some work here...
});
Now to start a second task when an antecedent task finishes (in error or successfully) you can use the ContinueWith method
Task task1 = Task.Factory.StartNew(() => Console.WriteLine("Antecedent Task"));
Task task2 = task1.ContinueWith(antTask => Console.WriteLine("Continuation..."));
So as soon as task1 completes, fails or is cancelled task2 'fires-up' and starts running. Note that if task1 had completed before reaching the second line of code task2 would be scheduled to execute immediately. The antTask argument passed to the second lambda is a reference to the antecedent task. See this link for more detailed examples...
You can also pass continuations results from the antecedent task
Task.Factory.StartNew<int>(() => 1)
.ContinueWith(antTask => antTask.Result * 4)
.ContinueWith(antTask => antTask.Result * 4)
.ContinueWith(antTask => Console.WriteLine(antTask.Result * 4)); // Prints 64.
Note. Be sure to read up on exception handling in the first link provided as this can lead a newcomer to TPL astray.
One last thing to look at, in particular for what you want, is child tasks. Child tasks are those created with the AttachedToParent option. In this case the continuation will not run until all child tasks have completed
TaskCreationOptions atp = TaskCreationOptions.AttachedToParent;
Task.Factory.StartNew(() =>
{
    Task.Factory.StartNew(() => { SomeMethod(); }, atp);
    Task.Factory.StartNew(() => { SomeOtherMethod(); }, atp);
}).ContinueWith(cont => { Console.WriteLine("Finished!"); });
So in your case you would start your four tasks, then wait on their completion on the main thread.
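For example, a minimal sketch of that pattern (CreatePdf1 through CreatePdf4 are placeholders standing in for your own export methods):
// CreatePdf1..CreatePdf4 stand in for your own export methods.
Task[] exportTasks =
{
    Task.Factory.StartNew(() => CreatePdf1()),
    Task.Factory.StartNew(() => CreatePdf2()),
    Task.Factory.StartNew(() => CreatePdf3()),
    Task.Factory.StartNew(() => CreatePdf4())
};

// Task.WaitAll blocks the calling thread until every export has finished.
// Calling it on the UI thread would freeze the UI, so do it from a background
// thread or use a continuation instead.
Task.WaitAll(exportTasks);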
I hope this helps.
Using a BackgroundWorker is helpful if you need to interact with the UI with respect to your background process. If you don't, then I wouldn't bother with it. You can just start 4 Task objects directly:
List<Task> tasks = new List<Task>();
tasks.Add(Task.Factory.StartNew(() => DoStuff()));
tasks.Add(Task.Factory.StartNew(() => DoStuff2()));
tasks.Add(Task.Factory.StartNew(() => DoStuff3()));
Task.WaitAll(tasks.ToArray()); // wait for all of them before continuing
If you do need to interact with the UI, possibly by updating it to reflect when the tasks are finished, then I would suggest starting one BackgroundWorker and then using tasks again to process each individual unit of work. Since there is some additional overhead in using a BackgroundWorker, I would avoid starting lots of them if you can avoid it.
BackgroundWorker bgw = new BackgroundWorker();
bgw.DoWork += (_, args) =>
{
List<Task> tasks = new List<Task>();
tasks.Add(Task.Factory.StartNew(() => DoStuff()));
tasks.Add(Task.Factory.StartNew(() => DoStuff2()));
tasks.Add(Task.Factory.StartNew(() => DoStuff3()));
Task.WaitAll(tasks.ToArray());
};
bgw.RunWorkerCompleted += (_, args) => updateUI();
bgw.RunWorkerAsync();
You could of course use just Task methods to do all of this, but I still find BackgroundWorkers a bit simpler to work with for the simpler cases. Using .NET 4.5 you could use Task.WhenAll to run a continuation on the UI thread when all 4 tasks have finished, but doing that in 4.0 wouldn't be quite as simple.
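A rough .NET 4.5 sketch of that, assuming the code runs on the UI thread so that the current SynchronizationContext is the WinForms one (DoStuff/updateUI are the same placeholder names used above):
var tasks = new List<Task>
{
    Task.Run(() => DoStuff()),
    Task.Run(() => DoStuff2()),
    Task.Run(() => DoStuff3())
};

// Run updateUI back on the UI thread once everything has finished.
Task.WhenAll(tasks).ContinueWith(
    _ => updateUI(),
    TaskScheduler.FromCurrentSynchronizationContext());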
Without further information it's impossible to tell. The fact that they're in four separate methods doesn't make much of a difference if they're accessing the same resources. The PDF file for example. If you're having trouble understanding what I mean you should post some of the code for each method and I'll go into a little more detail.
Since the number of "parts" you have is fixed it won't make a big difference whether you use separate threads, background workers or use a thread pool. I'm not sure why people are recommending background workers. Most likely because it's a simpler approach to multithreading and more difficult to screw up.
Related
I have this code:
var dt = new DeveloperTest();
var tasks = readers.Select(dt.ProcessReaderAsync).ToList();
var printCounterTask = new Task(() => dt.DelayedPrint(output));
printCounterTask.Start();
Task.WhenAll(tasks).ContinueWith(x => dt.Print(output).ContinueWith(_ =>
{
dt.Finished = true;
})).Wait();
printCounterTask.Wait();
What this does is prepare the tasks that will be run, and then start a (I think) parallel execution which begins with:
printCounterTask.Start();
This is what DelayedPrint does:
public async Task DelayedPrint(IOutputResult output)
{
while (true)
{
if (!Finished)
{
//every 10 seconds should print.
//at least one print even if the execution is less than 10 seconds
//as this starts in paralel with the processing
Task.Delay(10 * 1000).Wait();
await Print(output);
}
else
{
#if DEBUG
Console.WriteLine("Finished with printing");
#endif
break;
}
}
}
Basically it prints some output every 10 seconds, and when all the tasks are complete it stops the infinite loop.
If you want to see the whole code, it is here: https://github.com/velchev/Exclaimer-Test
I am not sure if this
Task.WhenAll(tasks).ContinueWith(x => dt.Print(output).ContinueWith(_ =>
{
dt.Finished = true;
})).Wait();
runs in parallel with printCounterTask.Start();
When I debug it, it seems it does, as a breakpoint in the !Finished branch is hit and then the one in the else clause too. As far as I know, when you start a task it runs in parallel, so all the tasks should run in parallel. A task is a representation of a thread which is syntactically easier to control compared to the old syntax. So with all these threads running, and because of the better syntax, it is easier to say: wait till all finish and then change the flag. Any helpful explanation will be appreciated. Thank you, mates.
The code is mostly correct as written, but there are some nuances around the Task constructor and ContinueWith that make it difficult to understand, and make it easy to break. For example, printCounterTask.Wait() will not wait until DelayedPrint completes, because the Task constructor does not understand asynchronous delegates.
To make the code fully correct and much easier to read and reason about, replace new Task/Start with Task.Run, and replace ContinueWith with await:
var dt = new DeveloperTest();
var tasks = readers.Select(dt.ProcessReaderAsync).ToList();
var printCounterTask = Task.Run(() => dt.DelayedPrint(output));
await Task.WhenAll(tasks);
await dt.Print(output);
dt.Finished = true;
await printCounterTask;
You will also find your code to be clearer if you follow the convention of suffixing asynchronous methods with Async.
A task is a representation of a thread which syntactically is easier to control compared to the old syntax.
No, not at all. A task is a Future - a representation of an operation that may complete sometime in the future. This "operation" does not necessarily require a thread. Task.Run does queue work to the thread pool, but in this example, the task does not always use a thread pool thread (specifically, it doesn't use a thread pool thread during the await Task.Delay).
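As a sketch (not a drop-in replacement for the repository code), DelayedPrint can be written fully asynchronously, which makes that point visible: while the await Task.Delay is pending, no thread is blocked at all.
public async Task DelayedPrint(IOutputResult output)
{
    while (!Finished)
    {
        await Task.Delay(TimeSpan.FromSeconds(10)); // no thread is consumed while waiting
        await Print(output);                        // print roughly every 10 seconds until Finished is set
    }
}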
You are partly right.
The tasks will run in parallel with printCounterTask as expected.
However, a task is not a representation of a thread, nor is it just syntactic sugar that makes a thread easier to control.
Here you can find useful information:
https://www.dotnetforall.com/difference-task-and-thread/
In general it's important for you to understand that tasks use threads from the ThreadPool.
A task is a representation of a method you wish to execute as background work (you don't want to block the current execution), and a task needs a thread in order to run, but it's not true that a task is a thread.
You may have more tasks than available threads in the thread pool, in which case they wait in the queue for an available thread before they are executed.
Also take into consideration that Task.WhenAll will not execute the tasks for you; you'll have to start them yourself (the implementation of ProcessReaderAsync is missing, but if you're using Task.Run it's OK).
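To illustrate that last point, a small sketch (DoWork is a placeholder, and the snippet assumes it sits inside an async method):
var cold = new Task(() => DoWork()); // DoWork is a placeholder; this task is created but never started, so awaiting WhenAll on it would hang forever
var hot  = Task.Run(() => DoWork()); // already queued to the thread pool

await Task.WhenAll(hot);             // completes once the already-running task finishes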
If I am creating Tasks using a for loop, will those tasks run in parallel or will they just run one after the other?
Here is my code -
private void initializeAllSpas()
{
Task[] taskArray = new Task[spaItems.Count];
for(int i = 0; i < spaItems.Count; i++)
{
taskArray[i] = Task.Factory.StartNew(() => spaItems[i].initializeThisSpa());
}
Task.WhenAll(taskArray).Wait();
foreach (var task in taskArray) task.Dispose();
}
where spaItems is a list of items from another class, call it SpaItem, in which the initializeThisSpa() function opens a file and updates the information for that particular SpaItem.
My question is, does the above code actually execute initializeThisSpa() on all of the spaItems at the same time? If not, how can I correct that?
(I have ignored syntax issues, if any, and have not tested this.)
At the same time?..
Not guaranteed. At best there will still be at least a few nanoseconds' difference.
Tasks are placed in a queue,
and every task waits for its chance to get a thread from the thread pool, for its turn of execution.
It all depends on the availability of threads in the thread pool. If no thread is available, the task waits in the queue.
A task goes through several states before its final execution. Here is a good explanation, and after going through this link you will see that it is almost impossible to call a function at exactly the same time from multiple tasks.
https://blogs.msdn.microsoft.com/pfxteam/2009/08/30/the-meaning-of-taskstatus/
You can make tasks run sequentially (one after another), each calling a specific function, by creating tasks with methods like ContinueWith, ContinueWhenAll and ContinueWhenAny; a rough sketch follows below.
An example is also in the MSDN documentation link:
https://msdn.microsoft.com/en-us/library/dd321473(v=vs.110).aspx
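A rough sketch of that sequential chaining, reusing the spaItems list from the question (the index copy is there so each lambda captures its own item):
Task chain = Task.Factory.StartNew(() => spaItems[0].initializeThisSpa());
for (int i = 1; i < spaItems.Count; i++)
{
    int index = i; // copy the loop variable so each continuation uses the right item
    chain = chain.ContinueWith(_ => spaItems[index].initializeThisSpa());
}
chain.Wait(); // each item is initialized only after the previous one has finished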
I am new to threaded programming. I have to run a few tasks in PARALLEL and in the background (so that the main UI thread remains responsive to user actions) and wait for all of them to complete before continuing execution.
Something like:
foreach(MyTask t in myTasks)
{
t.DoSomethinginBackground(); // There could be n number of task, to save
// processing time I wish to run each of them
// in parallel
}
// Wait till all tasks complete doing something parallel in background
Console.Write("All tasks Completed. Now we can do further processing");
I understand that there could be several ways to achieve this, but I am looking for the best solution to implement in .NET 4.0 (C#).
To me it would seem like you want Parallel.ForEach
Parallel.ForEach(myTasks, t => t.DoSomethingInBackground());
Console.Write("All tasks Completed. Now we can do further processing");
You can also perform multiple tasks within a single loop
List<string> results = new List<string>(myTasks.Count);
Parallel.ForEach(myTasks, t =>
{
string result = t.DoSomethingInBackground();
lock (results)
{ // lock the list to avoid race conditions
results.Add(result);
}
});
In order for the main UI thread to remain responsive, you will want to use a BackgroundWorker: subscribe to its DoWork and RunWorkerCompleted events and then call
worker.RunWorkerAsync();
worker.RunWorkerAsync(argument); // argument is an object
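A minimal wiring sketch of that (DoSomethingSlow and UpdateUi are placeholder names, not part of the question's code):
var worker = new BackgroundWorker();
worker.DoWork += (sender, e) => e.Result = DoSomethingSlow(e.Argument); // placeholder work; runs on a thread-pool thread
worker.RunWorkerCompleted += (sender, e) => UpdateUi(e.Result);         // raised back on the UI thread
worker.RunWorkerAsync(argument);                                        // argument arrives in DoWork as e.Argument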
You can use the Task library to do this:
string[] urls = ...;
var tasks = urls.Select(url => Task.Factory.StartNew(() => DoSomething(url)));
To avoid blocking the UI thread, you can use ContinueWhenAll in .NET 4.0:
Task.Factory.ContinueWhenAll(tasks.ToArray(), _ =>
Console.Write("All tasks Completed. Now we can do further processing");
);
If you are on a later version of .NET, you can use Task.WhenAll instead.
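A short sketch of the Task.WhenAll variant, assuming it runs inside an async method and reuses the tasks variable from above:
await Task.WhenAll(tasks); // does not block the UI thread while the tasks run
Console.Write("All tasks Completed. Now we can do further processing");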
If you use .NET 4.0 or later, refer to the Parallel class and the Task class. Joseph Albahari wrote a very clear book about this: http://www.albahari.com/threading/part5.aspx#_Creating_and_Starting_Tasks
I want to queue dependent tasks across several flows that need to be processed in order (within each flow). The flows can be processed in parallel.
To be specific, let's say I need two queues and I want the tasks in each queue to be processed in order. Here is sample pseudocode to illustrate the desired behavior:
Queue1_WorkItem wi1a=...;
enqueue wi1a;
... time passes ...
Queue1_WorkItem wi1b=...;
enqueue wi1b; // This must be processed after processing of item wi1a is complete
... time passes ...
Queue2_WorkItem wi2a=...;
enqueue wi2a; // This can be processed concurrently with the wi1a/wi1b
... time passes ...
Queue1_WorkItem wi1c=...;
enqueue wi1c; // This must be processed after processing of item wi1b is complete
Here is a diagram with arrows illustrating dependencies between work items:
The question is how do I do this using C# 4.0/.NET 4.0? Right now I have two worker threads, one per queue and I use a BlockingCollection<> for each queue. I would like to instead leverage the .NET thread pool and have worker threads process items concurrently (across flows), but serially within a flow. In other words I would like to be able to indicate that for example wi1b depends on completion of wi1a, without having to track completion and remember wi1a, when wi1b arrives. In other words, I just want to say, "I want to submit a work item for queue1, which is to be processed serially with other items I have already submitted for queue1, but possibly in parallel with work items submitted to other queues".
I hope this description made sense. If not please feel free to ask questions in the comments and I will update this question accordingly.
Thanks for reading.
Update:
To summarize "flawed" solutions so far, here are the solutions from the answers section that I cannot use and the reason(s) why I cannot use them:
TPL tasks require specifying the antecedent task for a ContinueWith(). I do not want to maintain knowledge of each queue's antecedent task when submitting a new task.
TDF ActionBlocks looked promising, but it would appear that items posted to an ActionBlock are processed in parallel. I need for the items for a particular queue to be processed serially.
Update 2:
RE: ActionBlocks
It would appear that setting the MaxDegreeOfParallelism option to one prevents parallel processing of work items submitted to a single ActionBlock. Therefore it seems that having an ActionBlock per queue solves my problem with the only disadvantage being that this requires the installation and deployment of the TDF library from Microsoft and I was hoping for a pure .NET 4.0 solution. So far, this is the candidate accepted answer, unless someone can figure out a way to do this with a pure .NET 4.0 solution that doesn't degenerate to a worker thread per queue (which I am already using).
I understand you have many queues and don't want to tie up threads. You could have an ActionBlock per queue. The ActionBlock automates most of what you need: it processes work items serially, and only starts a Task when work is pending. When no work is pending, no Task/Thread is blocked. A rough sketch of the per-queue setup follows.
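This sketch assumes the TPL Dataflow library (System.Threading.Tasks.Dataflow) is referenced; WorkItem and ProcessWorkItem are placeholder names, not types from your code:
// WorkItem and ProcessWorkItem are placeholders for your own item type and handler.
var queue1 = new ActionBlock<WorkItem>(
    item => ProcessWorkItem(item),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });

var queue2 = new ActionBlock<WorkItem>(
    item => ProcessWorkItem(item),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });

queue1.Post(wi1a); // items posted to the same block are processed one at a time, in order
queue1.Post(wi1b);
queue2.Post(wi2a); // items in different blocks can be processed concurrently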
The best way is to use the Task Parallel Library (TPL) and Continuations. A continuation not only allows you to create a flow of tasks but also handles your exceptions. This is a great introduction to the TPL. But to give you some idea...
You can start a TPL task using
Task task = Task.Factory.StartNew(() =>
{
// Do some work here...
});
Now to start a second task when an antecedent task finishes (in error or successfully) you can use the ContinueWith method
Task task1 = Task.Factory.StartNew(() => Console.WriteLine("Antecedent Task"));
Task task2 = task1.ContinueWith(antTask => Console.WriteLine("Continuation..."));
So as soon as task1 completes, fails or is cancelled task2 'fires-up' and starts running. Note that if task1 had completed before reaching the second line of code task2 would be scheduled to execute immediately. The antTask argument passed to the second lambda is a reference to the antecedent task. See this link for more detailed examples...
You can also pass continuations results from the antecedent task
Task.Factory.StartNew<int>(() => 1)
.ContinueWith(antTask => antTask.Result * 4)
.ContinueWith(antTask => antTask.Result * 4)
.ContinueWith(antTask => Console.WriteLine(antTask.Result * 4)); // Prints 64.
Note. Be sure to read up on exception handling in the first link provided as this can lead a newcomer to TPL astray.
One last thing to look at, in particular for what you want, is child tasks. Child tasks are those created with the AttachedToParent option. In this case the continuation will not run until all child tasks have completed
TaskCreationOptions atp = TaskCreationOptions.AttachedToParent;
Task.Factory.StartNew(() =>
{
    Task.Factory.StartNew(() => { SomeMethod(); }, atp);
    Task.Factory.StartNew(() => { SomeOtherMethod(); }, atp);
}).ContinueWith(cont => { Console.WriteLine("Finished!"); });
I hope this helps.
Edit: Have you had a look at the concurrent collections, in particular BlockingCollection<T>? In your case you might use something like
public class TaskQueue : IDisposable
{
    BlockingCollection<Action> taskX = new BlockingCollection<Action>();

    public TaskQueue(int taskCount)
    {
        // Create and start a new Task for each consumer.
        for (int i = 0; i < taskCount; i++)
            Task.Factory.StartNew(Consumer);
    }

    public void Dispose() { taskX.CompleteAdding(); }

    public void EnqueueTask(Action action) { taskX.Add(action); }

    void Consumer()
    {
        // The sequence we are enumerating will BLOCK when no elements
        // are available and will end when CompleteAdding is called.
        foreach (Action action in taskX.GetConsumingEnumerable())
            action(); // Perform your task.
    }
}
A .NET 4.0 solution based on TPL is possible, while hiding away the fact that it needs to store the parent task somewhere. For example:
class QueuePool
{
    private readonly Task[] _queues;

    public QueuePool(int queueCount)
    { _queues = new Task[queueCount]; }

    public void Enqueue(int queueIndex, Action action)
    {
        lock (_queues)
        {
            var parent = _queues[queueIndex];
            if (parent == null)
                _queues[queueIndex] = Task.Factory.StartNew(action);
            else
                _queues[queueIndex] = parent.ContinueWith(_ => action());
        }
    }
}
This is using a single lock for all queues, to illustrate the idea. In production code, however, I would use a lock per queue to reduce contention.
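A sketch of that per-queue-lock variation (same idea as above, just one lock object per slot):
class QueuePool
{
    private readonly Task[] _queues;
    private readonly object[] _locks;

    public QueuePool(int queueCount)
    {
        _queues = new Task[queueCount];
        _locks = new object[queueCount];
        for (int i = 0; i < queueCount; i++)
            _locks[i] = new object();
    }

    public void Enqueue(int queueIndex, Action action)
    {
        lock (_locks[queueIndex]) // contention is now per queue rather than global
        {
            var parent = _queues[queueIndex];
            _queues[queueIndex] = parent == null
                ? Task.Factory.StartNew(action)
                : parent.ContinueWith(_ => action());
        }
    }
}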
It looks like the design you already have is good and working. Your worker threads (one per queue) are long-running, so if you want to use Tasks instead, specify TaskCreationOptions.LongRunning so you get a dedicated worker thread.
But there isn't really a need to use the ThreadPool here; it doesn't offer many benefits for long-running work.
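For example, assuming ProcessQueue1 stands in for your existing per-queue worker loop:
// ProcessQueue1 is a placeholder for your existing per-queue worker loop.
// LongRunning hints to the scheduler that this task should get its own dedicated
// thread instead of occupying a thread-pool thread for a long time.
Task.Factory.StartNew(ProcessQueue1, TaskCreationOptions.LongRunning);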
I have a function which I want to execute in a separate thread, while preventing two threads from accessing the same resources. I also want to make sure that if a thread is currently executing, that thread is stopped and the new thread starts executing instead. This is what I have:
volatile int threadCount = 0; // use it to know the number of threads being executed
private void DoWork(string text, Action OncallbackDone)
{
threadCount++;
var t = new Thread(new ThreadStart(() =>
{
lock (_lock) // make sure that this code is only accessed by one thread
{
if (threadCount > 1) // if a new thread got in here return and let the last one execute
{
threadCount--;
return;
}
// do some work in here
Thread.Sleep(1000);
OncallbackDone();
threadCount--;
}
}));
t.Start();
}
If I fire that method 5 times, then all the threads will wait for the lock until it is released. I want to make sure that the last thread is the one that executes, though. When the threads are waiting to own the lock, how can I determine which one will own it next? I want them to own the resource in the order that I created the threads...
EDIT
I am not creating this application with .NET 4.0. Sorry for not mentioning what I was trying to accomplish. I am creating an autocomplete control where I filter a lot of data. I don't want the main window to freeze every time I filter results, and I also want to filter results as the user types. If the user types 5 letters at once, I want to stop all the other threads and only care about the last one. Because the lock blocks all the threads, sometimes the last thread that I created may own the lock first.
I think you are overcomplicating this. If you are able to use 4.0, then just use the Task Parallel Library. With it, you can set up a ContinueWith so that work that must happen in a certain order is done in the order you dictate. If this is NOT what you are looking for, then I would actually suggest that you not use threading, as this sounds like a synchronous action that you are trying to force into parallelism.
If you are just looking to cancel tasks, then here is an SO question on how to cancel TPL tasks. Why waste the resources if you are just going to dump them all except for the last one? A rough cancellation sketch follows.
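This sketch assumes .NET 4.0+ and uses FilterResults as a placeholder for the real filtering work; it illustrates the idea rather than your actual control code:
private CancellationTokenSource _cts;

private void StartFilter(string text)
{
    if (_cts != null) _cts.Cancel();          // abandon the task started for the previous keystroke
    _cts = new CancellationTokenSource();
    CancellationToken token = _cts.Token;

    Task.Factory.StartNew(() =>
    {
        token.ThrowIfCancellationRequested();  // bail out if a newer keystroke already cancelled us
        FilterResults(text, token);            // placeholder; the real work should also check the token periodically
    }, token);
}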
If you are not using 4.0, then you can accomplish the same thing with a BackgroundWorker; it just takes more boilerplate code. :)
I agree with Justin that you should use the .NET 4 Task Parallel Library. But if you want complete control you should not use the default task scheduler, which favors LIFO; instead, create your own TaskScheduler (http://msdn.microsoft.com/en-us/library/system.threading.tasks.taskscheduler.aspx) and implement the logic that determines which task gets preference.
Using threads directly is not recommended unless you have deep knowledge of .NET threading. If you are on .NET 4.0, Tasks and the TPL are preferred.
This is what I came up with after reading the links that you guys posted. I guess I needed a queue, so I implemented:
volatile int threadCount = 0;
private void GetPredicateAsync(string text, Action<object> DoneCallback)
{
threadCount++;
ThreadPool.QueueUserWorkItem((x) =>
{
lock (_lock)
{
if (threadCount > 1) // disable executing threads at same time
{
threadCount--;
return; // if a new thread is created exit.
// let the newer task do work!
}
// do work in here
Application.Current.Dispatcher.BeginInvoke(new Action(() =>
{
threadCount--;
DoneCallback(Foo);
}));
}
},text);
}