when to use Task and when to use Thread? - c#

I've just asked question about Task but realized that I actually want to ask more general question.
Could someone summarize pros and cons of Tasks and Threads.
How to understand should I use Task or Thread?

Task is an order to program to do something in asynchronous way. The Thread is actually OS kernel object which executes what was requested. Think about Task like a clever thread aggregator/organizer that "knows" how much task is better to run contemporary on your CPU. It's just cleverer then common implementations of multi-threading (that's why it's suggested choice from Microsoft). It's a feature that helps you managing Threads in easier way.
Look also on this Should i use ThreadPools or Task Parallel Library for IO-bound operations that may give you some hints on performance issues you may be interested in.

Related

C# TPL Threading Task

What is the difference between Task class and parallel class which part of TPL at implementation point of view.?
I believe task class is having more benefits than threadpool and thread but still context switch happens in task class as well.
But parallel class is basically design to run program on multicore processor?
Your question is extremely wide and can contain lots of details as an answer, but let me restrict to specific details.
Task - Wrap a method for execution down the line, it use the Lambda (Action, Func Delegate) to do the same. You can wrap now and execute anytime later.
Parallel is an API which helps achieve the Data Parallelization, where you can divide a collection (IEnumerable type) into smaller chunks and each can be executed in parallel and finally aggregated to achieve the result
There are broadly two kinds of parallelism, in one you can subdivide the bigger task into smaller ones, wrap them in a Task type and wait for all or some of them to complete in parallel. This is task parallelism
In other one you take each data unit in a collection and work on it in a mutually exclusive manner, which is data parallelism achieved by Parallel.forEach or Parallel.For APIs
These are introduced from .Net 4.0 onward to make the parallelism easy for the developer, else we had to dabble with Thread and ThreadPool class, which require much more in-depth understanding of the working of threads, here lot of complexity is taken care of internally.
However, don't be under the impression that current mechanism doesn't use threads, both the above mentioned form of parallelism rely completely on ThreadPool threads, that's why we have all the stuff like context -switching happening, multiple threads getting invoked, just that microsoft has made developer life easy by doing it
You may want to go through following links for a better understanding, let me know if there's still a specific query:
Parallel.ForEach vs Task.Factory.StartNew
Past, Present and Future of Parallelism
Parallel.ForEach vs Task.Run and Task.WhenAll
TPL is designed to minimize pre-emptive context-switching (caused by thread oversubscription – having more threads than cores). Task abstractions, of which TPL is an implementation, are designed for cooperative parallelism, where the developer controls when a task will relinquish its execution (typically upon completion). If you schedule more tasks than you have cores, TPL will only execute concurrently approximately as many tasks as you have core; the rest will be queued. This promotes throughout since it avoids the overheads of context-switching, but reduces responsiveness, as each task may take longer to start being processed.
The Parallel class is yet a higher level of abstraction that builds on top of TPL. Implementation-wise, Parallel generates a graph of tasks, but can use heuristics for deciding the granularity of the said tasks depending on your work.

Optimum use of Concurrent Collections with Threads Vs. Tasks

I've been reading this article on MSDN about C# Concurrent Collections. It talks about the optimum threading to use for particular scenarios to get the most benefit out of the collections e.g:
ConcurrentQueue performs best when one dedicated thread is queuing and one dedicated thread is de-queuing. If you do not enforce this rule, then Queue might even perform slightly faster than ConcurrentQueue on computers that have multiple cores.
Is this advice still valid when one is using Tasks instead of raw Threads? From my (limited) understanding of C# Tasks, there is no guarantee that a particular Task will always run on the same thread between context switches, or does maintaining the stack frame mean that the same rules apply in terms of best usage?
Thanks.
One task always runs on the same thread. TPL is a user-mode library. User mode has no (practical) way of migrating executing code from thread to thread. Also there would be no point to doing that.
This advice applies exactly to tasks as it does to threads.
What that piece of advice means to say is that at the same time there should be one producer and one consumer only. You can have 100 threads enqueuing from time to time as long as they do not contend.
(I'm not questioning that advice here since that is out of scope for this question. But that is what's meant here.)

Is it fine to use tasks and thread-pool together?

After reading how the thread pool and tasks work in this article I came up with this question -
If I have a complex program in which some modules use tasks and some use thread pool, is it possible that there will be some scheduling problems due to the different uses?
Task are often implemented using the thread pool (one can of course also have tasks using other types of schedulers that give different behavior, but this is the default). In terms of the actual code being executed (assuming your tasks are representing delegates being run) there really isn't much difference.
Tasks are simply creating a wrapper around that thread pool call to provide additional functionality when it comes to gather information about, and processing the results of, that asynchronous operation. If you want to leverage that additional functionality then use tasks. If you have no need to use it in some particular context, there's nothing wrong with using the thread pool directly.
Mix the two, so long as you don't have trouble getting what you want out of the results of those operations, is not a problem at all.
No. And there actually isn't much in the way of memory or performance inefficiencies when mixing approaches; by default tasks use the same thread pool that thread pool threads use.
The only significant disadvantage of mixing both is lack of consistency in your codebase. If you were to pick one, I would use TPL since it is has a rich API for handling many aspects of multi-threading and takes advantage of async/await language features.
Since your usage is divided down module lines, you don't have much to worry about.
No, there wouldn't be problems - you just would be inefficient in doing both. use what is really needed and stick with the pattern. Remember to be sure that you make your app MT Safe also especially if you are accessing the same resources/variables etc... from different threads, regardless of which threading algorithm you use.
There shouldn't be any scheduling problems as such, but of course it's better to use Tasks and let the Framework decide what to do with the scheduled work. In the current version of the framework (4.5) the work will be queued through the ThreadPool unless the LongRunning option is used, but this behaviour may change in future of course.
Verdict: Mixing Tasks and ThreadPool isn't a problem, but for new applications it's recommended to use Tasks instead of queueing work items directly on the ThreadPool (one reason for that is ThreadPool isn't available in Windows 8 Runtime (Modern UI apps).

PLINQ vs Tasks vs Async vs Producer/Consumer queue? What to use?

I was reading C# 5.0 in nutshell and after reading author's view(s), I am quite confused as to what should I adopt. My requirement is that say I have a really long running (computationally heavy) task, say for example, calculate SHA1 (or some other) hash of millions of file, or really any other thing is is computationally heavy and is likely to take some time, what should be my approach toward developing it (in winforms if that matters, using VS 2012, C# 5.0), so that I can also report progress to the user.
Following scenario(s) come to mind...
Create a Task (with LongRunning option that computes the hashes and report the progress to user either by implementing IProgess<T> or Progess<T> or letting the task capture the SynchronizationContext context and posting to the UI.
Create a Async method like
async CalculateHashesAsync()
{
// await here for tasks the calculate the hash
await Task.Rung(() => CalculateHash();
// how do I report progress???
}
Use TPL (or PLINQ) as
void CalcuateHashes()
{
Parallel.For(0, allFiles.Count, file => calcHash(file)
// how do I report progress here?
}
Use a producer / consumer Queue.
Don't really know how?
The author in the book says...
Running one long running task on a pooled thread won't cause
trouble. It's when you run multiple long running tasks in parallel
(particularly ones that block) that performance can suffer. In that
case, there are usually better solutions than
TaskCreationOptions.LongRunnging
If tasks are IO bound, TaskCompletionSource and asynchronous functions let you
implement concurrency with callbacks instead of threads.
If tasks are compute bound, a producer/consumer queue lets you throttle the concurrency for those tasks, avoiding starvation for
other threads and process.
About the Producer/Consumer the author says...
A producer/consumer queue is a useful structure, both in parallel
programming and general concurrency scenarios as it gives you precise
control over how many worker threads execute at once, which is useful
not only in limiting CPU consumption, but other resources as well.
So, should I not use task, meaning that first option is out? Is second one the best option? Are there any other options? And If I were to follow author's advice, and implement a producer/consumer, how would I do that (I don't even have an idea of how to get started with producer/consumer in my scenario, if that is the best approach!)
I'd like to know if someone has ever come across such a scenario, how would they implement? If not, what would be the most performance effective and/or easy to develop/maintain (I know the word performance is subjective, but let's just consider the very general case that it works, and works well!)
really long running (computationally heavy) task, say for example, calculate SHA1 (or some other) hash of millions of file
That example clearly has both heavy CPU (hashing) and I/O (file) components. Perhaps this is a non-representative example, but in my experience even a secure hash is far faster than reading the data from disk.
If you just have CPU-bound work, the best solution is either Parallel or PLINQ. If you just have I/O-bound work, the best solution is to use async. If you have a more realistic and complex scenario (with both CPU and I/O work), then you should either hook up your CPU and I/O parts with producer/consumer queues or use a more complete solution such as TPL Dataflow.
TPL Dataflow works well with both parallel (MaxDegreeOfParallelism) and async, and has a builtin producer/consumer queue in-between each block.
One thing to keep in mind when mixing massive amounts of I/O and CPU usage is that different situations can cause massively different performance characteristics. To be safe, you'll want to throttle the data going through your queues so you won't end up with memory usage issues. TPL Dataflow has built-in support for throttling via BoundedCapacity.

Asynchronous Delegate Invocation (ADI) vs. Task Parallel Library (TPL)

I get this comment on ADI while reading Essential C# 4.0:
Unfortunately, the underlying
technology used by the asynchronous
delegate invocation pattern is an
end-of-further-development technology
for distributed programming known as
remoting. And although Microsoft still
supports the use of asynchronous
delegate invocation and it will
continue to function as it does today
for the foreseeable future, the
performance characteristics are
suboptimal given other
approaches—namely Thread, ThreadPool,
and TPL. Therefore, developers should
tend to favor one of these
alternatives rather than implementing
new development using the asynchronous
delegate invocation API. Further
discussion of the pattern is included
in the Advanced Topic text that
follows so that developers who
encounter it will understand how it
works.
So are there any limitations that ADI has and TPL doesn't, besides that TPL probably uses a not-end-of-further-development-yet technology?
Tasks and async delegates both use thread pool.
Tasks and async delegates are similar in the sense that exception can be propagated to caller. Tasks go one step further, accumulating all thrown exceptions and presenting them for all thread pool workers together.
Tasks allow for cancellation.
There's a free chapter that describes all of this in more detail:
http://www.albahari.com/threading/
You ask for "limitations".
I don't think you will find anything that can't be done with ADI (also called APM). The point is performance and programmer effort.
The verdict seems unanimous, Joe Duffy also warns you away from the ADI/APM
And the conclusion is easy, use the TPL if you can. It is easy and efficient. And it's at the just-the -beginning-of-further-development point.
Not that I am an expert in TPL. From what I understand TPL abstracts the decisions on the level of parallelism as configurations/specification.
For instance, in a parallel for loop.
Parallel.For(0, 1000, a => Thread.Sleep(10000));
You don't necessarily spawn 1000 threads. The TPL will "parallelise" to the appropriate number of threads. As opposed to asynchronously invoking a method 1000 times. (Which won't create 1000 threads either, but you will just have blocked execution calls until the required resources are freed up.
Also, TPL allows you a higher level control of the parallel tasks. In the above example, you can pause/break/abort the for loop easily. Such as.
Parrallel.For(0, 1000, (a, loopState) => loopState.Break());
It's a bit of hassle to achieve the above using conventional async method invoke.
TL,DR: TPL are more efficient and easier to use.

Categories

Resources