What is the difference between the Task class and the Parallel class, both part of the TPL, from an implementation point of view?
I believe the Task class has more benefits than ThreadPool and Thread, but context switching still happens with Task as well.
Is the Parallel class basically designed to run a program on a multicore processor?
Your question is extremely broad and the answer could contain lots of detail, but let me restrict it to the specifics.
Task wraps a method for execution later on; it uses a lambda (an Action or Func delegate) to do so. You can wrap the work now and execute it any time later.
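For example, a minimal sketch of wrapping work now and starting it later (ComputeChecksum and "data.bin" are just placeholders for any CPU-bound work):

using System;
using System.Threading.Tasks;

class TaskWrapExample
{
    static int ComputeChecksum(string path)   // placeholder for real work
    {
        return path.Length;                   // trivial stand-in computation
    }

    static void Main()
    {
        // Wrap the work now; nothing runs yet.
        var task = new Task<int>(() => ComputeChecksum("data.bin"));

        // ... any time later, start it (it is queued to the ThreadPool):
        task.Start();

        // Result blocks until the wrapped delegate has completed.
        Console.WriteLine(task.Result);
    }
}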
Parallel is an API that helps achieve data parallelism: you can divide a collection (an IEnumerable type) into smaller chunks, execute each chunk in parallel, and finally aggregate the chunk results.
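A rough sketch of that divide-and-aggregate pattern, using the Parallel.ForEach overload with partition-local state (the numbers and the sum are arbitrary illustrations):

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ParallelAggregateExample
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000000).ToArray();
        long total = 0;

        // Each worker keeps a private subtotal for its chunk, and the
        // subtotals are aggregated at the end with Interlocked.Add.
        Parallel.ForEach(
            numbers,
            () => 0L,                                   // per-worker initial value
            (n, loopState, subtotal) => subtotal + n,   // accumulate within the worker
            subtotal => Interlocked.Add(ref total, subtotal));  // final aggregation

        Console.WriteLine(total);   // 500000500000
    }
}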
There are broadly two kinds of parallelism. In one, you subdivide a bigger task into smaller ones, wrap them in the Task type, and wait for all or some of them to complete in parallel. This is task parallelism.
In the other, you take each data unit in a collection and work on it in a mutually exclusive manner; this is data parallelism, achieved by the Parallel.ForEach or Parallel.For APIs.
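The Parallel example above already illustrates the second kind; the first kind (task parallelism) might look like this minimal sketch, where one big sum is split into two sub-tasks that run in parallel:

using System;
using System.Linq;
using System.Threading.Tasks;

class TaskParallelismExample
{
    static void Main()
    {
        // Subdivide one big job (summing 1..1000) into two smaller tasks
        // and wait for both to complete in parallel.
        Task<int> lowerHalf = Task.Run(() => Enumerable.Range(1, 500).Sum());
        Task<int> upperHalf = Task.Run(() => Enumerable.Range(501, 500).Sum());

        Task.WaitAll(lowerHalf, upperHalf);

        Console.WriteLine(lowerHalf.Result + upperHalf.Result);   // 500500
    }
}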
Task and Parallel were introduced in .NET 4.0 to make parallelism easier for the developer; before that we had to dabble with the Thread and ThreadPool classes, which require a much more in-depth understanding of how threads work. Here a lot of that complexity is taken care of internally.
However, don't be under the impression that the current mechanism doesn't use threads. Both of the above forms of parallelism rely completely on ThreadPool threads, which is why we still see context switching and multiple threads being invoked; it's just that Microsoft has made the developer's life easier by handling this internally.
You may want to go through the following links for a better understanding; let me know if there's still a specific query:
Parallel.ForEach vs Task.Factory.StartNew
Past, Present and Future of Parallelism
Parallel.ForEach vs Task.Run and Task.WhenAll
TPL is designed to minimize pre-emptive context switching (caused by thread oversubscription, i.e. having more threads than cores). Task abstractions, of which TPL is an implementation, are designed for cooperative parallelism, where the developer controls when a task relinquishes its execution (typically upon completion). If you schedule more tasks than you have cores, TPL will only execute concurrently approximately as many tasks as you have cores; the rest will be queued. This promotes throughput, since it avoids the overhead of context switching, but reduces responsiveness, as each task may take longer to start being processed.
The Parallel class is yet a higher level of abstraction that builds on top of TPL. Implementation-wise, Parallel generates a graph of tasks, but it can use heuristics to decide the granularity of those tasks depending on your workload.
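You cannot observe those heuristics directly, but you can bound the concurrency Parallel uses. A minimal sketch (the loop body is just a placeholder) showing that no more iterations run at once than the configured degree of parallelism, with the remaining iterations queued:

using System;
using System.Threading;
using System.Threading.Tasks;

class BoundedParallelism
{
    static void Main()
    {
        var options = new ParallelOptions
        {
            // Never run more concurrent iterations than there are cores;
            // the remaining iterations wait in the queue.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.For(0, 100, options, i =>
        {
            // Each iteration is executed on a ThreadPool thread picked by the partitioner.
            Console.WriteLine("iteration " + i + " on thread "
                + Thread.CurrentThread.ManagedThreadId);
        });
    }
}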
I am new to TAP and TPL in C#. After googling and reading some material, I am still not able to understand how async tasks are executed (in particular, on which thread).
Please help me understand which of the following two options is correct:
If async tasks are executed on the same thread on which they are invoked, then what if I create 1000 tasks? Will they all be executed by time slicing on the same thread? If that is not so, and tasks execute on different threads, then how is it different from multi-threading (parallel execution)?
If async tasks are executed on the same thread on which they are invoked, then what if I create 1000 tasks? Will they all be executed by time slicing on the same thread?
Since async tasks are not executed on the same thread on which they are invoked, it is meaningless to consider whether they will all be executed in some specific manner. In any case, there is no mechanism in .NET to "time-slice" within a single thread. Time-slicing is done at the granularity of a thread; i.e. CPU time is granted to threads, not subsets of threads.
In some scenarios involving async methods, it is possible that a given thread may service two or more tasks. This is inherent in the nature of thread pools, which are used extensively throughout the TPL API. But the only reason that such methods would be serviced in any sense of the word "concurrently" would be if they themselves are waiting on other asynchronous operations. In those cases, they can yield the thread cooperatively, at which point some other task might make use of it.
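For illustration, a minimal sketch of such an async method (the file path and method name are placeholders): while the await is pending, no thread is blocked here, and the continuation may resume on a different ThreadPool thread once the read completes.

using System.IO;
using System.Threading.Tasks;

class AsyncYieldExample
{
    // A sketch of an async method that yields its thread while waiting on I/O.
    static async Task<int> ReadLengthAsync(string path)
    {
        using (var reader = new StreamReader(path))
        {
            // No thread is consumed during the await; some other task
            // may use the pool thread in the meantime.
            string text = await reader.ReadToEndAsync();
            return text.Length;
        }
    }
}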
If that is not so, and tasks execute on different threads, then how is it different from multi-threading (parallel execution)?
async and Task are super-sets of "parallel execution". That is, some tasks execute in parallel. But async and Task are also used for operations that don't involve execution at all per se, but rather just waiting for some asynchronous event to occur outside the CPU (e.g. network or file I/O).
Computational tasks are assigned an individual thread, which the task uses for its entire duration. Other types of asynchronous operations will not use a thread at all.
For computational tasks, these are not different from multi-threading on a fundamental level. Like many things in programming, it's more that async and Task offer useful abstractions that allow one to write code more easily to take advantage of multi-threading features in the computer, and at the same time also represent non-computational operations that are otherwise semantically similar to concurrent computational operations.
I have done my best to answer your questions as directly and concisely as possible. I would not be surprised, however, if the above raises as many new questions as it has answered old ones. Unfortunately, that's the nature of potentially broad questions. I do hope that the above has satisfied your curiosity, but if not, that would mean your original question is too broad for Stack Overflow. Please consider rewriting your question so that it focuses more specifically on one issue; include a good, minimal, complete code example for context, so that the question can be addressed specifically and without ambiguity.
I've been reading this article on MSDN about C# concurrent collections. It talks about the optimal threading approach for particular scenarios to get the most benefit out of the collections, e.g.:
ConcurrentQueue performs best when one dedicated thread is queuing and one dedicated thread is de-queuing. If you do not enforce this rule, then Queue might even perform slightly faster than ConcurrentQueue on computers that have multiple cores.
Is this advice still valid when one is using Tasks instead of raw Threads? From my (limited) understanding of C# Tasks, there is no guarantee that a particular Task will always run on the same thread between context switches, or does maintaining the stack frame mean that the same rules apply in terms of best usage?
One task always runs on the same thread. TPL is a user-mode library. User mode has no (practical) way of migrating executing code from thread to thread. Also there would be no point to doing that.
This advice applies exactly to tasks as it does to threads.
What that piece of advice means to say is that at the same time there should be one producer and one consumer only. You can have 100 threads enqueuing from time to time as long as they do not contend.
(I'm not questioning that advice here since that is out of scope for this question. But that is what's meant here.)
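To make that concrete, here is a minimal sketch of the one-dedicated-producer / one-dedicated-consumer arrangement the quoted advice describes (the item count and Console output are arbitrary):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class SingleProducerSingleConsumer
{
    static void Main()
    {
        const int Count = 1000;
        var queue = new ConcurrentQueue<int>();

        // One dedicated enqueuing task...
        Task producer = Task.Run(() =>
        {
            for (int i = 0; i < Count; i++)
                queue.Enqueue(i);
        });

        // ...and one dedicated dequeuing task, matching the quoted advice.
        Task consumer = Task.Run(() =>
        {
            int consumed = 0;
            int item;
            while (consumed < Count)
            {
                if (queue.TryDequeue(out item))
                {
                    Console.WriteLine(item);
                    consumed++;
                }
                // (Spins when the queue is momentarily empty; a real consumer
                // would typically use BlockingCollection<T> instead.)
            }
        });

        Task.WaitAll(producer, consumer);
    }
}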
It appears that the Task class provides us the ability to use multiple processors of the system. Does the Thread class work on multiple processors as well or does it use time slicing only on a single processor? (Assuming a system with multiple cores).
My question is: if threads will/could be executed on multiple cores, then what is so special about Task and parallelism?
When you create a thread, it forms a kind of logical group of work. The .NET Framework will acquire CPU time from the system. Most likely, multiple threads will run on different cores (this is something the system handles; not even .NET has any influence on it).
But it is also possible that the system will execute all your threads on the same core, or even move execution between several cores while they run. Keep in mind that you are creating managed threads, not real system threads.
(Strictly speaking: the system could execute your managed threads within the same system thread, or use multiple system threads for multiple managed threads.)
You may want to have a look at this blog post: http://www.drdobbs.com/parallel/managed-threads-are-different-from-windo/228800359 The explanation there is pretty good in terms of detail.
I would suggest you read Threading in C# by Joseph Albahari. If you read through the post, you will find:
How Threading Works
Multithreading is managed internally by a thread scheduler, a function the CLR typically delegates to the operating system. A thread scheduler ensures all active threads are allocated appropriate execution time, and that threads that are waiting or blocked (for instance, on an exclusive lock or on user input) do not consume CPU time.
So multithreading is handled by the operating system through a thread scheduler.
Further the post has:
On a multi-processor computer, multithreading is implemented with a mixture of time-slicing and genuine concurrency, where different threads run code simultaneously on different CPUs. It's almost certain there will still be some time-slicing, because of the operating system's need to service its own threads, as well as those of other applications.
- It appears that the Task class provides us the ability to use multiple processors in the system.
- If threads will/could be executed on multiple cores, then what is so special about Task and parallelism?
The Task class is just a small, but important, part of TPL (Task Parallel Library). TPL is a high level abstraction, so you don't have to work with threads directly. It encapsulates and hides most of the churn you'd have to implement for any decent multi-threaded application.
Tasks don't introduce any new functionality that you couldn't implement on your own, per se (which is the core of your question, I believe). They can be synchronous or asynchronous; when they are asynchronous, they either use threads internally or I/O completion (IOCP) ports.
Some of the points addressed by TPL are:
Rethrow exceptions from a child thread on the calling thread.
Asynchronous code (launch thread -> run arbitrary code while waiting for child thread -> resume when child thread is over) looks as if it were synchronous, greatly improving readability and maintainability
Simpler cancellation (using CancellationTokenSource)
Parallel queries/data manipulation using PLINQ or the Parallel class
Asynchronous workflows using TPL Dataflow
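To make the first and third of those points concrete, here is a small, self-contained sketch (not from the original answer; the exception message, delay, and output are arbitrary) showing how an exception surfaces on the waiting thread and how cooperative cancellation works:

using System;
using System.Threading;
using System.Threading.Tasks;

class TplFeatures
{
    static void Main()
    {
        // Exceptions thrown inside a task surface on the thread that waits for it,
        // wrapped in an AggregateException.
        Action fail = () => { throw new InvalidOperationException("boom"); };
        var failing = Task.Run(fail);
        try { failing.Wait(); }
        catch (AggregateException ex) { Console.WriteLine(ex.InnerException.Message); }

        // Cooperative cancellation via CancellationTokenSource.
        var cts = new CancellationTokenSource();
        Action loop = () =>
        {
            while (true)
            {
                cts.Token.ThrowIfCancellationRequested();   // observe the token
                Thread.Sleep(10);                           // stand-in for a unit of work
            }
        };
        var looping = Task.Run(loop, cts.Token);

        cts.Cancel();
        try { looping.Wait(); }
        catch (AggregateException) { Console.WriteLine("Task was canceled."); }
    }
}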
I was reading C# 5.0 in a Nutshell, and after reading the author's views I am quite confused about what I should adopt. My requirement is this: say I have a really long-running (computationally heavy) task, for example calculating the SHA1 (or some other) hash of millions of files, or really anything else that is computationally heavy and likely to take some time. What should my approach be toward developing it (in WinForms, if that matters, using VS 2012 and C# 5.0), so that I can also report progress to the user?
The following scenarios come to mind:
Create a Task (with the LongRunning option) that computes the hashes and reports progress to the user, either by implementing IProgress<T>/Progress<T> or by letting the task capture the SynchronizationContext and post to the UI.
Create an async method like:
async Task CalculateHashesAsync()
{
    // await tasks that calculate the hashes
    await Task.Run(() => CalculateHash());
    // how do I report progress???
}
Use TPL (or PLINQ) as:
void CalculateHashes()
{
    Parallel.For(0, allFiles.Count, i => calcHash(allFiles[i]));
    // how do I report progress here?
}
Use a producer/consumer queue. (I don't really know how.)
The author in the book says...
Running one long-running task on a pooled thread won't cause trouble. It's when you run multiple long-running tasks in parallel (particularly ones that block) that performance can suffer. In that case, there are usually better solutions than TaskCreationOptions.LongRunning.
If tasks are I/O-bound, TaskCompletionSource and asynchronous functions let you implement concurrency with callbacks instead of threads.
If tasks are compute-bound, a producer/consumer queue lets you throttle the concurrency for those tasks, avoiding starvation of other threads and processes.
About the producer/consumer queue, the author says...
A producer/consumer queue is a useful structure, both in parallel programming and in general concurrency scenarios, as it gives you precise control over how many worker threads execute at once, which is useful not only for limiting CPU consumption but for other resources as well.
So, should I not use a Task, meaning the first option is out? Is the second one the best option? Are there any other options? And if I were to follow the author's advice and implement a producer/consumer queue, how would I do that? (I don't even have an idea of how to get started with producer/consumer in my scenario, if that is the best approach!)
I'd like to know whether someone has come across such a scenario and how they implemented it. If not, what would be the most performance-effective and/or easiest approach to develop and maintain? (I know the word performance is subjective, but let's just consider the very general case that it works, and works well!)
really long-running (computationally heavy) task, for example calculating the SHA1 (or some other) hash of millions of files
That example clearly has both heavy CPU (hashing) and I/O (file) components. Perhaps this is a non-representative example, but in my experience even a secure hash is far faster than reading the data from disk.
If you just have CPU-bound work, the best solution is either Parallel or PLINQ. If you just have I/O-bound work, the best solution is to use async. If you have a more realistic and complex scenario (with both CPU and I/O work), then you should either hook up your CPU and I/O parts with producer/consumer queues or use a more complete solution such as TPL Dataflow.
TPL Dataflow works well with both parallel (MaxDegreeOfParallelism) and async code, and has a built-in producer/consumer queue between each pair of linked blocks.
One thing to keep in mind when mixing massive amounts of I/O and CPU usage is that different situations can cause massively different performance characteristics. To be safe, you'll want to throttle the data going through your queues so you won't end up with memory usage issues. TPL Dataflow has built-in support for throttling via BoundedCapacity.
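As a rough sketch (not the book's code, and assuming the System.Threading.Tasks.Dataflow NuGet package and the caller's allFiles list from the question), a two-stage pipeline might look like this: an I/O block that reads files and a CPU block that hashes them, with BoundedCapacity throttling how much data sits in between.

using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class HashPipeline
{
    static async Task HashAllAsync(IEnumerable<string> allFiles)
    {
        // I/O stage: read file contents; BoundedCapacity throttles memory use.
        var readBlock = new TransformBlock<string, byte[]>(async path =>
        {
            using (var stream = File.OpenRead(path))
            {
                var buffer = new byte[stream.Length];
                // (A real implementation would loop until all bytes are read.)
                await stream.ReadAsync(buffer, 0, buffer.Length);
                return buffer;
            }
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = 8 });

        // CPU stage: hash in parallel, at most one worker per core.
        var hashBlock = new ActionBlock<byte[]>(
            bytes => { using (var sha = SHA1.Create()) sha.ComputeHash(bytes); },
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = 8
            });

        readBlock.LinkTo(hashBlock, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var file in allFiles)
            await readBlock.SendAsync(file);   // SendAsync respects BoundedCapacity

        readBlock.Complete();
        await hashBlock.Completion;
    }
}

A caller would simply await HashAllAsync(allFiles); progress reporting could be added inside the hash block via IProgress<T>, as mentioned in the question.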
I have used most of the Threading library extensively. I am fairly familiar with creating new Threads, creating BackgroundWorkers and using the built-in .NET ThreadPool (which are all very cool).
However, I have never found a reason to use the Task class. I have seen maybe one or two examples of people using them, but the examples weren't very clear and they didn't give a high-level overview of why one should use a task instead of a new thread.
Question 1: From a high-level, when is using a task useful versus one of the other methods for parallelism in .NET?
Question 2: Does anyone have a simple and/or medium difficulty example demonstrating how to use tasks?
There are two main advantages to using Tasks:
Task can represent any result that will be available in the future (the general concept is not specific to .NET and is called a future), not just a computation. This is especially important with async-await, which uses Tasks for asynchronous operations. Since the operation that produces the result might fail, Tasks can also represent failures.
Task has lots of methods to operate on them. You can synchronously wait until it finishes (Wait()), wait for its result (Result), set up some operation when the Task finishes (ContinueWith()) and also some methods that work on several Tasks (WaitAll(), WaitAny(), ContinueWhenAll()). All of this is possible using other parallel processing methods, but you would have to do it manually.
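For illustration, a small self-contained sketch of some of those composition methods (the values are arbitrary):

using System;
using System.Threading.Tasks;

class TaskCompositionExample
{
    static void Main()
    {
        Task<int> first = Task.Run(() => 21);
        Task<int> second = Task.Run(() => 21);

        // Block until both finish (WaitAny would return as soon as either does).
        Task.WaitAll(first, second);

        // Result blocks if necessary and then returns the computed value.
        int sum = first.Result + second.Result;

        // Attach a continuation that runs once the antecedent task completes.
        Task<int> antecedent = Task.Run(() => sum * 2);
        antecedent
            .ContinueWith(t => Console.WriteLine("Result: " + t.Result))
            .Wait();
    }
}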
And there are also some smaller advantages to using Task:
You can use a custom TaskScheduler to decide when and where the Task will run. This can be useful, for example, if you want to run a Task on the UI thread, limit the degree of parallelism, or have a Task-level readers–writer lock.
Tasks support cooperative cancellation through CancellationToken.
Tasks that represent computations have some performance improvements. For example, they use work-stealing queue for more efficient processing and they also support inlining (executing Task that hasn't started yet on a thread that synchronously waits for it).
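As a minimal sketch of the scheduler point above (not a full pattern): ConcurrentExclusiveSchedulerPair, built into .NET 4.5, provides the Task-level readers–writer behavior mentioned, with "reader" tasks allowed to run together and "writer" tasks running alone.

using System;
using System.Threading;
using System.Threading.Tasks;

class SchedulerPairExample
{
    static void Main()
    {
        // Tasks on ConcurrentScheduler may run together; tasks on ExclusiveScheduler run alone.
        var pair = new ConcurrentExclusiveSchedulerPair();

        Task reader1 = Task.Factory.StartNew(() => Console.WriteLine("read 1"),
            CancellationToken.None, TaskCreationOptions.None, pair.ConcurrentScheduler);
        Task reader2 = Task.Factory.StartNew(() => Console.WriteLine("read 2"),
            CancellationToken.None, TaskCreationOptions.None, pair.ConcurrentScheduler);
        Task writer = Task.Factory.StartNew(() => Console.WriteLine("exclusive write"),
            CancellationToken.None, TaskCreationOptions.None, pair.ExclusiveScheduler);

        Task.WaitAll(reader1, reader2, writer);
    }
}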