I am new to TAP and TPL in C#, After googling and reading some material I am not able to understand how async tasks are executed.(particularly on which thread)
Please help me to understand out of following two options which one is correct:
If async tasks are executed on the same thread on which they are invoked, then what if I create 1000 tasks and they all will be executed by time slicing on same thread? if it is not so and tasks execute on different threads then how is it different from multi-threading (parallel execution)?
If async tasks are executed on the same thread on which they are invoked, then what if I create 1000 tasks and they all will be executed by time slicing on same thread?
Since async tasks are not executed on the same thread on which they are invoked, it is meaningless to consider whether they will all be executed in some specific manner. In any case, there is no mechanism in .NET to "time-slice" within a single thread. Time-slicing is done at the granularity of a thread; i.e. CPU time is granted to threads, not subsets of threads.
In some scenarios involving async methods, it is possible that a given thread may service two or more tasks. This is inherent in the nature of thread pools, which are used extensively throughout the TPL API. But the only reason that such methods would be serviced in any sense of the word "concurrently" would be if they themselves are waiting on other asynchronous operations. In those cases, they can yield the thread cooperatively, at which point some other task might make use of it.
if it is not so and tasks execute on different threads then how is it different from multi-threading (parallel execution)?
async and Task are super-sets of "parallel execution". That is, some tasks execute in parallel. But async and Task are also used for operations that don't involve execution at all per se, but rather just waiting for some asynchronous event to occur outside the CPU (e.g. network or file I/O).
Computational tasks will be assigned an individual thread, for that task to use for the duration of the task. Other types of asynchronous operations will not use a thread at all.
For computational tasks, these are not different from multi-threading on a fundamental level. Like many things in programming, it's more that async and Task offer useful abstractions that allow one to write code more easily to take advantage of multi-threading features in the computer, and at the same time also represent non-computational operations that are otherwise semantically similar to concurrent computational operations.
I have done my best to answer your questions as directly and concisely as possible. I would not be surprised however if the above raises as many new questions as it's answered old. Unfortunately, that's the nature of potentially broad questions. I do hope that the above has satisfied your curiosity, but if not that would mean that your original question is too broad for Stack Overflow. Please consider rewriting your question so that it focuses more specifically on an issue; include a good, minimal, complete code example for context, so that the question can be addressed in a specific manner and without ambiguity.
Related
I know the differences between a thread and a task., but I cannot understand if creating threads inside tasks is the same as creating only threads.
It depends on how you use the multithreaded capabilities and the asynchronous programming semantics of the language.
Simple facts first. Assume you have an initial, simple, single-threaded, and near empty application (that just reads a line of input with Console.ReadLine for simplicity sake). If you create a new Thread, then you've created it from within another thread, the main thread. Therefore, creating a thread from within a thread is a perfectly valid operation, and the starting point of any multithreaded application.
Now, a Task is not a thread per se, but it gets executed in one when you do Task.Run which is selected from a .NET managed thread pool. As such, if you create a new thread from within a task, you're essentially creating a thread from within a thread (same as above, no harm done). The caveat here is, that you don't have control of the thread or its lifetime, that is, you can't kill it, suspend it, resume it, etc., because you don't have a handle to that thread. If you want some unit of work done, and you don't care which thread does it, just that's it not the current one, then Task.Run is basically the way to go. With that said, you can always start a new thread from within a task, actually, you can even start a task from within a task, and here is some official documentation on unwrapping nested tasks.
Also, you can await inside a task, and create a new thread inside an async method if you want. However, the usability pattern for async and await is that you use them for I/O bound operations, these are operations that require little CPU time but can take long because they need to wait for something, such as network requests, and disk access. For responsive UI implementations, this technique is often used to prevent blocking of the UI by another operation.
As for being pointless or not, it's a use case scenario. I've faced situations where that could have been the solution, but found that redesigning my program logic so that if I need to use a thread from within a task, then what I do is to have two tasks instead of one task plus the inner thread, gave me a cleaner, and more readable code structure, but that it's just personal flair.
As a final note, here are some links to official documentation and another post regarding multithreaded programming in C#:
Async in Depth
Task based asynchronous programming
Chaining Tasks using Continuation Tasks
Start multiple async Tasks and process them as they complete
Should one use Task.Run within another Task
It depends how you use tasks and what your reason is for wanting another thread.
Task.Run
If you use Task.Run, the work will "run on the ThreadPool". It will be done on a different thread than the one you call it from. This is useful in a desktop application where you have a long-running processor-intensive operation that you just need to get off the UI thread.
The difference is that you don't have a handle to the thread, so you can't control that thread in any way (suspend, resume, kill, reuse, etc.). Essentially, you use Task.Run when you don't care which thread the work happens on, as long as it's not the current one.
So if you use Task.Run to start a task, there's nothing stopping you from starting a new thread within, if you know why you're doing it. You could pass the thread handle between tasks if you specifically want to reuse it for a specific purpose.
Async methods
Methods that use async and await are used for operations that use very little processing time, but have I/O operations - operations that require waiting. For example, network requests, read/writing local storage, etc. Using async and await means that the thread is free to do other things while you wait for a response. The benefits depend on the type of application:
Desktop app: The UI thread will be free to respond to user input while you wait for a response. I'm sure you've seen some programs that totally freeze while waiting for a response from something. This is what asynchronous programming helps you avoid.
Web app: The current thread will be freed up to do any other work required. This can include serving other incoming requests. The result is that your application can handle a bigger load than it could if you didn't use async and await.
There is nothing stopping you from starting a thread inside an async method too. You might want to move some processor-intensive work to another thread. But in that case you could use Task.Run too. So it all depends on why you want another thread.
It would be pointless in most cases of everyday programming.
There are situations where you would create threads.
What is the difference between Task class and parallel class which part of TPL at implementation point of view.?
I believe task class is having more benefits than threadpool and thread but still context switch happens in task class as well.
But parallel class is basically design to run program on multicore processor?
Your question is extremely wide and can contain lots of details as an answer, but let me restrict to specific details.
Task - Wrap a method for execution down the line, it use the Lambda (Action, Func Delegate) to do the same. You can wrap now and execute anytime later.
Parallel is an API which helps achieve the Data Parallelization, where you can divide a collection (IEnumerable type) into smaller chunks and each can be executed in parallel and finally aggregated to achieve the result
There are broadly two kinds of parallelism, in one you can subdivide the bigger task into smaller ones, wrap them in a Task type and wait for all or some of them to complete in parallel. This is task parallelism
In other one you take each data unit in a collection and work on it in a mutually exclusive manner, which is data parallelism achieved by Parallel.forEach or Parallel.For APIs
These are introduced from .Net 4.0 onward to make the parallelism easy for the developer, else we had to dabble with Thread and ThreadPool class, which require much more in-depth understanding of the working of threads, here lot of complexity is taken care of internally.
However, don't be under the impression that current mechanism doesn't use threads, both the above mentioned form of parallelism rely completely on ThreadPool threads, that's why we have all the stuff like context -switching happening, multiple threads getting invoked, just that microsoft has made developer life easy by doing it
You may want to go through following links for a better understanding, let me know if there's still a specific query:
Parallel.ForEach vs Task.Factory.StartNew
Past, Present and Future of Parallelism
Parallel.ForEach vs Task.Run and Task.WhenAll
TPL is designed to minimize pre-emptive context-switching (caused by thread oversubscription – having more threads than cores). Task abstractions, of which TPL is an implementation, are designed for cooperative parallelism, where the developer controls when a task will relinquish its execution (typically upon completion). If you schedule more tasks than you have cores, TPL will only execute concurrently approximately as many tasks as you have core; the rest will be queued. This promotes throughout since it avoids the overheads of context-switching, but reduces responsiveness, as each task may take longer to start being processed.
The Parallel class is yet a higher level of abstraction that builds on top of TPL. Implementation-wise, Parallel generates a graph of tasks, but can use heuristics for deciding the granularity of the said tasks depending on your work.
I've been reading this article on MSDN about C# Concurrent Collections. It talks about the optimum threading to use for particular scenarios to get the most benefit out of the collections e.g:
ConcurrentQueue performs best when one dedicated thread is queuing and one dedicated thread is de-queuing. If you do not enforce this rule, then Queue might even perform slightly faster than ConcurrentQueue on computers that have multiple cores.
Is this advice still valid when one is using Tasks instead of raw Threads? From my (limited) understanding of C# Tasks, there is no guarantee that a particular Task will always run on the same thread between context switches, or does maintaining the stack frame mean that the same rules apply in terms of best usage?
Thanks.
One task always runs on the same thread. TPL is a user-mode library. User mode has no (practical) way of migrating executing code from thread to thread. Also there would be no point to doing that.
This advice applies exactly to tasks as it does to threads.
What that piece of advice means to say is that at the same time there should be one producer and one consumer only. You can have 100 threads enqueuing from time to time as long as they do not contend.
(I'm not questioning that advice here since that is out of scope for this question. But that is what's meant here.)
I have used most of the Threading library extensively. I am fairly familiar with creating new Threads, creating BackgroundWorkers and using the built-in .NET ThreadPool (which are all very cool).
However, I have never found a reason to use the Task class. I have seen maybe one or two examples of people using them, but the examples weren't very clear and they didn't give a high-level overview of why one should use a task instead of a new thread.
Question 1: From a high-level, when is using a task useful versus one of the other methods for parallelism in .NET?
Question 2: Does anyone have a simple and/or medium difficulty example demonstrating how to use tasks?
There are two main advantages in using Tasks:
Task can represent any result that will be available in the future (the general concept is not specific to .Net and it's called future), not just a computation. This is especially important with async-await, which uses Tasks for asynchronous operations. Since the operation that gets the result might fail, Tasks can also represent failures.
Task has lots of methods to operate on them. You can synchronously wait until it finishes (Wait()), wait for its result (Result), set up some operation when the Task finishes (ContinueWith()) and also some methods that work on several Tasks (WaitAll(), WaitAny(), ContinueWhenAll()). All of this is possible using other parallel processing methods, but you would have to do it manually.
And there are also some smaller advantages to using Task:
You can use a custom TaskScheduler to decide when and where will the Task run. This can be useful for example if you want to run a Task on the UI thread, limit the degree of parallelism or have a Task-level readers–writer lock.
Tasks support cooperative cancellation through CancellationToken.
Tasks that represent computations have some performance improvements. For example, they use work-stealing queue for more efficient processing and they also support inlining (executing Task that hasn't started yet on a thread that synchronously waits for it).
From what I understand about the difference between Task & Thread is that task happened in the thread-pool while the thread is something that I need to managed by myself .. ( and that task can be cancel and return to the thread-pool in the end of his mission )
But in some blog I read that if the operating system need to create task and create thread => it will be easier to create ( and destroy ) task.
Someone can explain please why creating task is simple that thread ?
( or maybe I missing something here ... )
I think that what you are talking about when you say Task is a System.Threading.Task. If that's the case then you can think about it this way:
A program can have many threads, but a processor core can only run one Thread at a time.
Threads are very expensive, and switching between the threads that are running is also very expensive.
So... Having thousands of threads doing stuff is inefficient. Imagine if your teacher gave you 10,000 tasks to do. You'd spend so much time cycling between them that you'd never get anything done. The same thing can happen to the CPU if you start too many threads.
To get around this, the .NET framework allows you to create Tasks. Tasks are a bit of work bundled up into an object, and they allow you to do interesting things like capture the output of that work and chain pieces of work together (first go to the store, then buy a magazine).
Tasks are scheduled on a pool of threads. The specific number of threads depends on the scheduler used, but the default scheduler tries to pick a number of threads that is optimal for the number of CPU cores that you have and how much time your tasks are spending actually using CPU time. If you want to, you can even write your own scheduler that does something specific like making sure that all Tasks for that scheduler always operate on a single thread.
So think of Tasks as items in your to-do list. You might be able to do 5 things at once, but if your boss gives you 10000, they will pile up in your inbox until the first 5 that you are doing get done. The difference between Tasks and the ThreadPool is that Tasks (as I mentioned earlier) give you better control over the relationship between different items of work (imagine to-do items with multiple instructions stapled together), whereas the ThreadPool just allows you to queue up a bunch of individual, single-stage items (Functions).
You are hearing two different notions of task. The first is the notion of a job, and the second is the notion of a process.
A long time ago (in computer terms), there were no threads. Each running instance of a program was called a process, since it simply performed one step after another after another until it exited. This matches the intuitive idea of a process as a series of steps, like that of a factory assembly line. The operating system manages the process abstraction.
Then, developers began to add multiple assembly lines to the factories. Now a program could do more than one thing at once, and either a library or (more commonly today) the operating system would manage the scheduling of the steps within each thread. A thread is kind of a lightweight process, but a thread belongs to a process, and all the threads in a process share memory. On the other hand, multiple processes can't mess with each others' memory. So, the multiple threads in your web server can each access the same information about the connection, but Word can't access Excel's in-memory data structures because Word and Excel are running as separate processes. The idea of a process as a series of steps doesn't really match the model of a process with threads, so some people took to calling the "abstraction formerly known as a process" a task. This is the second definition of task that you saw in the blog post. Note that plenty of people still use the word process to mean this thing.
Well, as threads became more commmon, developers added even more abstractions over top of them to make them easier to use. This led to the rise of the thread pool, which is a library-managed "pool" of threads. You pass the library a job, and the library picks a thread and runs the job on that thread. The .NET framework has a thread pool implementation, and the first time you heard about a "task" the documentation really meant a job that you pass to the thread pool.
So in a sense, both the documentation and the blog post are right. The overloading of the term task is the unfortunate source of confusion.
Threads have been a part of .Net from v1.0, Tasks were introduced in the Task Parallel Library TPL which was released in .Net 4.0.
You can consider a Task as a more sophisticated version of a Thread. They are very easy to use and have a lot of advantages over Threads as follows:
You can create return types to Tasks as if they are functions.
You can the "ContinueWith" method, which will wait for the previous task and then start the execution. (Abstracting wait)
Abstracts Locks which should be avoided as per guidlines of my company.
You can use Task.WaitAll and pass an array of tasks so you can wait till all tasks are complete.
You can attach task to the parent task, thus you can decide whether the parent or the child will exist first.
You can achieve data parallelism with LINQ queries.
You can create parallel for and foreach loops
Very easy to handle exceptions with tasks.
*Most important thing is if the same code is run on single core machine it will just act as a single process without any overhead of threads.
Disadvantage of tasks over threads:
You need .Net 4.0
Newcomers who have learned operating systems can understand threads better.
New to the framework so not much assistance available.
Some tips:-
Always use Task.Factory.StartNew method which is semantically perfect and standard.
Take a look at Task Parallel Libray for more information
http://msdn.microsoft.com/en-us/library/dd460717.aspx
Expanding on the comment by Eric Lippert:
Threads are a way that allows your application to do several things in parallel. For example, your application might have one thread that processes the events from the user, like button clicks, and another thread that performs some long computation. This way, you can do two different things “at the same time”. If you didn't do that, the user wouldn't be to click buttons until the computation finished. So, Thread is something that can execute some code you wrote.
Task, on the other hand represents an abstract notion of some job. That job can have a result, and you can wait until the job finishes (by calling Wait()) or say that you want to do something after the job finishes (by calling ContinueWith()).
The most common job that you want to represent is to perform some computation in parallel with the current code. And Task offers you a simple way to do that. How and when the code actually runs is defined by TaskScheduler. The default one uses a ThreadPool: a set of threads that can run any code. This is done because creating and switching threads in inefficient.
But Task doesn't have to be directly associated with some code. You can use TaskCompletionSource to create a Task and then set its result whenever you want. For example, you could create a Task and mark it as completed when the user clicks a button. Some other code could wait on that Task and while it's waiting, there is no code executing for that Task.
If you want to know when to use Task and when to use Thread: Task is simpler to use and more efficient that creating your own Threads. But sometimes, you need more control than what is offered by Task. In those cases, it makese sense to use Thread directly.
Tasks really are just a wrapper for the boilerplate code of spinning up threads manually. At the root, there is no difference. Tasks just make the management of threads easier, as well as they are generally more expressive due to the lessening of the boilerplate noise.