I have used most of the Threading library extensively. I am fairly familiar with creating new Threads, creating BackgroundWorkers and using the built-in .NET ThreadPool (which are all very cool).
However, I have never found a reason to use the Task class. I have seen maybe one or two examples of people using them, but the examples weren't very clear and they didn't give a high-level overview of why one should use a task instead of a new thread.
Question 1: From a high-level, when is using a task useful versus one of the other methods for parallelism in .NET?
Question 2: Does anyone have a simple and/or medium difficulty example demonstrating how to use tasks?
There are two main advantages in using Tasks:
Task can represent any result that will be available in the future (the general concept is not specific to .Net and it's called future), not just a computation. This is especially important with async-await, which uses Tasks for asynchronous operations. Since the operation that gets the result might fail, Tasks can also represent failures.
Task has lots of methods to operate on them. You can synchronously wait until it finishes (Wait()), wait for its result (Result), set up some operation when the Task finishes (ContinueWith()) and also some methods that work on several Tasks (WaitAll(), WaitAny(), ContinueWhenAll()). All of this is possible using other parallel processing methods, but you would have to do it manually.
And there are also some smaller advantages to using Task:
You can use a custom TaskScheduler to decide when and where will the Task run. This can be useful for example if you want to run a Task on the UI thread, limit the degree of parallelism or have a Task-level readers–writer lock.
Tasks support cooperative cancellation through CancellationToken.
Tasks that represent computations have some performance improvements. For example, they use work-stealing queue for more efficient processing and they also support inlining (executing Task that hasn't started yet on a thread that synchronously waits for it).
Related
I am new to TAP and TPL in C#, After googling and reading some material I am not able to understand how async tasks are executed.(particularly on which thread)
Please help me to understand out of following two options which one is correct:
If async tasks are executed on the same thread on which they are invoked, then what if I create 1000 tasks and they all will be executed by time slicing on same thread? if it is not so and tasks execute on different threads then how is it different from multi-threading (parallel execution)?
If async tasks are executed on the same thread on which they are invoked, then what if I create 1000 tasks and they all will be executed by time slicing on same thread?
Since async tasks are not executed on the same thread on which they are invoked, it is meaningless to consider whether they will all be executed in some specific manner. In any case, there is no mechanism in .NET to "time-slice" within a single thread. Time-slicing is done at the granularity of a thread; i.e. CPU time is granted to threads, not subsets of threads.
In some scenarios involving async methods, it is possible that a given thread may service two or more tasks. This is inherent in the nature of thread pools, which are used extensively throughout the TPL API. But the only reason that such methods would be serviced in any sense of the word "concurrently" would be if they themselves are waiting on other asynchronous operations. In those cases, they can yield the thread cooperatively, at which point some other task might make use of it.
if it is not so and tasks execute on different threads then how is it different from multi-threading (parallel execution)?
async and Task are super-sets of "parallel execution". That is, some tasks execute in parallel. But async and Task are also used for operations that don't involve execution at all per se, but rather just waiting for some asynchronous event to occur outside the CPU (e.g. network or file I/O).
Computational tasks will be assigned an individual thread, for that task to use for the duration of the task. Other types of asynchronous operations will not use a thread at all.
For computational tasks, these are not different from multi-threading on a fundamental level. Like many things in programming, it's more that async and Task offer useful abstractions that allow one to write code more easily to take advantage of multi-threading features in the computer, and at the same time also represent non-computational operations that are otherwise semantically similar to concurrent computational operations.
I have done my best to answer your questions as directly and concisely as possible. I would not be surprised however if the above raises as many new questions as it's answered old. Unfortunately, that's the nature of potentially broad questions. I do hope that the above has satisfied your curiosity, but if not that would mean that your original question is too broad for Stack Overflow. Please consider rewriting your question so that it focuses more specifically on an issue; include a good, minimal, complete code example for context, so that the question can be addressed in a specific manner and without ambiguity.
What is the difference between Task class and parallel class which part of TPL at implementation point of view.?
I believe task class is having more benefits than threadpool and thread but still context switch happens in task class as well.
But parallel class is basically design to run program on multicore processor?
Your question is extremely wide and can contain lots of details as an answer, but let me restrict to specific details.
Task - Wrap a method for execution down the line, it use the Lambda (Action, Func Delegate) to do the same. You can wrap now and execute anytime later.
Parallel is an API which helps achieve the Data Parallelization, where you can divide a collection (IEnumerable type) into smaller chunks and each can be executed in parallel and finally aggregated to achieve the result
There are broadly two kinds of parallelism, in one you can subdivide the bigger task into smaller ones, wrap them in a Task type and wait for all or some of them to complete in parallel. This is task parallelism
In other one you take each data unit in a collection and work on it in a mutually exclusive manner, which is data parallelism achieved by Parallel.forEach or Parallel.For APIs
These are introduced from .Net 4.0 onward to make the parallelism easy for the developer, else we had to dabble with Thread and ThreadPool class, which require much more in-depth understanding of the working of threads, here lot of complexity is taken care of internally.
However, don't be under the impression that current mechanism doesn't use threads, both the above mentioned form of parallelism rely completely on ThreadPool threads, that's why we have all the stuff like context -switching happening, multiple threads getting invoked, just that microsoft has made developer life easy by doing it
You may want to go through following links for a better understanding, let me know if there's still a specific query:
Parallel.ForEach vs Task.Factory.StartNew
Past, Present and Future of Parallelism
Parallel.ForEach vs Task.Run and Task.WhenAll
TPL is designed to minimize pre-emptive context-switching (caused by thread oversubscription – having more threads than cores). Task abstractions, of which TPL is an implementation, are designed for cooperative parallelism, where the developer controls when a task will relinquish its execution (typically upon completion). If you schedule more tasks than you have cores, TPL will only execute concurrently approximately as many tasks as you have core; the rest will be queued. This promotes throughout since it avoids the overheads of context-switching, but reduces responsiveness, as each task may take longer to start being processed.
The Parallel class is yet a higher level of abstraction that builds on top of TPL. Implementation-wise, Parallel generates a graph of tasks, but can use heuristics for deciding the granularity of the said tasks depending on your work.
I'm trying to understand tasks in .net from what I understand is that they are better than threads because they represent work that needs to get done and when there is a idle thread it just gets picked up and worked on allowing the full cpu to be utilized.
I see the Task<ActionResult> all over a new mvc 5 project and I would like to know why this is happening?
Does it make sense to always do this, or just when there can be blocking work in the function?
I'm guessing since this does act like a thread there is still sync objects that may be needed is this correct?
MVC 5 uses Task<ActionResult> to allow it to be fully asynchronous. By using Task<T>, the methods can be implemented using the new async and await language features, which allows you to compose asynchronous IO functions with MVC in a simple manner.
When working with MVC, in general, the Task<T> will hopefully not be using threads - they'll be composing asynchronous operations (typically IO bound work). Using threads on a server, in general, will reduce your overall scalability.
A Task does not represent a thread, even logically. It's not just an alternate implementation of threads. It's a higher level concept. A Task is the representation of an asynchronous operation that will complete at some point (usually in the future).
That task could represent code being run on another thread, it could represent some asynchronous IO operation that relies on OS interrupts to (indirectly, through a few other layers of indirection) cause the task to be marked completed), it could be the result of two other tasks being completed, or the continuation of some other task being completed, it could be an indication of when an event next fires, or some custom TaskCompletionSource that has who knows what as its implementation.
But you don't need to worry about all of those options. That's the point. In other models you need to treat all of those different types of asynchronous operations differently, complicating your asynchronous programs. The use of Task allows you to write code that can easily be composed with any and every type of asynchronous operation.
I'm guessing since this does act like a thread there is still sync objects that may be needed is this correct?
Technically, yes. There are times where you may need to use these, but largely, no. Ideally, if you're using idiomatic practices, you can avoid this, at least in most cases. Generally when one task depends on code running in other tasks it should be the continuation of that task, and information is assessed between tasks through the tasks' Result property. The use of Result doesn't require any synchronization mechanisms, so usually you can avoid them entirely.
I see the Task all over a new mvc 5 project and I would like to know why this is happening?
When you're going to make something asynchronous it generally makes sense to make everything asynchronous (or nothing). Mixing and matching just...doesn't work. Asynchronous code relies on having every method take very little time to execute so that the message pump can get back to processing its queue of pending tasks/continuations. Mixing asynchronous code and synchronous code makes it very likely to deadlock your application, and also defeats most of the purposes of using asynchrony to begin with (which is to avoid blocking threads).
I've been reading up on multithreading to get something more than the regular "push functions to the threadpool and wait for it to finish" approach which is really basic.
Basically, I want more control over the threads, the ability to pass on Cancelation tokens, get return values, etc. This all looks possible with the use of Task.factory (Task Scheduler), which from what i understand runs on top of the threadpool.
If that's the case, if I limit the thread number on the general threadpool, that will apply to my implementation of Task Scheduler or?
I also read that using your own threadpool is better than THE threadpool, can I mix these two up and get the control I want?
Any suggestions are welcome! Thanks for taking the time to explain a bit more guys.
You can create a TaskScheduler that limits concurrency. This custom scheduler can then be used to create your own TaskFactory, and start tasks that are customized with the control you wish.
The Parallel Extensions Samples project includes many custom task schedulers you can use as reference.
I also read that using your own threadpool is better than THE threadpool, can I mix these two up and get the control I want?
I would actually disagree with this, for most general uses. The .NET ThreadPool is very efficient, and highly optimized. It includes quite a few metrics for automaticallly scaling the number of threads used, etc.
That being said, you can always make a TaskScheduler which uses dedicated threads or your own "thread pool" implementation if you choose.
I've a configuration xml which is being used by a batch module in my .Net 3.5 windows application.
Each node in the xml is mapped to a .Net class. Each class does processing like mathematical calculations, making db calls etc.
The batch module loads the xml, identifies the class associated with each node and then processes it.
Now, we have the following requirements:
1.Lets say there are 3 classes[3 nodes in the xml]...A,B, and C.
Class A can be dependant on class B...ie. we need to execute class B before processing class A. Class C processing should be done on a separare thread.
2.If a thread is running, then we should be able to cancel that thread in the middle of its processing.
We need to implement this whole module using .net multi-threading.
My questions are:
1.Is it possible to implement requirement # 1 above?If yes, how?
2.Given these requirements, is .Net 3.5 a good idea or .Net 4.0 would be a better choice?Would like to know advantages and disadvantages please.
Thanks for reading.
You'd be better off using the Task Parallel Library (TPL) in .NET 4.0. It'll give you lots of nice features for abstracting the actual business of creating threads in the thread pool. You could use the parallel tasks pattern to create a Task for each of the jobs defined in the XML and the TPL will handle the scheduling of those tasks regardless of the hardware. In other words if you move to a machine with more cores the TPL will schedule more threads.
1) The TPL supports the notion of continuation tasks. You can use these to enforce task ordering and pass the result of one Task or future from the antecedent to the continuation. This is the futures pattern.
// The antecedent task. Can also be created with Task.Factory.StartNew.
Task<DayOfWeek> taskA = new Task<DayOfWeek>(() => DateTime.Today.DayOfWeek);
// The continuation. Its delegate takes the antecedent task
// as an argument and can return a different type.
Task<string> continuation = taskA.ContinueWith((antecedent) =>
{
return String.Format("Today is {0}.",
antecedent.Result);
});
// Start the antecedent.
taskA.Start();
// Use the contuation's result.
Console.WriteLine(continuation.Result);
2) Thread cancellation is supported by the TPL but it is cooperative cancellation. In other words the code running in the Task must periodically check to see if it has been cancelled and shut down cleanly. TPL has good support for cancellation. Note that if you were to use threads directly you run into the same limitations. Thread.Abort is not a viable solution in almost all cases.
While you're at it you might want to look at a dependency injection container like Unity for generating configured objects from your XML configuration.
Answer to comment (below)
Jimmy: I'm not sure I understand holtavolt's comment. What is true is that using parallelism only pays off if the amount of work being done is significant, otherwise your program may spend more time managing parallelism that doing useful work. The actual datasets don't have to be large but the work needs to be significant.
For example if your inputs were large numbers and you we checking to see if they were prime then the dataset would be very small but parallelism would still pay off because the computation is costly for each number or block of numbers. Conversely you might have a very large dataset of numbers that you were searching for evenness. This would require a very large set of data but the calculation is still very cheap and a parallel implementation might still not be more efficient.
The canonical example is using Parallel.For instead of for to iterate over a dataset (large or small) but only perform a simple numerical operation like addition. In this case the expected performance improvement of utilizing multiple cores is outweighed by the overhead of creating parallel tasks and scheduling and managing them.
Of course it can be done.
Assuming you're new, I would likely look into multithreading, and you want 1 thread per class then I would look into the backgroundworker class, and basically use it in the different classes to do the processing.
What version you want to use of .NET also depends on if this is going to run on client machines also. But I would go for .NET 4 simply because it's newest, and if you want to split up a single task into multiple threads it has built-in classes for this.
Given your use case, the Thread and BackgroundWorkerThread should be sufficient. As you'll discover in reading the MSDN information regarding these classes, you will want to support cancellation as your means of shutting down a running thread before it's complete. (Thread "killing" is something to be avoided if at all possible)
.NET 4.0 has added some advanced items in the Task Parallel Library (TPL) - where Tasks are defined and managed with some smarter affinity for their most recently used core (to provide better cache behavior, etc.), however this seems like overkill for your use case, unless you expect to be running very large datasets. See these sites for more information:
http://msdn.microsoft.com/en-us/library/dd460717.aspx
http://archive.msdn.microsoft.com/ParExtSamples