I've been reading up on multithreading to get something more than the regular "push functions to the threadpool and wait for it to finish" approach which is really basic.
Basically, I want more control over the threads: the ability to pass cancellation tokens, get return values, etc. This all looks possible with Task.Factory and a TaskScheduler, which from what I understand run on top of the thread pool.
If that's the case, and I limit the number of threads on the general thread pool, will that limit also apply to my TaskScheduler implementation?
I also read that using your own threadpool is better than THE threadpool, can I mix these two up and get the control I want?
Any suggestions are welcome! Thanks for taking the time to explain a bit more guys.
You can create a TaskScheduler that limits concurrency. This custom scheduler can then be used to create your own TaskFactory, and start tasks that are customized with the control you wish.
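As a minimal sketch of the idea, the block below does not write a custom TaskScheduler by hand; it swaps in the built-in ConcurrentExclusiveSchedulerPair (available from .NET 4.5) to cap concurrency on top of the default pool, then builds a TaskFactory around it. The class name LimitedFactoryDemo, the method PeakConcurrency, and the 20 ms sleep standing in for real work are all illustrative assumptions.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LimitedFactoryDemo
{
    // Starts jobCount jobs through a concurrency-limited TaskFactory and returns
    // the highest number of jobs that were observed running at the same time.
    public static int PeakConcurrency(int jobCount, int maxConcurrency, CancellationToken token)
    {
        // Wraps the default (thread pool) scheduler and caps concurrent tasks.
        var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, maxConcurrency);

        // Every task started from this factory uses the limited scheduler and the token.
        var factory = new TaskFactory(token, TaskCreationOptions.None,
                                      TaskContinuationOptions.None, pair.ConcurrentScheduler);
        var gate = new object();
        int running = 0, peak = 0;

        Task<int>[] jobs = Enumerable.Range(0, jobCount).Select(i => factory.StartNew(() =>
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            try
            {
                Thread.Sleep(20);                      // stand-in for real work (assumption)
                token.ThrowIfCancellationRequested();  // cooperative cancellation point
                return i * i;                          // return value surfaces via Task<int>.Result
            }
            finally { lock (gate) { running--; } }
        })).ToArray();

        Task.WaitAll(jobs);
        return peak;
    }
}
```

Calling LimitedFactoryDemo.PeakConcurrency(12, 2, CancellationToken.None) should never report more than two jobs running at once, while the global ThreadPool settings stay untouched.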
The Parallel Extensions Samples project includes many custom task schedulers you can use as reference.
I also read that using your own threadpool is better than THE threadpool, can I mix these two up and get the control I want?
I would actually disagree with this, for most general uses. The .NET ThreadPool is very efficient and highly optimized. It includes quite a few metrics for automatically scaling the number of threads used, etc.
That being said, you can always make a TaskScheduler which uses dedicated threads or your own "thread pool" implementation if you choose.
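A rough sketch of such a scheduler, assuming a fixed number of dedicated background threads fed from a BlockingCollection, might look like the following. The class name DedicatedThreadTaskScheduler is hypothetical, and the sketch deliberately refuses to inline tasks, so a task that synchronously waits on another task queued to the same scheduler can deadlock if all workers are busy.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler that runs tasks on its own threads instead of the ThreadPool.
public sealed class DedicatedThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly List<Thread> _threads = new List<Thread>();

    public DedicatedThreadTaskScheduler(int threadCount)
    {
        if (threadCount < 1) throw new ArgumentOutOfRangeException("threadCount");
        for (int i = 0; i < threadCount; i++)
        {
            var thread = new Thread(() =>
            {
                // Blocks until a task is queued; ends once CompleteAdding is called.
                foreach (Task task in _queue.GetConsumingEnumerable())
                    TryExecuteTask(task);
            })
            { IsBackground = true, Name = "Dedicated worker " + i };
            thread.Start();
            _threads.Add(thread);
        }
    }

    public override int MaximumConcurrencyLevel { get { return _threads.Count; } }

    protected override void QueueTask(Task task) { _queue.Add(task); }

    // Never run a task on the waiting caller's thread; only the workers execute tasks.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return false;
    }

    // Used by debuggers to list pending tasks.
    protected override IEnumerable<Task> GetScheduledTasks() { return _queue.ToArray(); }

    public void Dispose()
    {
        _queue.CompleteAdding();                   // let workers drain the queue and exit
        foreach (Thread t in _threads) t.Join();
        _queue.Dispose();
    }
}
```

A TaskFactory created with new TaskFactory(scheduler) then starts tasks on these threads, and the rest of the Task API (Result, ContinueWith, cancellation) keeps working as usual.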
Related
What is the difference between the Task class and the Parallel class, both part of the TPL, from an implementation point of view?
I believe the Task class has more benefits than ThreadPool and Thread, but context switches still happen with tasks as well.
And is the Parallel class basically designed to run a program on a multicore processor?
Your question is extremely wide and can contain lots of details as an answer, but let me restrict to specific details.
Task - Wraps a method for later execution; it uses a delegate (Action or Func, usually written as a lambda) to do so. You can wrap the work now and execute it at any later point.
Parallel is an API that helps achieve data parallelism: you divide a collection (an IEnumerable type) into smaller chunks, process the chunks in parallel, and finally aggregate them to get the result.
There are broadly two kinds of parallelism. In the first, you subdivide a bigger job into smaller ones, wrap each in a Task, and wait for all or some of them to complete in parallel. This is task parallelism.
In the other, you take each data unit in a collection and work on it independently of the others. This is data parallelism, achieved with the Parallel.ForEach or Parallel.For APIs.
These were introduced in .NET 4.0 to make parallelism easier for the developer. Before that we had to work with the Thread and ThreadPool classes directly, which requires a much deeper understanding of how threads work; here a lot of that complexity is handled internally.
However, don't be under the impression that the current mechanism doesn't use threads. Both forms of parallelism mentioned above run on ThreadPool threads by default, which is why context switching and multiple threads are still involved; Microsoft has simply made the developer's life easier by handling it internally.
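The two kinds of parallelism described above can be contrasted in a small sketch. The class and method names (ParallelismKinds, TaskParallel, DataParallel) are made up for illustration, and Task.Factory.StartNew is used so the code also fits .NET 4.0.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelismKinds
{
    // Task parallelism: three different pieces of work, each wrapped in its own Task.
    public static int[] TaskParallel(int[] data)
    {
        Task<int> sum = Task.Factory.StartNew(() => data.Sum());
        Task<int> max = Task.Factory.StartNew(() => data.Max());
        Task<int> min = Task.Factory.StartNew(() => data.Min());
        Task.WaitAll(sum, max, min);               // wait for all of them to complete
        return new[] { sum.Result, max.Result, min.Result };
    }

    // Data parallelism: the same operation applied independently to every element.
    public static int[] DataParallel(int[] data)
    {
        var squares = new int[data.Length];
        // Each iteration writes only its own slot, so no locking is needed.
        Parallel.For(0, data.Length, i => squares[i] = data[i] * data[i]);
        return squares;
    }
}
```

For the input { 3, 1, 2 }, TaskParallel yields the sum, maximum and minimum (6, 3, 1) and DataParallel yields the squares in input order (9, 1, 4).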
You may want to go through following links for a better understanding, let me know if there's still a specific query:
Parallel.ForEach vs Task.Factory.StartNew
Past, Present and Future of Parallelism
Parallel.ForEach vs Task.Run and Task.WhenAll
TPL is designed to minimize pre-emptive context switching (caused by thread oversubscription, i.e. having more threads than cores). Task abstractions, of which TPL is an implementation, are designed for cooperative parallelism, where the developer controls when a task relinquishes execution (typically upon completion). If you schedule more tasks than you have cores, TPL will execute only about as many tasks concurrently as you have cores; the rest will be queued. This promotes throughput, since it avoids the overhead of context switching, but reduces responsiveness, as each task may take longer to start being processed.
The Parallel class is yet a higher level of abstraction that builds on top of TPL. Implementation-wise, Parallel generates a graph of tasks, but can use heuristics for deciding the granularity of the said tasks depending on your work.
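When the built-in heuristics are not a good fit, the granularity can also be chosen by hand. The sketch below, under the assumption that a fixed chunk size is acceptable, uses Partitioner.Create with an explicit range size together with ParallelOptions.MaxDegreeOfParallelism; the ChunkedSum class and its parameters are illustrative names, and count must be greater than zero.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedSum
{
    // Sums i*i for i in [0, count), handing each worker ranges of rangeSize indices.
    public static long SumOfSquares(int count, int rangeSize)
    {
        long total = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

        // Each element produced by the partitioner is a (fromInclusive, toExclusive) pair.
        Parallel.ForEach(Partitioner.Create(0, count, rangeSize), options, range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
                local += (long)i * i;

            // One synchronized update per chunk instead of one per element.
            Interlocked.Add(ref total, local);
        });
        return total;
    }
}
```

Larger ranges mean fewer, coarser units of work with less scheduling overhead; smaller ranges balance load better when the cost per element varies.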
After reading how the thread pool and tasks work in this article I came up with this question -
If I have a complex program in which some modules use tasks and some use thread pool, is it possible that there will be some scheduling problems due to the different uses?
Tasks are often implemented using the thread pool (one can of course also have tasks using other types of schedulers that give different behavior, but this is the default). In terms of the actual code being executed (assuming your tasks represent delegates being run) there really isn't much difference.
Tasks simply create a wrapper around that thread pool call to provide additional functionality for gathering information about, and processing the results of, that asynchronous operation. If you want to leverage that additional functionality, use tasks. If you have no need for it in some particular context, there's nothing wrong with using the thread pool directly.
Mixing the two is not a problem at all, so long as you don't have trouble getting what you want out of the results of those operations.
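To make the "wrapper" point concrete, here is a hedged side-by-side sketch: the same delegate run once through ThreadPool.QueueUserWorkItem, where completion, result and errors have to be carried by hand, and once through a Task, which carries them for you. MixedModules, ViaThreadPool and ViaTask are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class MixedModules
{
    // Raw thread pool work item: result, error and completion are tracked manually.
    public static int ViaThreadPool(Func<int> work)
    {
        int result = 0;
        Exception error = null;
        var done = new ManualResetEventSlim();

        ThreadPool.QueueUserWorkItem(_ =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; }   // an unhandled exception here would crash the process
            finally { done.Set(); }
        });

        done.Wait();
        if (error != null) throw new AggregateException(error);
        return result;
    }

    // Task: by default the same pool does the work, but result and errors come with the Task.
    public static int ViaTask(Func<int> work)
    {
        return Task.Factory.StartNew(work).Result;  // rethrows failures as AggregateException
    }
}
```

Both methods can be used in the same program without interfering with each other; the difference is how much plumbing the caller writes.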
No. And there actually isn't much in the way of memory or performance inefficiencies when mixing approaches; by default tasks use the same thread pool that thread pool threads use.
The only significant disadvantage of mixing both is a lack of consistency in your codebase. If you were to pick one, I would use TPL, since it has a rich API for handling many aspects of multithreading and takes advantage of the async/await language features.
Since your usage is divided down module lines, you don't have much to worry about.
No, there wouldn't be problems; it would just be inefficient to do both. Use what is really needed and stick with that pattern. Also remember to make your app thread-safe, especially if you access the same resources or variables from different threads, regardless of which threading mechanism you use.
There shouldn't be any scheduling problems as such, but of course it's better to use Tasks and let the Framework decide what to do with the scheduled work. In the current version of the framework (4.5) the work will be queued through the ThreadPool unless the LongRunning option is used, but this behaviour may change in future of course.
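One way to observe this, offered as a small experiment rather than a guarantee, is to ask a task whether it is running on a pool thread. The helper name RanOnPoolThread is an assumption; the calls themselves (Task.Factory.StartNew with TaskCreationOptions, Thread.IsThreadPoolThread) are standard. The result for LongRunning is an implementation detail that may differ between framework versions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningDemo
{
    // Starts a task with the given options on the default scheduler and
    // reports whether it executed on a ThreadPool thread.
    public static bool RanOnPoolThread(TaskCreationOptions options)
    {
        Task<bool> probe = Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            options,
            TaskScheduler.Default);
        return probe.Result;
    }
}
```

Comparing RanOnPoolThread(TaskCreationOptions.None) with RanOnPoolThread(TaskCreationOptions.LongRunning) shows how the scheduler treats the hint on the runtime you are actually using.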
Verdict: Mixing Tasks and ThreadPool isn't a problem, but for new applications it's recommended to use Tasks instead of queueing work items directly on the ThreadPool (one reason is that ThreadPool isn't available in the Windows 8 Runtime, i.e. Modern UI apps).
I have used most of the Threading library extensively. I am fairly familiar with creating new Threads, creating BackgroundWorkers and using the built-in .NET ThreadPool (which are all very cool).
However, I have never found a reason to use the Task class. I have seen maybe one or two examples of people using them, but the examples weren't very clear and they didn't give a high-level overview of why one should use a task instead of a new thread.
Question 1: From a high-level, when is using a task useful versus one of the other methods for parallelism in .NET?
Question 2: Does anyone have a simple and/or medium difficulty example demonstrating how to use tasks?
There are two main advantages in using Tasks:
A Task can represent any result that will be available in the future (the general concept is not specific to .NET and is called a future), not just a computation. This is especially important with async-await, which uses Tasks for asynchronous operations. Since the operation that gets the result might fail, Tasks can also represent failures.
Tasks come with many methods for working with them. You can synchronously wait until a Task finishes (Wait()), wait for its result (Result), set up some operation to run when the Task finishes (ContinueWith()), and use methods that work on several Tasks (WaitAll(), WaitAny(), ContinueWhenAll()). All of this is possible using other parallel processing methods, but you would have to do it manually.
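The members named above can be strung together in a short sketch. TaskApiTour and its two methods are illustrative names; the values 20 and 22 are arbitrary.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class TaskApiTour
{
    // Combines two results with ContinueWhenAll, formats them with ContinueWith,
    // then blocks for the final value with Wait/Result.
    public static string Combine()
    {
        Task<int> a = Task.Factory.StartNew(() => 20);
        Task<int> b = Task.Factory.StartNew(() => 22);

        Task<int> sum = Task.Factory.ContinueWhenAll<int, int>(
            new[] { a, b }, done => done.Sum(t => t.Result));
        Task<string> text = sum.ContinueWith(t => "sum=" + t.Result);

        text.Wait();
        return text.Result;
    }

    // A failed operation is represented by the Task too: waiting on it
    // rethrows the original exception wrapped in an AggregateException.
    public static string Failure()
    {
        Task<int> failing = Task.Factory.StartNew<int>(
            () => { throw new InvalidOperationException("boom"); });
        try
        {
            failing.Wait();
            return "no error";
        }
        catch (AggregateException ex)
        {
            return ex.InnerException.Message;
        }
    }
}
```

Combine() returns "sum=42" and Failure() returns "boom", without any hand-written signalling between the operations.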
And there are also some smaller advantages to using Task:
You can use a custom TaskScheduler to decide when and where will the Task run. This can be useful for example if you want to run a Task on the UI thread, limit the degree of parallelism or have a Task-level readers–writer lock.
Tasks support cooperative cancellation through CancellationToken.
Tasks that represent computations have some performance improvements. For example, they use work-stealing queues for more efficient processing, and they also support inlining (executing a Task that hasn't started yet on the thread that is synchronously waiting for it).
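Cooperative cancellation, mentioned in the list above, could be sketched as follows. The worker polls the token and stops itself; nothing aborts it from outside. CancellationDemo is a hypothetical name, and the loop bound and 5 ms sleep are arbitrary stand-ins for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Starts a polling worker, cancels it, and returns the task's final status.
    public static TaskStatus Run()
    {
        using (var cts = new CancellationTokenSource())
        using (var started = new ManualResetEventSlim())
        {
            CancellationToken token = cts.Token;

            Task worker = Task.Factory.StartNew(() =>
            {
                started.Set();
                for (int i = 0; i < 10000; i++)
                {
                    // The task itself decides where it is safe to stop.
                    token.ThrowIfCancellationRequested();
                    Thread.Sleep(5);               // stand-in for a slice of work
                }
            }, token);                             // same token, so the Task ends as Canceled

            started.Wait();                        // make sure the worker is really running
            cts.Cancel();

            try { worker.Wait(); }
            catch (AggregateException) { /* wraps a TaskCanceledException */ }
            return worker.Status;
        }
    }
}
```

Because the exception thrown inside the delegate carries the same token that was passed to StartNew, the task finishes in the Canceled state rather than Faulted.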
I have created a renderer in Silverlight/C#. Currently I'm using System.Threading.ThreadPool to schedule rendering of tiles in parallel. This works well right now, but I would like to limit the number of threads used.
Since this runs on Silverlight there are a couple of restrictions:
If I call ThreadPool.SetMaxThreads the application crashes as documented.
There is no Task Parallel Library
I see a few options:
Find an OSS/third party Thread Pool
Implement my own Thread Pool (I'd rather not)
Use Rx (which I do in other places)
Are there any tested alternative Thread Pools that work with Silverlight out there?
Or can anyone come up with an Rx expression that spawns a limited number of threads and queues work on these?
If you're using Rx, check out:
https://github.com/xpaulbettsx/ReactiveUI/blob/master/ReactiveUI/ObservableAsyncMRUCache.cs
(Copying this one file into your app should be pretty easy, just nuke the this.Log() lines and the IEnableLogger interface)
Using it is pretty easy, just change your SelectMany to CachedSelectMany:
someArray.ToObservable()
.CachedSelectMany(webService)
.Subscribe(x => /* do stuff */);
If you use Rx, then it seems you could quite easily write your own implementation of IScheduler. It could simply apply a semaphore and then pass the work on to the ThreadPool. With this approach you get to leverage the ThreadPool, and because you are coding against an interface you also get good seams for testing.
Furthermore, since you have written it yourself, you could instead use a small-ish (<10) set of threads that you manage yourself (instead of the ThreadPool), so you can avoid ThreadPool starvation.
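The gating idea behind such a scheduler can be sketched without Rx. The class below is not an IScheduler implementation; it only shows the "limit, then hand off to the ThreadPool" core that one could wrap in an IScheduler. It uses a lock and a counter rather than a semaphore type, on the assumption that this keeps it portable to restricted profiles. LimitedThreadPoolQueue and the onError callback are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Runs queued actions on the ThreadPool, but never more than maxConcurrency at once.
public sealed class LimitedThreadPoolQueue
{
    private readonly object _gate = new object();
    private readonly Queue<Action> _pending = new Queue<Action>();
    private readonly int _maxConcurrency;
    private readonly Action<Exception> _onError;
    private int _running;

    public LimitedThreadPoolQueue(int maxConcurrency, Action<Exception> onError)
    {
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException("maxConcurrency");
        _maxConcurrency = maxConcurrency;
        _onError = onError ?? (ex => { });
    }

    public void Enqueue(Action work)
    {
        if (work == null) throw new ArgumentNullException("work");
        lock (_gate)
        {
            if (_running >= _maxConcurrency)
            {
                _pending.Enqueue(work);            // all slots busy: park the work item
                return;
            }
            _running++;                            // claim a slot
        }
        ThreadPool.QueueUserWorkItem(_ => Drain(work));
    }

    // Runs the given item, then keeps taking parked items until none are left.
    private void Drain(Action work)
    {
        while (true)
        {
            try { work(); }
            catch (Exception ex) { _onError(ex); } // keep the pool thread (and process) alive

            lock (_gate)
            {
                if (_pending.Count == 0) { _running--; return; }  // release the slot
                work = _pending.Dequeue();
            }
        }
    }
}
```

For the tile renderer in the question, each tile would be passed to Enqueue, and maxConcurrency would play the role that ThreadPool.SetMaxThreads cannot play on Silverlight.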
Check out Ami Bar's SmartThreadPool. It's got a ton of features missing from the default .NET threadpool, allows you to set a MaxThreads property per threadpool instance, and supports Silverlight.
I came across this comprehensive explanation of the new .NET TPL library recently, and it sounded pretty impressive. Having read the article, it appears that the new taskmanager is so clever it can even tell whether your parallel tasks would be faster if done serially on the same thread, rather than be parcelled out to worker threads. This could often be a difficult decision.
Having written a lot of code using what threading was available previously, it now seems as though everything ought to be written with tasks, which would hand over a lot of the work to the taskmanager.
Am I right in thinking that whatever I previously did with threads should now be done with tasks? Of course there will always be cases where you need fine control, but should one generally throw ordinary background work onto a task, rather than a new thread? I.e., has the default "I need this to run in the background => new thread" become "new task" instead?
Basically, yes, you want to use tasks and let them take care of the thread use. In practice, the tasks are processed by a thread pool.
Tasks are managed by the TaskScheduler. The default TaskScheduler runs tasks on ThreadPool threads and as such you have the same issues as you normally would when using the ThreadPool: It is hard to control the setup (priority, locale, background/foreground, etc.) on threads in the pool. If you need to control any of these aspects it may be better to manage the threads yourself. You may also implement your own scheduler to handle some of these issues.
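When thread-level settings matter, one option short of writing a full scheduler is to manage the thread yourself and still expose the outcome as a Task. The sketch below assumes a TaskCompletionSource is acceptable for that purpose; the DedicatedThread helper and its parameters are illustrative names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DedicatedThread
{
    // Runs work on a new, explicitly configured thread and reports the outcome as a Task<T>.
    public static Task<T> Run<T>(Func<T> work, ThreadPriority priority, string name)
    {
        var completion = new TaskCompletionSource<T>();

        var thread = new Thread(() =>
        {
            T result;
            try { result = work(); }
            catch (Exception ex) { completion.SetException(ex); return; }
            completion.SetResult(result);
        })
        {
            IsBackground = true,   // settings that are awkward to control on pool threads
            Priority = priority,
            Name = name
        };

        thread.Start();
        return completion.Task;    // callers can Wait, ContinueWith, or await as usual
    }
}
```

A call such as DedicatedThread.Run(() => RenderTile(), ThreadPriority.BelowNormal, "render"), where RenderTile is a placeholder for your own method, gives the caller an ordinary Task<T> while the work runs on a thread set up exactly as required.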
For most other parts the new Task class works very well.