I came across this comprehensive explanation of the new .NET TPL library recently, and it sounded pretty impressive. Having read the article, it appears that the new taskmanager is so clever it can even tell whether your parallel tasks would be faster if done serially on the same thread, rather than be parcelled out to worker threads. This could often be a difficult decision.
Having written a lot of code using what threading was available previously, it now seems as though everything ought to be written with tasks, which would hand over a lot of the work to the taskmanager.
Am I right in thinking that whatever I previously did with threads should now be done with tasks? Of course there will always be cases where you need fine control, but should one generally throw ordinary background work onto a task rather than a new thread? I.e., has the default "I need this to run in the background => new thread" become "new task" instead?
Basically, yes, you want to use tasks and let them take care of the thread use. In practice, the tasks are processed by a thread pool.
Tasks are managed by the TaskScheduler. The default TaskScheduler runs tasks on ThreadPool threads and as such you have the same issues as you normally would when using the ThreadPool: It is hard to control the setup (priority, locale, background/foreground, etc.) on threads in the pool. If you need to control any of these aspects it may be better to manage the threads yourself. You may also implement your own scheduler to handle some of these issues.
For most other parts the new Task class works very well.
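As a rough sketch of the two options described above: the first method queues work to the thread pool through a task and gets a result back, and the second uses a dedicated thread because it wants to set the priority and background flag itself. The names BackgroundWorkDemo, SumTo, SumOnTask and SumOnDedicatedThread are made up for illustration, and Task.Run assumes .NET 4.5 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundWorkDemo
{
    // Hypothetical workload: sums the integers 1..n.
    public static long SumTo(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    // Default choice: queue the work to the thread pool and get a result back.
    public static Task<long> SumOnTask(int n)
    {
        return Task.Run(() => SumTo(n));
    }

    // Fallback when per-thread settings matter: a dedicated thread
    // whose priority and background flag we set ourselves.
    public static long SumOnDedicatedThread(int n)
    {
        long result = 0;
        var thread = new Thread(() => result = SumTo(n))
        {
            IsBackground = true,
            Priority = ThreadPriority.BelowNormal
        };
        thread.Start();
        thread.Join();   // wait for the thread so the result is ready
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(SumOnTask(1000).Result);
        Console.WriteLine(SumOnDedicatedThread(1000));
    }
}
```

Note how the thread version needs a captured variable and a Join to get the value out, while the task carries its result with it.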
I'm currently picking up C# again and developing a simple application that sends broadcast messages and shows received messages on a Windows Form.
I have a discovery class with two threads, one that broadcasts every 30 seconds, the other thread listens on a socket. It is in a thread because of the blocking call:
if (listenSocket.Poll(-1, SelectMode.SelectRead))
The first thread works much like a timer in a class library, it broadcasts the packet and then sleeps for 30 seconds.
In principle it works fine: when a packet is received I raise an event and the WinForm places it in a list. The problems start with the form, though, because the main UI thread requires Invoke. I only have two threads now, and this approach doesn't seem the most effective in the long run; it will become a complex thing as the number of threads grows.
I have explored the Tasks but these seem to be more orientated at a once off long running task (much like the background worker for a form).
Most threading examples I find all report to the console and do not have the problems of Invoke and locking of variables.
As I'm using .NET 4.5, should I move to Tasks or stick with threads?
Async programming will still delegate some aspects of your application to a different thread (from the thread pool). If you try to update the GUI from such a thread, you will have the same problems you have today with regular threads.
However, async/await lets you hand work to a background thread and mark a point where execution should continue on the GUI thread once that work has finished. By default, await captures the current synchronization context, so code after an await in a UI event handler resumes on the GUI thread; you can update controls there without Invoke, and the GUI stays responsive. ConfigureAwait(false) is the opt-out: use it only in code that does not need to return to the GUI thread. There are other techniques as well.
If you don't know async await mechanism yet, this will take you some investment of your time to learn all these new things. But you'll find it very rewarding.
But it is up to you to decide if you are willing to spend a few days learning and experimenting with a technology that is new to you.
Google around a bit on async await, there are some excellent articles from Stephen Cleary for instance http://blog.stephencleary.com/2012/02/async-and-await.html
Firstly, if you're worried about scalability you should probably start off with an approach that scales easily. ThreadPool would work nicely. Tasks are based on ThreadPool as well, and they allow for slightly more complex situations such as tasks/threads firing in a sequence (also based on a condition), synchronization, etc. In your case (server and client) this seems unneeded.
Secondly, it looks to me as if you are worried about a bottleneck scenario where more than one thread tries to access a common resource such as the UI or a DB. With DBs, don't worry: they handle multiple access well. But in the case of the UI or another resource that is not multithread-friendly, you have to manage parallel access yourself. I would suggest something like BlockingCollection, which is a nice way to implement the "many producers, one consumer" pattern. This way you could have multiple threads adding items and just one thread reading them from the collection and passing them on to the single-threaded resource such as the UI.
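A minimal sketch of the "many producers, one consumer" idea with BlockingCollection might look like the following. The class name, the message format and the capacity of 100 are assumptions for illustration; in a real WinForms app the consumer would hand each item to the UI rather than to a list.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ProducerConsumerDemo
{
    // Runs producerCount producers that each add itemsEach messages,
    // plus a single consumer that drains the queue.
    // Returns everything the consumer received.
    public static List<string> Run(int producerCount, int itemsEach)
    {
        var received = new List<string>();   // touched only by the consumer
        using (var queue = new BlockingCollection<string>(100))
        {
            // Single consumer: blocks until items arrive, ends after CompleteAdding.
            Task consumer = Task.Run(() =>
            {
                foreach (string message in queue.GetConsumingEnumerable())
                    received.Add(message);
            });

            var producers = new Task[producerCount];
            for (int p = 0; p < producerCount; p++)
            {
                int id = p;   // copy the loop variable for the closure
                producers[p] = Task.Run(() =>
                {
                    for (int i = 0; i < itemsEach; i++)
                        queue.Add("producer " + id + ", item " + i);
                });
            }

            Task.WaitAll(producers);
            queue.CompleteAdding();   // tell the consumer no more items will come
            consumer.Wait();
        }
        return received;
    }

    public static void Main()
    {
        Console.WriteLine(Run(3, 5).Count);
    }
}
```

Because only the consumer ever touches the list, no lock is needed around it; the collection itself does the synchronization.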
BTW, Tasks can also be long running i.e. run loops. Check this documentation.
After reading how the thread pool and tasks work in this article I came up with this question -
If I have a complex program in which some modules use tasks and some use thread pool, is it possible that there will be some scheduling problems due to the different uses?
Tasks are often implemented using the thread pool (one can of course also have tasks using other types of schedulers that give different behavior, but this is the default). In terms of the actual code being executed (assuming your tasks represent delegates being run) there really isn't much difference.
Tasks simply wrap that thread pool call to provide additional functionality for gathering information about, and processing the results of, that asynchronous operation. If you want to leverage that additional functionality, use tasks. If you have no need for it in some particular context, there's nothing wrong with using the thread pool directly.
Mixing the two is not a problem at all, so long as you don't have trouble getting what you want out of the results of those operations.
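To make the comparison concrete, here is a small sketch that runs the same computation once through ThreadPool.QueueUserWorkItem and once through a task. The class name and the 6 * 7 workload are placeholders; the point is only that the raw pool call needs its own completion signal and result variable, while the task provides both.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class MixedPoolDemo
{
    // Computes the same value through the raw thread pool and through a task.
    // Returns { poolResult, taskResult }.
    public static int[] Run()
    {
        int fromPool = 0;
        var done = new ManualResetEventSlim(false);

        // Raw thread pool: fire-and-forget, so we signal completion ourselves.
        ThreadPool.QueueUserWorkItem(_ =>
        {
            fromPool = 6 * 7;
            done.Set();
        });

        // Task: same pool by default, but completion and result are built in.
        Task<int> task = Task.Run(() => 6 * 7);

        done.Wait();
        return new[] { fromPool, task.Result };
    }

    public static void Main()
    {
        int[] results = Run();
        Console.WriteLine(results[0] + " " + results[1]);
    }
}
```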
No. And there actually isn't much in the way of memory or performance inefficiencies when mixing approaches; by default tasks use the same thread pool that thread pool threads use.
The only significant disadvantage of mixing both is a lack of consistency in your codebase. If you were to pick one, I would use the TPL, since it has a rich API for handling many aspects of multi-threading and takes advantage of the async/await language features.
Since your usage is divided down module lines, you don't have much to worry about.
No, there wouldn't be problems; you would just be inefficient in doing both. Use what is really needed and stick with the pattern. Remember to make your app thread-safe as well, especially if you are accessing the same resources or variables from different threads, regardless of which threading approach you use.
There shouldn't be any scheduling problems as such, but of course it's better to use Tasks and let the Framework decide what to do with the scheduled work. In the current version of the framework (4.5) the work will be queued through the ThreadPool unless the LongRunning option is used, but this behaviour may change in future of course.
Verdict: Mixing Tasks and ThreadPool isn't a problem, but for new applications it's recommended to use Tasks instead of queueing work items directly on the ThreadPool. One reason is that ThreadPool isn't available in the Windows 8 Runtime (Modern UI apps).
I've read about the advantages of Tasks in "Difference between Task (System.Threading.Task) and Thread".
Also msdn says that "...in the .NET Framework 4, tasks are the preferred API for writing multi-threaded, asynchronous, and parallel code."
Now my program contains such code which receive multicast data from udp:
thread = new Thread(WhileTrueFunctionToReceiveDataFromUdp);
.....
thread.Start();
I have several such threads for each socket.
Would I be better off replacing this code with Tasks?
It depends on what you're doing - if you're not going to use any of the new features in Task and the TPL, and your existing code works, there's no reason to change.
However, Task has many advantages - especially for operations that you want to run in a thread pool thread and return a result.
Also - given that you're using "threads for each socket", you likely will have longer life threads. As such, if you do switch to Task.Factory.StartNew, you'll potentially want to specify that the tasks should be LongRunning or you'll wind up using a lot of ThreadPool threads for your socket data (with the default scheduler).
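A hedged sketch of what such a switch could look like is below. ReceiveLoop is a stand-in for the real socket loop (the Thread.Sleep call is only a placeholder for a blocking receive), and the cancellation token is an addition for a clean shutdown that the original code does not show.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningDemo
{
    // Stand-in for a blocking receive loop; returns the number of loop passes.
    public static int ReceiveLoop(CancellationToken token)
    {
        int iterations = 0;
        while (!token.IsCancellationRequested)
        {
            Thread.Sleep(10);   // placeholder for a blocking socket read
            iterations++;
        }
        return iterations;
    }

    public static Task<int> StartReceiver(CancellationToken token)
    {
        // LongRunning hints to the scheduler that this work should not
        // tie up one of the regular thread pool threads.
        return Task.Factory.StartNew(
            () => ReceiveLoop(token),
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);
    }

    public static void Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            Task<int> receiver = StartReceiver(cts.Token);
            Thread.Sleep(100);
            cts.Cancel();   // ask the loop to stop
            Console.WriteLine("Loop passes before cancel: " + receiver.Result);
        }
    }
}
```

One such task per socket would mirror the existing one-thread-per-socket design while still giving you a Task to wait on or attach continuations to.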
Do not change anything in code that already works and will keep working (at least according to Microsoft). Change it only for reasons like:
You want to use the new features offered by Tasks.
Personal study.
Remember that on OS level they basically end up into the same OS Kernel objects.
Hope this helps.
As I understand the difference between Task and Thread, a task runs on the thread pool, while a thread is something I need to manage myself (and a task can be cancelled, with its thread returning to the pool when the work is done).
But in some blog I read that when the operating system needs to create a task or a thread, the task is easier to create (and destroy).
Can someone please explain why creating a task is simpler than creating a thread?
(Or maybe I am missing something here...)
I think that what you are talking about when you say Task is a System.Threading.Tasks.Task. If that's the case then you can think about it this way:
A program can have many threads, but a processor core can only run one Thread at a time.
Threads are very expensive, and switching between the threads that are running is also very expensive.
So... Having thousands of threads doing stuff is inefficient. Imagine if your teacher gave you 10,000 tasks to do. You'd spend so much time cycling between them that you'd never get anything done. The same thing can happen to the CPU if you start too many threads.
To get around this, the .NET framework allows you to create Tasks. Tasks are a bit of work bundled up into an object, and they allow you to do interesting things like capture the output of that work and chain pieces of work together (first go to the store, then buy a magazine).
Tasks are scheduled on a pool of threads. The specific number of threads depends on the scheduler used, but the default scheduler tries to pick a number of threads that is optimal for the number of CPU cores that you have and how much time your tasks are spending actually using CPU time. If you want to, you can even write your own scheduler that does something specific like making sure that all Tasks for that scheduler always operate on a single thread.
So think of Tasks as items in your to-do list. You might be able to do 5 things at once, but if your boss gives you 10000, they will pile up in your inbox until the first 5 that you are doing get done. The difference between Tasks and the ThreadPool is that Tasks (as I mentioned earlier) give you better control over the relationship between different items of work (imagine to-do items with multiple instructions stapled together), whereas the ThreadPool just allows you to queue up a bunch of individual, single-stage items (Functions).
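The "first go to the store, then buy a magazine" chaining mentioned above can be sketched with ContinueWith. The class and method names and the strings are invented for the example.

```csharp
using System;
using System.Threading.Tasks;

public static class ChainingDemo
{
    public static Task<string> GoShopping()
    {
        // First piece of work.
        Task<string> goToStore = Task.Run(() => "at the store");

        // Second piece: runs only after the first has finished,
        // and receives the finished task so it can read its result.
        return goToStore.ContinueWith(
            previous => previous.Result + ", bought a magazine");
    }

    public static void Main()
    {
        Console.WriteLine(GoShopping().Result);
    }
}
```

With the plain ThreadPool you would have to queue the second step yourself from inside the first one and pass the intermediate value along by hand.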
You are hearing two different notions of task. The first is the notion of a job, and the second is the notion of a process.
A long time ago (in computer terms), there were no threads. Each running instance of a program was called a process, since it simply performed one step after another after another until it exited. This matches the intuitive idea of a process as a series of steps, like that of a factory assembly line. The operating system manages the process abstraction.
Then, developers began to add multiple assembly lines to the factories. Now a program could do more than one thing at once, and either a library or (more commonly today) the operating system would manage the scheduling of the steps within each thread. A thread is kind of a lightweight process, but a thread belongs to a process, and all the threads in a process share memory. On the other hand, multiple processes can't mess with each others' memory. So, the multiple threads in your web server can each access the same information about the connection, but Word can't access Excel's in-memory data structures because Word and Excel are running as separate processes. The idea of a process as a series of steps doesn't really match the model of a process with threads, so some people took to calling the "abstraction formerly known as a process" a task. This is the second definition of task that you saw in the blog post. Note that plenty of people still use the word process to mean this thing.
Well, as threads became more common, developers added even more abstractions on top of them to make them easier to use. This led to the rise of the thread pool, which is a library-managed "pool" of threads. You pass the library a job, and the library picks a thread and runs the job on that thread. The .NET framework has a thread pool implementation, and the first time you heard about a "task" the documentation really meant a job that you pass to the thread pool.
So in a sense, both the documentation and the blog post are right. The overloading of the term task is the unfortunate source of confusion.
Threads have been a part of .Net from v1.0, Tasks were introduced in the Task Parallel Library TPL which was released in .Net 4.0.
You can consider a Task as a more sophisticated version of a Thread. They are very easy to use and have a lot of advantages over Threads as follows:
You can create return types to Tasks as if they are functions.
You can use the "ContinueWith" method, which waits for the previous task to finish and then starts execution. (Abstracting wait)
Abstracts locks, which should be avoided as per the guidelines of my company.
You can use Task.WaitAll and pass an array of tasks so you can wait till all tasks are complete.
You can attach a task to a parent task, so you can decide whether the parent or the child will exit first.
You can achieve data parallelism with LINQ queries.
You can create parallel for and foreach loops
Very easy to handle exceptions with tasks.
*Most important thing is if the same code is run on single core machine it will just act as a single process without any overhead of threads.
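A few of the points in the list above (return values, Task.WaitAll, and exception handling) can be illustrated together. This is only a sketch: the class and method names are hypothetical, and Task.Run is used here, which requires .NET 4.5.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskFeaturesDemo
{
    // Return values + WaitAll: one task per input, then combine the results.
    public static int SumOfSquares(int[] inputs)
    {
        var tasks = new Task<int>[inputs.Length];
        for (int i = 0; i < inputs.Length; i++)
        {
            int value = inputs[i];   // copy for the closure
            tasks[i] = Task.Run(() => value * value);
        }

        Task.WaitAll(tasks);   // blocks until every task has completed

        int sum = 0;
        foreach (Task<int> t in tasks) sum += t.Result;
        return sum;
    }

    // Exceptions: a failure inside a task surfaces as an AggregateException
    // when the task is waited on.
    public static string CatchFailure()
    {
        Action fail = () => { throw new InvalidOperationException("boom"); };
        Task failing = Task.Run(fail);
        try
        {
            failing.Wait();
            return "no error";
        }
        catch (AggregateException ex)
        {
            return ex.InnerException.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(new[] { 1, 2, 3 }));
        Console.WriteLine(CatchFailure());
    }
}
```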
Disadvantage of tasks over threads:
You need .Net 4.0
Newcomers who have learned operating systems can understand threads better.
New to the framework so not much assistance available.
Some tips:-
Always use Task.Factory.StartNew method which is semantically perfect and standard.
Take a look at the Task Parallel Library for more information:
http://msdn.microsoft.com/en-us/library/dd460717.aspx
Expanding on the comment by Eric Lippert:
Threads are a way for your application to do several things in parallel. For example, your application might have one thread that processes events from the user, like button clicks, and another thread that performs some long computation. This way, you can do two different things “at the same time”. If you didn't do that, the user wouldn't be able to click buttons until the computation finished. So, a Thread is something that can execute some code you wrote.
Task, on the other hand represents an abstract notion of some job. That job can have a result, and you can wait until the job finishes (by calling Wait()) or say that you want to do something after the job finishes (by calling ContinueWith()).
The most common job that you want to represent is to perform some computation in parallel with the current code, and Task offers you a simple way to do that. How and when the code actually runs is defined by the TaskScheduler. The default one uses the ThreadPool: a set of threads that can run any code. This is done because creating and switching threads is inefficient.
But Task doesn't have to be directly associated with some code. You can use TaskCompletionSource to create a Task and then set its result whenever you want. For example, you could create a Task and mark it as completed when the user clicks a button. Some other code could wait on that Task and while it's waiting, there is no code executing for that Task.
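As a sketch of that idea, the following uses TaskCompletionSource to create a task that is completed from outside. There is no real button here; the SetResult call stands in for a click handler, and the names are made up for the example.

```csharp
using System;
using System.Threading.Tasks;

public static class CompletionSourceDemo
{
    public static string Run()
    {
        // A task with no code behind it: it completes when we say so.
        var clicked = new TaskCompletionSource<string>();

        // Some other code reacts once the task completes. While it is
        // pending, no thread is occupied on behalf of clicked.Task.
        Task<string> reaction = clicked.Task.ContinueWith(
            t => "handled " + t.Result);

        // Stand-in for the button-click handler marking the task as done.
        clicked.SetResult("OK button");

        return reaction.Result;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```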
If you want to know when to use Task and when to use Thread: Task is simpler to use and more efficient than creating your own Threads. But sometimes you need more control than Task offers. In those cases, it makes sense to use Thread directly.
Tasks really are just a wrapper for the boilerplate code of spinning up threads manually. At the root, there is no difference. Tasks just make the management of threads easier, as well as they are generally more expressive due to the lessening of the boilerplate noise.
As part of trying to learn C#, I'm writing a small app that goes through a list of proxies. For each proxy it will create an httpwebrequest to a proxytest.php which prints generic data about a given proxy (or doesn't, in which case the proxy is discarded)
Clearly the webrequest code needs to run in a separate thread - especially since I'm planning on going through rather large lists. But even on a separate thread, going through 5,000 proxies will take forever, so I think this means I am to create multiple threads (correct me if I'm wrong)
I looked through MSDN and random threading tutorials and there's several different classes available. What's the difference between dispatcher, backgroundworker and parallel? I was given this snippet:
Parallel.ForEach(URLsList, new ParallelOptions() { MaxDegreeOfParallelism = S0 }, (m, i, j) =>
{
    // Each entry has the form "user|password".
    string[] UP = m.Split('|');
    string User = UP[0];
    string Pass = UP[1];
    // make call here
});
I'm not really sure how it's different than something like starting 5 separate background workers would do.
So what are the differences between those three and what would be a good (easy) approach to this problem?
Thanks
The Dispatcher is an object that models the message loop of WPF applications. If that doesn't mean anything to you then forget you ever heard of it.
BackgroundWorker is a convenience class over a thread that is part of the managed thread pool. It exists to provide some commonly requested functionality over manually assigning work to the thread pool with ThreadPool.QueueUserWorkItem.
The Thread class is very much like using the managed thread pool, with the difference being that you are in absolute control of the thread's lifetime (on the flip side, it's worse than using the thread pool if you intend to launch lots of short tasks).
The Task Parallel Library (TPL) (i.e. using Parallel.ForEach) would indeed be the best approach, since it not only takes care of assigning work units to a number of threads (from the managed thread pool) but it will also automatically divide the work units among those threads.
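For reference, a self-contained version of the Parallel.ForEach approach might look roughly like this. LooksValid is a hypothetical placeholder for the real web request, the "user|pass" entry format follows the snippet in the question, and the degree of parallelism is passed in rather than fixed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelForEachDemo
{
    // Hypothetical check standing in for the real web request.
    public static bool LooksValid(string user, string pass)
    {
        return user.Length > 0 && pass.Length > 0;
    }

    // Processes the entries in parallel, at most maxParallel at a time,
    // and returns how many passed the check.
    public static int CountValid(IEnumerable<string> entries, int maxParallel)
    {
        var valid = new ConcurrentBag<string>();   // safe to add to from many threads
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(entries, options, entry =>
        {
            string[] parts = entry.Split('|');
            if (parts.Length == 2 && LooksValid(parts[0], parts[1]))
                valid.Add(entry);
        });

        return valid.Count;
    }

    public static void Main()
    {
        var entries = new[] { "alice|secret", "bob|", "|nopass", "carol|pw" };
        Console.WriteLine(CountValid(entries, 2));
    }
}
```

Parallel.ForEach returns only after all entries have been processed, so no extra waiting code is needed; the results go into a thread-safe collection because several iterations may add to it at once.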
I would say use the task parallel library. It is a new library around all the manual threading code you will have to write otherwise.
The Task Parallel Library (TPL) is a collection of new classes specifically designed to make it easier and more efficient to execute very fine-grained parallel workloads on modern hardware. TPL has been available separately as a CTP for some time now, and was included in the Visual Studio 2010 CTP, but in those releases it was built on its own dedicated work scheduler. For Beta 1 of CLR 4.0, the default scheduler for TPL will be the CLR thread pool, which allows TPL-style workloads to “play nice” with existing, QUWI-based code, and allows us to reuse much of the underlying technology in the thread pool - in particular, the thread-injection algorithm, which we will discuss in a future post.
from
http://blogs.msdn.com/b/ericeil/archive/2009/04/23/clr-4-0-threadpool-improvements-part-1.aspx
I found working with the new .NET 4 library really easy. This blog post shows the old BackgroundWorker way of doing things and the new Task way of doing things.
http://nitoprograms.blogspot.com/2010/06/reporting-progress-from-tasks.html