I'm working on upgrading a job scheduling system we use in-house that uses Quartz.net. Looking at the source of the latest version of Quartz, I noticed that it still uses its own thread pool implementation, as opposed the much-improved thread pool (or anything from System.Threading.Tasks) that started shipping with .NET 4.0.
I'd be curious to know if anyone has successfully implemented a job scheduling system that uses Quartz.net for its scheduling features and TPL for thread pooling. Is it relatively easy to swap out Quartz's thread pool for that of TPL? Is Quartz even still relevant in the world of Tasks? Alternatively, as sold as I am on the great improvements with the .NET 4.x thread pool (core awareness, local queues, improved locking, etc.), is Quartz's thread pool good enough for typical coarse-grained background jobs and not worth the effort of forcing TPL into the mix?
Thanks in advance for any insights on using (or not using) these two tools together.
Quartz.NET is there to solve a bit different problem than TPL. Quartz.NET is intended for recurring job scheduling with rich set of capabilities for execution timing. TPL on the other hand is meant for highly performant parallel execution of computational workload.
So in essence you (usually) use Quartz.NET for precision scheduling and TPL for conccurent workloads that needs to be completed as quick as possible utilizing all computing resources (cores etc).
Having said this, I'd say the thread pool implementation that Quartz.NET uses is quite sufficient for the job. Also bear in mind that Quartz.NET is .NET 3.5 compliant and cannot use 4.0 only features.
Of course, you can also always combine the two in your solution.
Related
After reading how the thread pool and tasks work in this article I came up with this question -
If I have a complex program in which some modules use tasks and some use thread pool, is it possible that there will be some scheduling problems due to the different uses?
Task are often implemented using the thread pool (one can of course also have tasks using other types of schedulers that give different behavior, but this is the default). In terms of the actual code being executed (assuming your tasks are representing delegates being run) there really isn't much difference.
Tasks are simply creating a wrapper around that thread pool call to provide additional functionality when it comes to gather information about, and processing the results of, that asynchronous operation. If you want to leverage that additional functionality then use tasks. If you have no need to use it in some particular context, there's nothing wrong with using the thread pool directly.
Mix the two, so long as you don't have trouble getting what you want out of the results of those operations, is not a problem at all.
No. And there actually isn't much in the way of memory or performance inefficiencies when mixing approaches; by default tasks use the same thread pool that thread pool threads use.
The only significant disadvantage of mixing both is lack of consistency in your codebase. If you were to pick one, I would use TPL since it is has a rich API for handling many aspects of multi-threading and takes advantage of async/await language features.
Since your usage is divided down module lines, you don't have much to worry about.
No, there wouldn't be problems - you just would be inefficient in doing both. use what is really needed and stick with the pattern. Remember to be sure that you make your app MT Safe also especially if you are accessing the same resources/variables etc... from different threads, regardless of which threading algorithm you use.
There shouldn't be any scheduling problems as such, but of course it's better to use Tasks and let the Framework decide what to do with the scheduled work. In the current version of the framework (4.5) the work will be queued through the ThreadPool unless the LongRunning option is used, but this behaviour may change in future of course.
Verdict: Mixing Tasks and ThreadPool isn't a problem, but for new applications it's recommended to use Tasks instead of queueing work items directly on the ThreadPool (one reason for that is ThreadPool isn't available in Windows 8 Runtime (Modern UI apps).
I'm currently working on an application that relies on many different web services to get data. Since I want to modularize each service and have a bit of dependency in there (service1 must run before service 2 and 3 etc), I'm running each service in its own task.
The tasks themselves are either
running actively, meaning they're sending their request to the web service and are waiting for a response or processing the response
waiting (via monitor and timeout) - once a task finishes all waiting tasks wake up and check if their dependencies have finished
Now, the system is running with what I would call good performance (especially since the performance is rather negligible) - however, the application generates quite a number of tasks.
So, to my question: are ~200 tasks in this scenario too many? Do they generate that much overhead so that a basically non-threaded approach would be better?
The general answer is "Measure, Measure, Measure" :) if you're not experiencing any problems with performance, you shouldn't start optimizing.
I'd say 200 tasks are fine though. The beauty of tasks compared to threads is their low overhead compared to "real" threads and even the thread pool. The TaskScheduler is making sure all the hardware threads are utilized as much as possible with the least amount of thread switching. it does this by various tricks such as running child tasks serially, stealing work from queues on other threads and so on.
You can also give the TaskScheduler some hints about what a specific task is going to do via the TaskCreationOptions
If you want some numbers, check out this post, as you can see, Tpl is pretty cheap in terms of overhead:
.NET 4.0 - Performance of Task Parallel Library (TPL), by Daniel Palme
This is another interesting article on the subject:
CLR Inside Out: Using concurrency for scalability, by Joe Duffy
I've reviewed the documentation for Xamarin, and it recommends using ThreadPool for multithreaded functionality, as can be seen here:
http://docs.xamarin.com/guides/ios/application_fundamentals/threading
However, a benchmark has been done showing that Grand Central Dispatch is much more performant than ThreadPool
http://joeengalan.wordpress.com/2012/02/26/execution-differences-between-grand-central-dispatch-and-threadpool-queueuserworkitem-in-monotouch/
Therefore my question is, why does Xamarin recommend ThreadPool over Grand Central Dispatch? Is Xamarin eventually going to tie ThreadPool into Grand Central Dispatch? When would one choose one over the other? Because if ThreadPool is going to be optimized by Xamarin, and eventually outperform Grand Central Dispatch, then I do not want to use Grand Central Dispatch.
There is very little "extra performance" that you can squeeze out of a machine, in particular a mobile device by introducing more threads.
Like my comment on that post that you linked says (from February 2012) as well as the first paragraph on the article you linked explains the reason.
The difference between GCD and ThreadPool is that the ThreadPool in Mono has a "slow start" setup, so that it does not create more threads than necessary in the presence of work peaks. You can easily starve the CPU by launching too many threads, so the threadpool throttles itself after the initial threads have been created and then tries to only create a new thread every second (give or take, I dont remember the actual details).
If you want to force the ThreadPool to actually spin up a lot of threads, you can control that with the ThreadPool.SetMinThreads.
The reason to use the ThreadPool is that the same code will work across all platforms.
Notice that the document talks about using the ThreadPool over the other standard .NET threading APIs and does not say anything about using GCD or not. It is merely that the threadpool is a better choice than rolling your own management using Threads.
That said, API wise, these days I recommend people to use the Task Parallel Library (TPL) which is a much higher level way of thinking about your background operations than a thread. In addition, you get same API across platform with the flexibility of using either the built-in threadpool, or dispatching to GCD, by switching one line of code.
The current problem with mono (xamarin) thread pool is that it does not perform.
The Xamarin debugger gets chocked with as little as 10 simultaneous tasks. In release it is not much better.
In my case the same code on windows is outperforming on Mac 10x or more (note I am not using ANYTHING system specific). I tried different combinations of thread pool, async methods and asynchronous callbacks (BeginRead etc.) - Xamarin sacks at all of it.
If I have to guess it relates to their obsession with IOS being inherently single threaded. As for them recommending it , I have a guess too - that is the only part of framework that works as far as multithreading is concerned.
I spent weeks trying to optimize my code , but there is nothing you can do , if your use multithreading you are stack.
I am looking for a scheduling library for C# and for a long time I though the "only" option is Quartz.NET which is quite robust and work just fine. But when I found "Reactive Extensions" (RX - http://msdn.microsoft.com/en-us/data/gg577609) I realized that it can do Time-Related operations as well and have native .NET frontend.
What are the limitations of Rx in terms of Time-Related operations? I need to fire tasks repeatedly in specific interval, after time period or so.
And are there any major differences? (in terms of performance etc. - for example by my experience Quartz freezes when there are more then 1500+- tasks scheduled)
The two are not really comparable. Yes, with both you can 'schedule' a task to occur in a specific timespan from now, but that is where the similarities end.
Quartz is a complete scheduling solution with a huge range of trigger options and persists tasks to file or database.
Reactive extensions are a great way to deal with streamed data or events and yes, there are options for throttling or delaying for periods of time.
If you're looking to schedule tasks, then Quartz is probably the right option. If your needing a sort of eventing framework with loads of options for buffering, delaying and joining, then Rx is possibly more appropriate.
I am new to .Net platform. I did a search and found that there are several ways to do parallel computing in .Net:
Parallel task in Task Parallel Library, which is .Net 3.5.
PLINQ, .Net 4.0
Asynchounous Programming, .Net 2.0, (async is mainly used to do I/O heavy tasks, F# has a concise syntax supporting this). I list this because in Mono, there seem to be no TPL or PLINQ. Thus if I need to write cross platform parallel programs, I can use async.
.Net threads. No version limitation.
Could you give some short comments on these or add more methods in this list?
You do need to do a fair amount of research in order to determine how to effectively multithread. There are some good technical articles, part of the Microsoft Parallel Computing team's site.
Off the top of my head, there are several ways to go about multithreading:
Thread class.
ThreadPool, which also has support for I/O-bound operations and an I/O completion port.
Begin*/End* asynchronous operations.
Event-based asynchronous programming (or "EBAP") components, which use SynchronizationContext.
BackgroundWorker, which is an EBAP that defines an asynchronous operation.
Task class (Task Parallel Library) in .NET 4.
Parallel LINQ. There is a good article on Parallel.ForEach (Task Parallel Library) vs. PLINQ.
Rx or "LINQ to Events", which does not yet have a non-Beta version but is nearing completion and looks promising.
(F# only) Asynchronous workflows.
Update: There is an article Understanding and Applying Parallel Patterns with the .NET Framework 4 available for download that gives some direction on which solutions to use for which kinds of parallel scenarios (though it assumes .NET 4 and doesn't cover Rx).
Strictly speaking, the distinction between parallel, asynchronous and concurrent should be made here.
Parallel means that a "task" is split among several smaller sub-"tasks" that can be run at the same time. This requires a multi-core CPU or a multi-CPU computer, where each task has its dedicated core or CPU. Or multiple computers. PLINQ (data parallelism) and TPL (task parallelism) fall into this category.
Asynchronous means that tasks run without blocking each other. F#'s async expression, Rx, Begin/End pattern are all APIs for async programming.
Concurrency is a concept more broad than parallelization and asynchrony.
Concurrency means that several "tasks" run at the same time, interacting with each other. But these "tasks" don't have to run on separate physical computing units, as is meant in parallelization. For example, multitasking operating systems can execute multiple processes concurrently even on single-core single-CPU computers, using time slices.
Concurrency can be achieved for example with the Actor model and message passing (e.g. F#'s mailbox, Erlang processes (Retlang in .Net))
Threads are a relatively low-level concept compared to the concepts above. Threads are tasks running within a process, running concurrently and managed directly by the operating system's scheduler. You can implement parallelization when the operating system maps each thread to a separate core, or an Actor model by implementing message queuing, routing, etc on each thread.
There are also some .NET libraries for data parallel programming which target the Graphics Processing Unit (GPU) including:
Microsoft Accelerator
is for data parallel programming and can target either the GPU or multi-core processors.
Brama is for LINQ style data transformations that run on the GPU.
CUDA.NET provides a wrapper to allow to CUDA to be used from .NET programs.
There is also the Reactive Extensions for .NET (Rx)
Rx is basically linq queries for events. It allows you to process and combine asynchronous data streams in the same way linq allows you to work with collections. So you would probably use it in conjunction with other parallel technologies as a way of bringing the results of your parallel operations together without having to worry about locks and other low level threading primitives.
Expert to Expert: Brian Beckman and Erik Meijer - Inside the .NET Reactive Framework (Rx) gives a good overview of what Rx is all about.
EDIT: Another library worth mention is the Concurrency and Coordination Runtime (CCR), it's been around for a long time (earlier than '06) and is shipped as part of the Microsoft Robotics Studio.
Rx has a lot of the same cool ideas that the CCR has inside it, but with a much nicer API in my opinion. There's still some interesting stuff in the CCR though so it might be worth checking out. There's also a distributed services framework that works with the CCR that might make it useful depending on what you're doing.
Expert to Expert: Meijer and Chrysanthakopoulos - Concurrency, Coordination and the CCR
One more is the new Task Parallel library in .NET 4.0, which is similar and along the lines of what you've already discovered, but this may be an interesting read:
Task Parallel Library
two major ways to do parallel are threads and the new task based library TPL.
Asynchronous Programming you mention is nothing more then one new thread in the threadpool.
PLINQ, Rx and others mentioned are actually extensions sitting on the top of the new task scheduler.
the best article explaining exactly the new architecture for new task scheduler and all libraries on the top of it, Visual Studio 2010 and new TPL .NET 4.0 Task-based Parallelism is here (by Steve Teixeira, Product Unit Manager for Parallel Developer Tools at Microsoft):
http://www.drdobbs.com/visualstudio/224400670
otherwise Dr Dobbs has dedicated parallel programming section here: http://www.drdobbs.com/go-parallel/index.jhtml
The main difference between say threads and new task based parallel programming is you do not need to think anymore in terms of threads, how do you manage pools and underlying OS and hardware anymore. TPL takes care for that you just use tasks. That is a huge change in the way you do parallel on any level including abstraction.
So in .NET actually you do not have many choices:
Thread
New task based, task scheduler.
Obviously the task based is the way to go.
cheers
Valko