Quartz vs "Reactive Extensions" - C#

I am looking for a scheduling library for C#, and for a long time I thought the "only" option was Quartz.NET, which is quite robust and works just fine. But when I found Reactive Extensions (Rx - http://msdn.microsoft.com/en-us/data/gg577609) I realized that it can do time-related operations as well and has a native .NET frontend.
What are the limitations of Rx in terms of time-related operations? I need to fire tasks repeatedly at a specific interval, after a time period, and so on.
And are there any major differences (in terms of performance, etc.)? For example, in my experience Quartz freezes when there are more than ~1500 tasks scheduled.

The two are not really comparable. Yes, with both you can 'schedule' a task to occur in a specific timespan from now, but that is where the similarities end.
Quartz is a complete scheduling solution with a huge range of trigger options and persists tasks to file or database.
Reactive Extensions is a great way to deal with streamed data or events, and yes, there are options for throttling or delaying for periods of time.
If you're looking to schedule tasks, then Quartz is probably the right option. If you need a sort of eventing framework with loads of options for buffering, delaying, and joining, then Rx is possibly more appropriate.
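For the interval-style firing asked about in the question, Rx needs only a few lines. A minimal sketch using System.Reactive (DoWork is a placeholder for your own task):

    using System;
    using System.Reactive.Linq;

    class Program
    {
        static void DoWork() { /* your repeated task here */ }

        static void Main()
        {
            // Fire once after 5 seconds, then repeatedly every 30 seconds.
            IDisposable subscription = Observable
                .Timer(TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(30))
                .Subscribe(tick => DoWork());

            Console.ReadLine();      // keep the process alive
            subscription.Dispose();  // stop the schedule
        }
    }

Note that nothing here is persisted: if the process dies, the schedule dies with it, which is exactly the gap Quartz's file/database job stores fill.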

Related

Sitecore Scheduled Tasks vs Windows Task Scheduler for data migration

We need to migrate data from an external database to Sitecore periodically (every 12 or 24 hours). I would like to know whether a Sitecore Scheduled Task is the right choice for such a scenario, and also what performance impact it can have on the website.
Since I could also build an external app for this activity that doesn't rely on the IIS worker process, and schedule it using the Windows Task Scheduler, I would like to know the benefits and drawbacks of both approaches.
The obvious answer to this is "it depends", but that's a cop-out answer that drives me crazy when people use it.
A scheduled task in Sitecore will have access to the full Sitecore API, so if you are doing data manipulation of Sitecore items, this can be really attractive. Jobs also run on background threads, so the impact on the website is not large; obviously, if you exhaust all of the threads the worker process is configured to use, that would be an issue, but it's a very small outside chance at best.
The drawback to a Sitecore scheduled task is that it cannot be scheduled for a fixed time of day. In other words, "run this task at 3am every day" is not possible; tasks run on an interval basis. That might sound like a trivial difference - just schedule it to run every 24 hours - but in practice the interval inevitably drifts. This is the big advantage Windows scheduled tasks have. If most of the work you need to do is not related to Sitecore, then this may be a good approach. I've seen hybrid approaches where a Windows scheduled task triggers a call into Sitecore, which uses the jobs API to kick off a background task, but it never felt that elegant.
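For reference, the interval-based approach described above is usually wired up as a Sitecore "agent": a plain class with a public method, registered in configuration. A minimal sketch (the class name, namespace, and interval here are hypothetical):

    // Registered in web.config (or a patch file) under <sitecore><scheduling>:
    // <agent type="MyProject.Tasks.DataMigrationAgent" method="Run" interval="12:00:00" />
    namespace MyProject.Tasks
    {
        public class DataMigrationAgent
        {
            public void Run()
            {
                // Runs on a background thread inside the IIS worker process,
                // with full access to the Sitecore API. Note the interval
                // semantics: "every 12 hours", not "at 3am every day".
            }
        }
    }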

Grand Central Dispatch vs ThreadPool Performance in Xamarin iOS

I've reviewed the documentation for Xamarin, and it recommends using ThreadPool for multithreaded functionality, as can be seen here:
http://docs.xamarin.com/guides/ios/application_fundamentals/threading
However, a benchmark has been done showing that Grand Central Dispatch is much more performant than ThreadPool:
http://joeengalan.wordpress.com/2012/02/26/execution-differences-between-grand-central-dispatch-and-threadpool-queueuserworkitem-in-monotouch/
Therefore my question is, why does Xamarin recommend ThreadPool over Grand Central Dispatch? Is Xamarin eventually going to tie ThreadPool into Grand Central Dispatch? When would one choose one over the other? Because if ThreadPool is going to be optimized by Xamarin, and eventually outperform Grand Central Dispatch, then I do not want to use Grand Central Dispatch.
There is very little "extra performance" that you can squeeze out of a machine, in particular a mobile device, by introducing more threads.
My comment on the post you linked (from February 2012), as well as the first paragraph of the article itself, explains the reason.
The difference between GCD and the ThreadPool is that the ThreadPool in Mono has a "slow start" setup, so that it does not create more threads than necessary in the presence of work peaks. You can easily starve the CPU by launching too many threads, so the thread pool throttles itself after the initial threads have been created and then tries to create only one new thread per second (give or take; I don't remember the actual details).
If you want to force the ThreadPool to actually spin up a lot of threads, you can control that with ThreadPool.SetMinThreads.
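For example (the numbers are purely illustrative, not a recommendation):

    using System.Threading;

    // Ask the pool to keep at least 20 worker threads and 20 I/O completion
    // port threads available, bypassing the slow ramp-up described above.
    ThreadPool.SetMinThreads(20, 20);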
The reason to use the ThreadPool is that the same code will work across all platforms.
Notice that the document talks about using the ThreadPool over the other standard .NET threading APIs and does not say anything about using GCD or not. It is merely saying that the thread pool is a better choice than rolling your own thread management with raw Threads.
That said, API-wise, these days I recommend that people use the Task Parallel Library (TPL), which is a much higher-level way of thinking about your background operations than a thread. In addition, you get the same API across platforms, with the flexibility of using either the built-in thread pool or dispatching to GCD, by switching one line of code.
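A minimal sketch of the TPL shape being recommended here (LoadData and ShowResult are placeholders; the continuation assumes you start from a UI thread that has a SynchronizationContext):

    using System.Threading.Tasks;

    class Example
    {
        static string LoadData() { return "result"; }    // stand-in for real work
        static void ShowResult(string s) { /* update the UI */ }

        // Call from the UI thread (FromCurrentSynchronizationContext needs one).
        public static void StartBackgroundWork()
        {
            // Run on the default scheduler (the thread pool)...
            Task.Factory.StartNew(() => LoadData())
                // ...then marshal the result back to the UI thread. Swapping
                // the scheduler is the "one line of code" mentioned above.
                .ContinueWith(t => ShowResult(t.Result),
                              TaskScheduler.FromCurrentSynchronizationContext());
        }
    }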
The current problem with the Mono (Xamarin) thread pool is that it does not perform.
The Xamarin debugger gets choked with as few as 10 simultaneous tasks. In release mode it is not much better.
In my case the same code running on Windows outperforms the Mac version 10x or more (note that I am not using ANYTHING system-specific). I tried different combinations of the thread pool, async methods, and asynchronous callbacks (BeginRead etc.) - Xamarin struggles with all of it.
If I had to guess, it relates to their obsession with iOS being inherently single-threaded. As for them recommending it, I have a guess too - that is the only part of the framework that works at all as far as multithreading is concerned.
I spent weeks trying to optimize my code, but there is nothing you can do; if you use multithreading, you are stuck.

Quartz.net + Task Parallel Library

I'm working on upgrading a job scheduling system we use in-house that uses Quartz.net. Looking at the source of the latest version of Quartz, I noticed that it still uses its own thread pool implementation, as opposed to the much-improved thread pool (or anything from System.Threading.Tasks) that started shipping with .NET 4.0.
I'd be curious to know if anyone has successfully implemented a job scheduling system that uses Quartz.net for its scheduling features and TPL for thread pooling. Is it relatively easy to swap out Quartz's thread pool for that of TPL? Is Quartz even still relevant in the world of Tasks? Alternatively, as sold as I am on the great improvements with the .NET 4.x thread pool (core awareness, local queues, improved locking, etc.), is Quartz's thread pool good enough for typical coarse-grained background jobs and not worth the effort of forcing TPL into the mix?
Thanks in advance for any insights on using (or not using) these two tools together.
Quartz.NET is there to solve a somewhat different problem than the TPL. Quartz.NET is intended for recurring job scheduling, with a rich set of capabilities for execution timing. The TPL, on the other hand, is meant for highly performant parallel execution of computational workloads.
So in essence you (usually) use Quartz.NET for precision scheduling, and the TPL for concurrent workloads that need to be completed as quickly as possible, utilizing all computing resources (cores, etc.).
Having said that, I'd say the thread pool implementation that Quartz.NET uses is quite sufficient for the job. Also bear in mind that Quartz.NET is .NET 3.5 compliant and cannot use 4.0-only features.
Of course, you can also always combine the two in your solution.
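A rough sketch of that combination, assuming the synchronous IJob of Quartz.NET 2.x (in later versions Execute returns a Task): Quartz decides when the job runs, and the TPL spreads the work inside it across cores. The job class and its work items are placeholders.

    using System.Threading.Tasks;
    using Quartz;
    using Quartz.Impl;

    public class CrunchJob : IJob
    {
        public void Execute(IJobExecutionContext context)
        {
            // Quartz handled the "when"; Parallel.ForEach handles the "how fast".
            Parallel.ForEach(LoadWorkItems(), item => Process(item));
        }

        static int[] LoadWorkItems() { return new[] { 1, 2, 3 }; } // placeholder
        static void Process(int item) { /* CPU-bound work */ }
    }

    public static class SchedulerSetup
    {
        public static void Start()
        {
            IScheduler scheduler = new StdSchedulerFactory().GetScheduler();
            scheduler.Start();
            scheduler.ScheduleJob(
                JobBuilder.Create<CrunchJob>().Build(),
                TriggerBuilder.Create()
                    .StartNow()
                    .WithSimpleSchedule(s => s.WithIntervalInMinutes(5).RepeatForever())
                    .Build());
        }
    }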

Task parallel library - Parallelism on single core

I am working on a WPF application.
In a screen/view I have to make 6 calls to a WCF service. None of those calls are related, in the sense that they don't share data, nor are they dependent on each other. I am planning to use the TPL and make these 6 WCF service calls as 6 tasks. The application might be deployed on either a single-core or a multi-core machine.
I am being told that using the TPL on a single-core machine would actually increase the time taken for the tasks to complete, because of the overhead placed on the CPU scheduler to time-slice the different tasks. Is this true? If yes, should I still continue with my design, or should I look at alternatives?
And if I have to look at alternatives, what are they? :)
When doing something CPU-intensive, you would be adding overhead by running parallel threads on a single-core machine.
In your case the tasks are not CPU-intensive; they are waiting for a service call to respond, so you can very well run parallel threads on a single-core machine.
Depending on how the server handles the calls, there might not be any time increase anyway. If the calls are queued on the server, it will take about the same total time to run all the calls either way. In that case it would be better to run the calls in sequence, just because it's simpler.
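A minimal sketch of the parallel-calls design from the question, targeting .NET 4 (CallService stands in for the real WCF proxy calls):

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class ParallelCalls
    {
        // Placeholder for a WCF proxy call: mostly blocked waiting on the network.
        static string CallService(int id)
        {
            Thread.Sleep(500);
            return "result " + id;
        }

        static void Main()
        {
            var calls = new Task<string>[6];
            for (int i = 0; i < calls.Length; i++)
            {
                int id = i; // copy, to avoid capturing the loop variable
                calls[id] = Task.Factory.StartNew(() => CallService(id));
            }
            // The tasks overlap while blocked on I/O, so this completes in
            // roughly the time of the slowest call, even on a single core.
            Task.WaitAll(calls);
            foreach (var t in calls) Console.WriteLine(t.Result);
        }
    }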
Your best bet is to profile using both multi-core and single-core. Most BIOSes can set the number of active cores, so it shouldn't be a big problem. You can do some mock testing to find out if it will work for you.
Obviously task switching has overhead, but as long as each task's running time is much longer than the setup time, you won't notice it.
There are many ways to implement multitasking behavior, and if you do not know which is best, chances are you need to actually write some test cases and do some profiling. This is not difficult to do. If you are simply trying to use multi-core systems, then it is generally quite easy with the latest version of .NET, and you can even set it up for multi-core but revert back to single-core by using the appropriate constructs.
The async/await pattern, for example, can easily be run synchronously by either using #ifdef or removing all await keywords (with a search-and-replace tool). Parallel.For loops are easily convertible to normal for loops, either directly or by changing MaxDegreeOfParallelism, as shown below. Tasks can easily be run synchronously.
If you would like to make it more transparent, you could use some pre-processing scripting like T4.
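For instance, a Parallel.For can be dialed down to effectively sequential execution with a single option, which makes side-by-side profiling easy:

    using System.Threading.Tasks;

    class Example
    {
        static void Main()
        {
            // With MaxDegreeOfParallelism = 1 the body runs one iteration
            // at a time; raise it (or omit the options) to go parallel.
            var options = new ParallelOptions { MaxDegreeOfParallelism = 1 };
            Parallel.For(0, 100, options, i =>
            {
                /* per-iteration work */
            });
        }
    }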
In general, running multiple threads on a single core will be slower, because of the context switching between the threads.
I think the following diagram will explain the difference:
The diagram refers to 4 threads running on a single core, first multitasked and then sequentially.
You can see that with multitasking, all of the threads finish at a later time than they do with sequential execution.
In your specific case it probably won't be like that, and I think @Guffa is right in his answer, since this involves WCF calls.

Appropriate Multi-Threading Option

Scenario
I have a very heavy number-crunching process that pulls large datasets from 3 different databases and then does a bit of processing on each to eventually produce a result.
This process is fine if it is only used for a single asset. However, I now have 3500 assets to process, which takes about 1 hour 30 minutes with the current process.
Question
What is my best option for speeding this process up using a multi-threaded C# application? Realistically I don't have to share anything between the processing of each asset, so I'm confident that being able to process multiple assets at a time shouldn't cause too many issues.
Thoughts
I've heard good things about thread pools, but realistically I want something that isn't too huge to implement, is easily understandable, and can run a decent number of threads at a time.
Help would be greatly appreciated.
In .NET you can use the existing ThreadPool; there is no need to implement one yourself. Here is the relevant MSDN documentation.
You should take care not to run too many processes at once (3500 is a bit much), but using the supplied queuing mechanism should get you started in the right direction.
Another thing to try is using PLINQ.
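A PLINQ version of the asset loop might look like this (assets and ProcessAsset are placeholders for your data and number crunching):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Example
    {
        static object ProcessAsset(int asset) { return null; } // placeholder

        static void Run(IEnumerable<int> assets)
        {
            // AsParallel() partitions the 3500 assets across the cores;
            // WithDegreeOfParallelism caps how many run at once.
            var results = assets
                .AsParallel()
                .WithDegreeOfParallelism(Environment.ProcessorCount)
                .Select(a => ProcessAsset(a))
                .ToList();
        }
    }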
If you don't have a multi-core processor or multiple machines, and the thread processes are not I/O bound, multithreading will not help. Start by profiling the current processing to see where the time is going.
Thread pools are fine, and you can use a task queue to do simple load balancing, but if there are no spare CPU cycles in the current application, this would be a waste of time.
The nicest option would be to use the new Task Parallel Library in .NET 4, if you can do this using VS 2010 RC. This has built-in load balancing and work stealing queues, so it will make this task easy to thread, and very scalable.
However, if you need to do this in .NET 3.5, I would recommend using the ThreadPool, and just using ThreadPool.QueueUserWorkItem to start each task.
If your tasks are all very computationally intensive for their entire lifetime, you may want to prevent having too many running concurrently. Some form of queue, which you pull work from and execute, can be beneficial in this case. Just place all of your work items into a queue, and have threads pull work from the queue (with appropriate locking), and process.
If you have a multi-core system, and CPU cycles are your bottleneck, this should scale very well.
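A bare-bones version of such a queue, assuming .NET 3.5 (hence a locked Queue<T> with Monitor rather than the newer concurrent collections):

    using System;
    using System.Collections.Generic;
    using System.Threading;

    class WorkQueue
    {
        readonly Queue<Action> queue = new Queue<Action>();
        readonly object sync = new object();

        public WorkQueue(int workerCount)
        {
            // Roughly one worker per core keeps every core busy
            // without oversubscribing the CPU.
            for (int i = 0; i < workerCount; i++)
                new Thread(Worker) { IsBackground = true }.Start();
        }

        public void Enqueue(Action work)
        {
            lock (sync)
            {
                queue.Enqueue(work);
                Monitor.Pulse(sync); // wake one waiting worker
            }
        }

        void Worker()
        {
            while (true)
            {
                Action work;
                lock (sync)
                {
                    while (queue.Count == 0) Monitor.Wait(sync);
                    work = queue.Dequeue();
                }
                work(); // run outside the lock
            }
        }
    }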
The .NET built-in ThreadPool will solve both of your requirements: running a decent number of threads, and being simple to work with. I have previously written an article on the subject, which you can find here.
With SQL Server 2005 or later, you can create user-defined functions in C# and use them from within T-SQL procedures, which can give a marked speedup for number crunching. SQL Server is multi-threaded and does a good job with it, so consider keeping as much of the processing in the database engine as you can.
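Such a SQLCLR function is just a static method with the SqlFunction attribute; a minimal sketch (the class name and the math are purely illustrative):

    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;

    public class CrunchFunctions
    {
        // After the assembly is deployed and registered with CREATE FUNCTION,
        // this is callable from T-SQL, e.g. SELECT dbo.Crunch(42).
        [SqlFunction]
        public static SqlDouble Crunch(SqlDouble x)
        {
            return x.IsNull ? SqlDouble.Null : x * x + 1.0;
        }
    }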
