How to implement an explicit, inspectable Task queue? - c#

This question talks about how the traditional queue pattern is somewhat antiquated in modern C# due to the TPL: Best way in .NET to manage queue of tasks on a separate (single) thread
The accepted answer proposes what appears to be a stateless solution. It is very elegant, but (perhaps I'm a dinosaur, or I misunderstand the answer) what if I want to pause the queue or save its state? What if enqueue behaviour should depend on the queue's state, or if queued tasks can have different priorities?
How could one efficiently implement an ordered task queue, one that actually has an explicit Queue object you can inspect and even interact with, within the Task paradigm? Supporting single or parallel processing of enqueued tasks would be a benefit, but for my purposes single-concurrency is acceptable if parallelism raises problems. I am not dealing with millions of tasks a second; in fact my tasks are typically large and slow.
I am happy to accept there are solutions with different scalability depending on requirements, and that we can often trade-off between scalability and coding effort/complexity.

What you are describing sounds essentially like the Channel<T> API. This already exists:
nuget: https://www.nuget.org/packages/System.Threading.Channels/
msdn: https://devblogs.microsoft.com/dotnet/an-introduction-to-system-threading-channels/
additional: https://www.stevejgordon.co.uk/an-introduction-to-system-threading-channels
It isn't explicitly a Queue<T>, but it acts as a queue. There is support for bounded vs unbounded capacity, and for single vs multiple readers/writers.
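A minimal sketch of that shape, assuming .NET 6+ (where System.Threading.Channels ships in-box) and a single consumer: a bounded channel gives you back-pressure, and a single reader preserves FIFO processing order. The Func<Task> payload is just one illustrative choice of message type.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// A bounded channel acting as an ordered work queue.
var channel = Channel.CreateBounded<Func<Task>>(new BoundedChannelOptions(100)
{
    SingleReader = true,                    // one consumer => strict FIFO processing
    FullMode = BoundedChannelFullMode.Wait  // writers wait when the queue is full
});

var results = new List<int>();

// Single consumer loop: drains items one at a time, in order.
var consumer = Task.Run(async () =>
{
    await foreach (var work in channel.Reader.ReadAllAsync())
        await work();
});

// Producers can enqueue from anywhere.
await channel.Writer.WriteAsync(() => { results.Add(1); return Task.CompletedTask; });
await channel.Writer.WriteAsync(() => { results.Add(2); return Task.CompletedTask; });

channel.Writer.Complete(); // signal: no more work
await consumer;            // wait for the queue to drain
Console.WriteLine(string.Join(",", results)); // 1,2
```

For light inspection, bounded channels also expose `channel.Reader.Count`, a cheap look at current queue depth; for richer inspection (pausing, reprioritizing), you would still need to wrap the channel or fall back to an explicit queue with your own locking.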

Related

Optimum use of Concurrent Collections with Threads Vs. Tasks

I've been reading this article on MSDN about C# concurrent collections. It talks about the optimum threading to use in particular scenarios to get the most benefit out of the collections, e.g.:
ConcurrentQueue performs best when one dedicated thread is queuing and one dedicated thread is de-queuing. If you do not enforce this rule, then Queue might even perform slightly faster than ConcurrentQueue on computers that have multiple cores.
Is this advice still valid when one is using Tasks instead of raw Threads? From my (limited) understanding of C# Tasks, there is no guarantee that a particular Task will always run on the same thread between context switches, or does maintaining the stack frame mean that the same rules apply in terms of best usage?
Thanks.
One task always runs on the same thread. TPL is a user-mode library. User mode has no (practical) way of migrating executing code from thread to thread. Also there would be no point to doing that.
This advice applies exactly to tasks as it does to threads.
What that piece of advice means to say is that at the same time there should be one producer and one consumer only. You can have 100 threads enqueuing from time to time as long as they do not contend.
(I'm not questioning that advice here since that is out of scope for this question. But that is what's meant here.)
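To make that concrete, here is a small sketch of the one-producer/one-consumer access pattern the advice describes, using two tasks over a ConcurrentQueue<int>. The workload and the volatile `done` flag are illustrative only (a real consumer would block or wait rather than spin):

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var queue = new ConcurrentQueue<int>();
bool done = false;
long sum = 0;

// One dedicated producer task...
var producer = Task.Run(() =>
{
    for (int i = 1; i <= 1000; i++) queue.Enqueue(i);
    Volatile.Write(ref done, true);
});

// ...and one dedicated consumer task: the pattern the MSDN advice targets.
// Note: this spins when the queue is empty; fine for a sketch, not for production.
var consumer = Task.Run(() =>
{
    while (!Volatile.Read(ref done) || !queue.IsEmpty)
        if (queue.TryDequeue(out var item)) sum += item;
});

await Task.WhenAll(producer, consumer);
// sum is now 1 + 2 + ... + 1000 = 500500
```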

Multi-threaded queue consumer and task processing

I'm writing a service that has to read tasks from an AMQP message queue and perform a synchronous action based on the message type. These actions might be to send an email or hit a web service, but will generally be on the order of a couple hundred milliseconds assuming no errors.
I want this to be extensible so that other actions can be added in the future. Either way, the volume of messages could be quite high, with bursts of hundreds per second coming in.
I'm playing around with several designs, but my questions are as follows:
What type of threading model should I go with? Do I:
a) Go with a single thread to consume from the queue and put tasks on a thread pool? If so, how do I represent those tasks?
b) Create multiple threads to host their own consumers and have them handle the task synchronously?
c) Create multiple threads to host their own consumers and have them all register a delegate to handle the tasks as they come in?
In the case of a or c, what's the best way to have the spawned thread communicate back with the main thread? I need to ack the message that came off the queue. Do I raise an event from the spawned thread that the main thread listens to?
Is there a guideline as to how many threads I should run, given x cores? Is it x, 2*x? There are other services running on this system too.
You should generally* avoid direct thread programming in favor of the Task Parallel Library and concurrent collections built into .NET 4.0 and higher. Fortunately, the producer/consumer problem you described is common and Microsoft has a general-purpose tool for this: the BlockingCollection. This article has a good summary of its features. You may also refer to this white paper for performance analysis of the BlockingCollection<T> (among other things).
However, before pursuing the BlockingCollection<T> or an equivalent, given the scenario you described, why not go for the simple solution of using Tasks? The TPL gives you asynchronous execution of tasks with a lot of extras like cancellation and continuation. If, however, you need more advanced lifecycle management, then go for something like a BlockingCollection<T>.
* By "generally", I mean that the generic solution will not necessarily perform the best for your specific case, as it's almost certain that a properly designed custom solution would be better. As with every decision, perform the cost/benefit analysis.
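For reference, a minimal BlockingCollection<T> producer/consumer sketch. The message strings and the capacity of 64 are placeholders; the real handler would send the email or hit the web service:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// BlockingCollection<T> wraps a ConcurrentQueue<T> by default.
using var messages = new BlockingCollection<string>(boundedCapacity: 64);
var handled = new List<string>();

// Consumer: GetConsumingEnumerable blocks until an item arrives, and
// ends once CompleteAdding has been called and the queue is empty.
var worker = Task.Run(() =>
{
    foreach (var msg in messages.GetConsumingEnumerable())
        handled.Add(msg); // stand-in for "send email", "hit web service", ...
});

// Producer: Add blocks when the bounded capacity is reached (back-pressure).
messages.Add("send-email");
messages.Add("call-webservice");
messages.CompleteAdding(); // signal: no more work

await worker;
Console.WriteLine(handled.Count); // 2
```

To ack the AMQP message only after the action succeeds, you would enqueue an object carrying both the payload and its delivery tag, and ack inside the consumer loop once the handler returns.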

Is it fine to use tasks and thread-pool together?

After reading how the thread pool and tasks work in this article, I came up with this question:
If I have a complex program in which some modules use tasks and some use thread pool, is it possible that there will be some scheduling problems due to the different uses?
Tasks are often executed using the thread pool (one can of course also have tasks use other types of schedulers that give different behavior, but this is the default). In terms of the actual code being executed (assuming your tasks represent delegates being run) there really isn't much difference.
Tasks are simply creating a wrapper around that thread pool call to provide additional functionality when it comes to gather information about, and processing the results of, that asynchronous operation. If you want to leverage that additional functionality then use tasks. If you have no need to use it in some particular context, there's nothing wrong with using the thread pool directly.
Mixing the two, so long as you don't have trouble getting what you want out of the results of those operations, is not a problem at all.
No. And there actually isn't much in the way of memory or performance inefficiency when mixing approaches; by default, tasks are scheduled on the very same thread pool that ThreadPool.QueueUserWorkItem uses.
The only significant disadvantage of mixing both is a lack of consistency in your codebase. If you were to pick one, I would use the TPL, since it has a rich API for handling many aspects of multi-threading and takes advantage of the async/await language features.
Since your usage is divided down module lines, you don't have much to worry about.
No, there wouldn't be problems; you would just be inefficient in doing both. Use what is really needed and stick with that pattern. Also remember to make your app thread-safe, especially if you are accessing the same resources/variables etc. from different threads, regardless of which threading approach you use.
There shouldn't be any scheduling problems as such, but of course it's better to use Tasks and let the framework decide what to do with the scheduled work. In the current version of the framework (4.5) the work will be queued through the ThreadPool unless the LongRunning option is used, but this behaviour may change in the future, of course.
Verdict: mixing Tasks and the ThreadPool isn't a problem, but for new applications it's recommended to use Tasks instead of queueing work items directly on the ThreadPool (one reason being that the ThreadPool API isn't available in the Windows 8 Runtime, i.e. Modern UI apps).
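A quick way to see that both paths land on the same pool is to check Thread.CurrentThread.IsThreadPoolThread from each (a sketch; the synchronization is only there so the console write happens after both callbacks):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

bool viaPool = false, viaTask = false;
using var done = new ManualResetEventSlim();

// Classic thread-pool API...
ThreadPool.QueueUserWorkItem(_ =>
{
    viaPool = Thread.CurrentThread.IsThreadPoolThread;
    done.Set();
});
done.Wait();

// ...and the TPL's default scheduler both run on thread-pool threads.
await Task.Run(() => viaTask = Thread.CurrentThread.IsThreadPoolThread);

Console.WriteLine($"{viaPool} {viaTask}"); // True True
```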

PLINQ vs Tasks vs Async vs Producer/Consumer queue? What to use?

I was reading C# 5.0 in a Nutshell, and after reading the author's views I am quite confused as to what I should adopt. My requirement is, say, a really long-running (computationally heavy) task, for example calculating the SHA1 (or some other) hash of millions of files, or really anything else that is computationally heavy and likely to take some time. What should be my approach to developing it (in WinForms if that matters, using VS 2012, C# 5.0), so that I can also report progress to the user?
Following scenario(s) come to mind...
Create a Task (with the LongRunning option) that computes the hashes and reports progress to the user, either by implementing IProgress<T>/Progress<T> or by letting the task capture the SynchronizationContext and posting to the UI.
Create an async method like
async Task CalculateHashesAsync()
{
    // await a task that calculates the hash
    await Task.Run(() => CalculateHash());
    // how do I report progress???
}
Use TPL (or PLINQ) as
void CalculateHashes()
{
    Parallel.For(0, allFiles.Count, i => CalcHash(allFiles[i]));
    // how do I report progress here?
}
Use a producer / consumer Queue.
Don't really know how?
The author in the book says...
Running one long running task on a pooled thread won't cause
trouble. It's when you run multiple long running tasks in parallel
(particularly ones that block) that performance can suffer. In that
case, there are usually better solutions than
TaskCreationOptions.LongRunning.
If tasks are IO bound, TaskCompletionSource and asynchronous functions let you
implement concurrency with callbacks instead of threads.
If tasks are compute bound, a producer/consumer queue lets you throttle the concurrency for those tasks, avoiding starvation for
other threads and processes.
About the Producer/Consumer the author says...
A producer/consumer queue is a useful structure, both in parallel
programming and general concurrency scenarios as it gives you precise
control over how many worker threads execute at once, which is useful
not only in limiting CPU consumption, but other resources as well.
So, should I not use a task, meaning the first option is out? Is the second one the best option? Are there any other options? And if I were to follow the author's advice and implement a producer/consumer queue, how would I do that? (I don't even know how to get started with producer/consumer in my scenario, if that is indeed the best approach!)
I'd like to know if someone has come across such a scenario and how they would implement it. If not, what would be the most performant and/or easiest approach to develop and maintain? (I know "performance" is subjective, but let's just consider the general case: it works, and works well!)
really long running (computationally heavy) task, say for example, calculate SHA1 (or some other) hash of millions of files
That example clearly has both heavy CPU (hashing) and I/O (file) components. Perhaps this is a non-representative example, but in my experience even a secure hash is far faster than reading the data from disk.
If you just have CPU-bound work, the best solution is either Parallel or PLINQ. If you just have I/O-bound work, the best solution is to use async. If you have a more realistic and complex scenario (with both CPU and I/O work), then you should either hook up your CPU and I/O parts with producer/consumer queues or use a more complete solution such as TPL Dataflow.
TPL Dataflow works well with both parallelism (MaxDegreeOfParallelism) and async, and has a built-in producer/consumer queue between each pair of linked blocks.
One thing to keep in mind when mixing massive amounts of I/O and CPU usage is that different situations can cause massively different performance characteristics. To be safe, you'll want to throttle the data going through your queues so you won't end up with memory usage issues. TPL Dataflow has built-in support for throttling via BoundedCapacity.
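A sketch of that shape, assuming the System.Threading.Tasks.Dataflow NuGet package is referenced; `HashFile` is a placeholder for the real per-file CPU work, and the capacity/parallelism numbers are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet: System.Threading.Tasks.Dataflow

var results = new List<string>();

// CPU stage: throttled parallelism plus a bounded input queue.
var hashBlock = new TransformBlock<string, string>(
    path => HashFile(path),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount, // CPU throttle
        BoundedCapacity = 100 // back-pressure: SendAsync waits when full
    });

// Output stage: single-threaded by default, so the plain List is safe here.
var collect = new ActionBlock<string>(h => results.Add(h));
hashBlock.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var file in new[] { "a.txt", "b.txt", "c.txt" })
    await hashBlock.SendAsync(file); // back-pressure-aware enqueue

hashBlock.Complete();
await collect.Completion;
Console.WriteLine(results.Count); // 3

// Placeholder for the actual hashing work.
static string HashFile(string path) => $"hash({path})";
```

Progress reporting drops in naturally: pass a Progress<int> created on the UI thread and call progress.Report(...) from the collecting block.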

Fastest way to asynchronously execute a method?

I'm currently dealing with a problem where I have to dispatch a hell of a lot of functions to another thread to prevent the current function from blocking.
Now I wonder what the fastest way is to perform this task.
Currently I'm stuck with
ThreadPool.UnsafeQueueUserWorkItem
as it's slightly faster than the regular QueueUserWorkItem. However, I'm afraid that the thread pool may become a bottleneck here. Is there a faster way of dispatching a method call to another thread?
I just wonder what the best practice is for such a task? Unsafe code would be no problem, as it's in a scenario where a lot of interop is already used.
thanks
j.
The CLR 4 team recommends:
Task is now the preferred way to queue work to the thread pool.
Read CLR 4.0 ThreadPool Improvements: Part 1 and New and Improved CLR 4 Thread Pool Engine for detailed information. In short, the reasons are: local queues, thread reuse, and work stealing. Basically, load balancing.
Extras:
I don't understand why it's not the answer (downvoted).
You wrote
I have to dispatch a hell of a lot of functions to another thread to
prevent the current function from blocking
I reply:
Some TPL building blocks (available for .NET 3.5 as well): concurrent collections, spinning primitives, etc. are designed specifically for highly concurrent access, with a focus on minimizing or eliminating blocking for efficient management of work. You can use those blocks too (for example, BlockingCollection<T> for your problem). The TPL is designed for creating and handling hundreds (or even thousands) of CPU/IO-bound operations (tasks) with minimal overhead, or millions with the help of PLINQ.
You asked:
i just wonder what the best practice is for such a task?
I've already answered: best practice is the TPL (reasoned above, not just my recommendation).
Inserting multiple or bigger items at once should reduce the overhead.
Edited after reading one of your comments:
I have experienced similar things. My usual remedy is not to dispatch every asynchronous request immediately but rather mimic what Nagle's Algorithm does for TCP.
Here, upon receiving a Request() you would dispatch it immediately only if no asynchronous work is pending. If asynchronous work is pending you would dispatch only if a certain number of milliseconds since the earliest non-dispatched Request has elapsed or a certain number of outstanding Request()s has accumulated.
This is an effective pattern to cut down overhead when getting frequent Request()s over which you have no control. Hope that helps.
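A rough sketch of that batching idea. The class name, thresholds, and the ThreadPool dispatch at the end are all illustrative choices, not a prescription:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Nagle-style batcher: buffer requests, flush when either a count
// threshold is reached or a short delay since the first pending
// request has elapsed, whichever comes first.
class RequestBatcher
{
    private readonly List<Action> _pending = new();
    private readonly object _gate = new();
    private readonly int _maxBatch;
    private readonly TimeSpan _maxDelay;
    private Task? _flushTask;

    public RequestBatcher(int maxBatch, TimeSpan maxDelay)
        => (_maxBatch, _maxDelay) = (maxBatch, maxDelay);

    public void Request(Action work)
    {
        lock (_gate)
        {
            _pending.Add(work);
            if (_pending.Count >= _maxBatch) { FlushLocked(); return; }
            // The first pending item starts the delay timer.
            _flushTask ??= Task.Delay(_maxDelay).ContinueWith(_ =>
            {
                lock (_gate) FlushLocked();
            });
        }
    }

    private void FlushLocked() // must be called while holding _gate
    {
        if (_pending.Count == 0) return;
        var batch = _pending.ToArray();
        _pending.Clear();
        _flushTask = null;
        // One dispatch for the whole batch instead of one per request.
        ThreadPool.QueueUserWorkItem(_ => { foreach (var w in batch) w(); });
    }
}
```

Usage would look like `new RequestBatcher(maxBatch: 10, maxDelay: TimeSpan.FromMilliseconds(20))`, calling Request() from the hot path; the overhead of one work-item dispatch is then amortized over the whole batch.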
Maybe you could throw all your dispatch requests into a List<T> and wake up another background thread to make the calls to QueueUserWorkItem.
Am I understanding the problem correctly?
