Multi-threaded queue consumer and task processing - C#

I'm writing a service that has to read tasks from an AMQP message queue and perform a synchronous action based on the message type. These actions might be to send an email or hit a web service, but will generally take on the order of a couple hundred milliseconds, assuming no errors.
I want this to be extensible so that other actions can be added in the future. Either way, the volume of messages could be quite high, with bursts of hundreds per second coming in.
I'm playing around with several designs, but my questions are as follows:
What type of threading model should I go with? Do I:
a) Go with a single thread to consume from the queue and put tasks on a thread pool? If so, how do I represent those tasks?
b) Create multiple threads to host their own consumers and have them handle the task synchronously?
c) Create multiple threads to host their own consumers and have them all register a delegate to handle the tasks as they come in?
In the case of (a) or (c), what's the best way to have the spawned thread communicate back with the main thread? I need to ack the message that came off the queue. Do I raise an event from the spawned thread that the main thread listens to?
Is there a guideline as to how many threads I should run, given x cores? Is it x, 2*x? There are other services running on this system too.

You should generally* avoid direct thread programming in favor of the Task Parallel Library and the concurrent collections built into .NET 4.0 and higher. Fortunately, the producer/consumer problem you described is common, and Microsoft has a general-purpose tool for it: the BlockingCollection<T>. This article has a good summary of its features. You may also refer to this white paper for a performance analysis of BlockingCollection<T> (among other things).
However, before pursuing BlockingCollection<T> or an equivalent, given the scenario you described, why not go for the simple solution of just using Tasks? The TPL gives you asynchronous execution of tasks with a lot of extras, like cancellation and continuations. If, however, you need more advanced lifecycle management, then go for something like a BlockingCollection<T>.
* By "generally", I mean that the generic solution will not necessarily perform the best for your specific case, as it's almost certain that a properly designed custom solution will be better. As with every decision, perform the cost/benefit analysis.
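A minimal sketch of the BlockingCollection<T> producer/consumer pattern discussed above. The string message type, the capacity, and the consumer count are placeholder assumptions, not anything from the question; in the real service the queued item could also carry the AMQP delivery tag so the processing task can ack it.

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    class QueueService
    {
        static void Main()
        {
            // A bounded collection applies back-pressure: Add blocks when full.
            var queue = new BlockingCollection<string>(boundedCapacity: 1000);

            // One consumer task per worker; GetConsumingEnumerable blocks
            // until an item arrives or CompleteAdding is called.
            var consumers = new Task[4];
            for (int i = 0; i < consumers.Length; i++)
            {
                consumers[i] = Task.Run(() =>
                {
                    foreach (var message in queue.GetConsumingEnumerable())
                        Console.WriteLine($"Processing {message}");
                });
            }

            // Producer: in the service this would be the AMQP consumer thread.
            for (int n = 0; n < 10; n++)
                queue.Add($"message-{n}");

            queue.CompleteAdding();   // signal consumers to drain and exit
            Task.WaitAll(consumers);
        }
    }

The bounded capacity is the key design choice here: it keeps a burst of incoming messages from exhausting memory by blocking the producer instead of letting the queue grow without limit.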

Related

Multithreaded approach to process SQS item Queue

In this scenario, I have to poll AWS SQS messages from a queue; each async request can fetch up to 10 SQS items/messages. Once I poll the items, I have to process them on a Kubernetes pod. Item processing includes getting responses from a few API calls, which may take some time, and then saving the item to the DB and S3.
I did some R&D and reached the following conclusions:
Use a producer/consumer model: one thread will poll items and another thread will process them, or use multi-threading for item processing.
Maintain a data structure that contains the polled SQS items ready for processing; the data structure could be a BlockingCollection or a ConcurrentQueue.
Use the Task Parallel Library for thread pooling and item processing.
Channels could also be used.
My Queries
What would be the best approach to achieve the best performance or increase TPS?
Can/should I use TPL Dataflow?
Multi-threaded, or single-threaded with async tasks?
This is very dependent on the specifics of your use case and how much effort you want to put in.
I will, however, explain the thought process I would use when making such a decision.
The naive solution to handle SQS messages would be to do it one at a time sequentially (i.e. without concurrency). It doesn't mean that you're limited to a single message at a time since you can add more pods to the cluster.
So even in that naive solution you have one concurrency point you can utilize but it has a lot of overhead. The way to reduce overhead is usually to utilize the same overhead but process more messages with it. That's why, for example, SQS allows you to get 1-10 messages in a single call and not just one. It spreads the call overhead over 10 messages. In the naive solution the overhead is the cost of starting a whole process. Using the process for more messages means concurrent processing.
I've found that for stable and flexible concurrency you want many points of concurrency, but have each of them capped at some configurable degree of parallelism (whether hardcoded or actual configuration). That way you can tweak each of them to achieve optimal output (increase when you have free CPU and memory and decrease otherwise).
So, where can the additional concurrency be introduced? This is a progression where each step utilizes resources better but requires more effort.
1. Fetch 10 messages instead of one on every SQS API call and process them concurrently. That gives you two points of concurrency you can control: the number of pods, and the number of messages (up to 10) processed concurrently.
2. Have a few tasks, each fetching 1-10 messages and processing them concurrently. That's three concurrency points: pods, tasks, and messages per task. Both of these solutions suffer from messages with varying processing times: a single long-running message will "hold up" the other 1-9 "slots" of work, effectively reducing the concurrency to lower than configured.
3. Set up a TPL Dataflow block to process the messages concurrently, with a task (or a few) continuously fetching messages and pumping them into the block (a minimal sketch follows this list). Keep in mind that SQS messages need to be explicitly deleted, so the block needs to receive the message handle too so the message can be deleted after processing.
4. A TPL Dataflow "pipe" consisting of a few blocks, where each has its own degree of parallelism. That's useful when you have different steps of processing for a message, where each step has different limitations (e.g. different APIs with different throttling configurations).
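A minimal sketch of option 3. ProcessAsync and DeleteMessageAsync are hypothetical stand-ins for your API/DB/S3 work and the SQS DeleteMessage call, and both caps are illustrative, not recommendations:

    using System;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    // Hypothetical wrapper: SQS needs the receipt handle to delete a message.
    record SqsWorkItem(string Body, string ReceiptHandle);

    class SqsPipeline
    {
        static async Task Main()
        {
            var processor = new ActionBlock<SqsWorkItem>(
                async item =>
                {
                    await ProcessAsync(item.Body);                 // API calls, DB, S3
                    await DeleteMessageAsync(item.ReceiptHandle);  // delete only after success
                },
                new ExecutionDataflowBlockOptions
                {
                    MaxDegreeOfParallelism = 8,  // the configurable cap discussed above
                    BoundedCapacity = 100        // back-pressure on the fetch loop
                });

            // Fetch loop: a real service would call ReceiveMessageAsync on the
            // SQS client here; SendAsync waits while the block is full.
            for (int i = 0; i < 20; i++)
                await processor.SendAsync(new SqsWorkItem($"msg-{i}", $"handle-{i}"));

            processor.Complete();
            await processor.Completion;
        }

        static Task ProcessAsync(string body) => Task.Delay(100);            // stand-in
        static Task DeleteMessageAsync(string handle) => Task.CompletedTask; // stand-in
    }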
I personally am very fond of, and comfortable with, the Dataflow library so I would go straight to it. But simpler solutions are also valid when performance is less of an issue.
I'm not familiar with Kubernetes but there are many things to consider when maximising throughput.
All the things you have mentioned are IO-bound, not CPU-bound, so using the TPL would overcomplicate the design for marginal benefit. See: https://learn.microsoft.com/en-us/dotnet/csharp/async#recognize-cpu-bound-and-io-bound-work
Your Kubernetes pods are likely to have network limitations. For example, an Azure Function App on a Consumption Plan is limited to 1,200 outbound connections. Other services will have some defined limits, too: https://learn.microsoft.com/en-us/azure/azure-functions/manage-connections?tabs=csharp#connection-limit. Due to the nature of your work, it is likely that you will reach these limits before you need to process IO work on multiple threads.
You may also need to consider limits of the services which you are dependent on and ensure they are able to handle the throughput.
You may want to consider using semaphores to limit the number of active connections, satisfying both your infrastructure and external dependency limits (a sketch follows): https://learn.microsoft.com/en-us/dotnet/api/system.threading.semaphoreslim?view=net-5.0
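For example, a minimal sketch of that idea; the limit of 50 is an arbitrary placeholder to stay under whatever connection cap applies:

    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    class ThrottledClient
    {
        // Cap concurrent outbound calls below the platform connection limit.
        private static readonly SemaphoreSlim Gate = new SemaphoreSlim(50);
        private static readonly HttpClient Http = new HttpClient();

        public static async Task<string> GetAsync(string url)
        {
            await Gate.WaitAsync();
            try
            {
                return await Http.GetStringAsync(url);
            }
            finally
            {
                Gate.Release();   // always release, even when the call faults
            }
        }
    }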
That being said, 500 messages per second is a realistic amount. To improve it further, you can look at having multiple processes with independent resource limitations processing the queue.
Not familiar with your use case, or specifically with the tech you are using, but this sounds like a very common message handling scenario.
A few guidelines:
First, these are guidelines; your use case might be very different from what the people commenting here are used to.
Whenever you want to increase your throughput you need to identify your bottlenecks and strive towards a CPU bottleneck, making sure you fully utilize it. CPU load is usually the most expensive and generally makes for a more reliable metric for autoscaling. Obviously, depending on your remote API calls and your DB, you might reach other bottlenecks. SQS queue size also makes for a good autoscaling metric, but keep in mind that autoscaling isn't guaranteed to increase your throughput if your bottleneck is DB- or API-related.
I would not go for a fancy solution with complex data structures; again, I'm not familiar with your use case, so I might be wrong, but keep it simple. There should be one thread that is responsible for polling the queue, and when it finds new messages it should create a Task that processes a batch (see the sketch below). There should generally be one Task per processing batch; let the ThreadPool handle the number of threads.
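The sketch mentioned above might look like this; PollQueueAsync is a hypothetical stand-in for the actual SQS receive call:

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    class PollingService
    {
        static async Task Main()
        {
            // One polling loop; each non-empty batch becomes one Task, and the
            // ThreadPool decides how many threads actually run.
            while (true)
            {
                List<string> batch = await PollQueueAsync();
                if (batch.Count > 0)
                    _ = Task.Run(() => ProcessBatch(batch));
                else
                    await Task.Delay(TimeSpan.FromSeconds(1));  // back off when idle
            }
        }

        // Stand-in for the SQS receive call (up to 10 messages per request).
        static Task<List<string>> PollQueueAsync() =>
            Task.FromResult(new List<string>());

        static void ProcessBatch(List<string> batch)
        {
            foreach (var msg in batch)
                Console.WriteLine($"Processing {msg}");
        }
    }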
I'm not familiar with the .NET SQS library. However, I am familiar with other libraries for very similar solutions. Most queue libraries out there already do it all for you, and you don't really have to worry about it. You should probably just have a callback function that is called when the highly optimized library finds new messages. Those libraries probably already create a new task for each of those batches; you just need to register their callback and make sure you await any I/O-bound code.
Edit: The solution I am proposing does have a limitation in that a single message can block an entire batch. This is not necessarily a bad thing, but if your solution requires different processing for different messages and you don't want to create this inner batch dependency, TPL Dataflow could definitely be a good solution for your use case.
Yeah, this sounds very much like a task for TPL Dataflow; it is a very versatile yet powerful instrument. Your first chain link would acquire messages from the queue (not necessarily single-threadedly; you just pass some delegates in). You will also be in control of how many items are "queued" locally this way.
Then you "subscribe" your workers in any way you desire. You can even customize it so that "faulted" items are put back into your queue. And it wouldn't even matter whether your processing is IO-bound or not: if it is, nice, TPL Dataflow is asynchronous; if not, not a problem, TPL Dataflow can also be synchronous. Or you can fire up some thread pool threads, no biggie.

How to implement an explicit, inspectable Task queue?

This question talks about how the traditional queue pattern is somewhat antiquated in modern C# due to the TPL: Best way in .NET to manage queue of tasks on a separate (single) thread
The accepted answer proposes what appears to be a stateless solution. It is very elegant, but, and perhaps I'm a dinosaur or misunderstand the answer... What if I want to pause the queue or save its state? What if, when enqueuing a task, the behaviour should depend on the queue's state, or if queued tasks can have different priorities?
How could one efficiently implement an ordered task queue that actually has an explicit Queue object which you can inspect and even interact with, within the Task paradigm? Supporting single/parallel processing of enqueued tasks is a benefit, but for my purposes single concurrency is acceptable if parallelism raises problems. I am not dealing with millions of tasks a second; in fact, my tasks are typically large/slow.
I am happy to accept there are solutions with different scalability depending on requirements, and that we can often trade-off between scalability and coding effort/complexity.
What you are describing sounds essentially like the Channel<T> API. This already exists:
nuget: https://www.nuget.org/packages/System.Threading.Channels/
msdn: https://devblogs.microsoft.com/dotnet/an-introduction-to-system-threading-channels/
additional: https://www.stevejgordon.co.uk/an-introduction-to-system-threading-channels
it isn't explicitly a Queue<T>, but it acts as a queue. There is support for bounded vs unbounded, and single vs multiple readers/writers.
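A minimal sketch of a bounded channel used as such a queue; the capacity and the single-reader option are assumptions fitting the ordered, single-concurrency case described in the question:

    using System;
    using System.Threading.Channels;
    using System.Threading.Tasks;

    class ChannelQueue
    {
        static async Task Main()
        {
            // Bounded channel: writers wait when the queue is full.
            var channel = Channel.CreateBounded<string>(new BoundedChannelOptions(100)
            {
                SingleReader = true,   // one consumer preserves strict ordering
                SingleWriter = false
            });

            var consumer = Task.Run(async () =>
            {
                await foreach (var item in channel.Reader.ReadAllAsync())
                    Console.WriteLine($"Dequeued {item}");
            });

            for (int i = 0; i < 10; i++)
                await channel.Writer.WriteAsync($"task-{i}");

            channel.Writer.Complete();   // no more items
            await consumer;
        }
    }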

Threading or Task

I'm currently picking up C# again and developing a simple application that sends broadcast messages which, when received, are shown on a Windows Form.
I have a discovery class with two threads, one that broadcasts every 30 seconds, the other thread listens on a socket. It is in a thread because of the blocking call:
if (listenSocket.Poll(-1, SelectMode.SelectRead))
The first thread works much like a timer in a class library, it broadcasts the packet and then sleeps for 30 seconds.
Now in principle it works fine: when a packet is received I raise an event and the WinForm places it in a list. The problems start with the form, though, because the main UI thread requires Invoke. Now I only have two threads, and to me this doesn't seem to be the most effective approach in the long run, becoming a complex thing when the number of threads grows.
I have explored the Tasks but these seem to be more orientated at a once off long running task (much like the background worker for a form).
Most threading examples I find all report to the console and do not have the problems of Invoke and locking of variables.
As I'm using .NET 4.5, should I move to Tasks or stick with threads?
Async programming will still delegate some aspects of your application to a different thread (thread pool); if you try to update the GUI from such a thread you are going to have similar problems as you have today with regular threads.
However, async/await gives you techniques to delegate work to a background thread and yet set a kind of continuation point that says "please continue here on the GUI thread when you are finished with that operation", which effectively allows you to update the GUI without Invoke and keep it responsive. By default, await captures the current SynchronizationContext and resumes there; this is the behavior that ConfigureAwait controls. But there are other techniques as well.
If you don't know async await mechanism yet, this will take you some investment of your time to learn all these new things. But you'll find it very rewarding.
But it is up to you to decide if you are willing to spend a few days learning and experimenting with a technology that is new to you.
Google around a bit on async/await; there are some excellent articles from Stephen Cleary, for instance: http://blog.stephencleary.com/2012/02/async-and-await.html
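A minimal, hypothetical WinForms sketch of the idea. ListenForPacket stands in for the blocking Poll/Receive described in the question, and the form wiring is mine:

    using System;
    using System.Threading.Tasks;
    using System.Windows.Forms;

    public class DiscoveryForm : Form
    {
        private readonly ListBox packetListBox = new ListBox { Dock = DockStyle.Fill };

        public DiscoveryForm()
        {
            Controls.Add(packetListBox);
            Load += async (s, e) => await ListenLoopAsync();
        }

        private async Task ListenLoopAsync()
        {
            while (!IsDisposed)
            {
                // The blocking socket work runs on a thread-pool thread...
                string packet = await Task.Run(() => ListenForPacket());

                // ...and after the await we are back on the UI thread, because
                // await captured the WinForms SynchronizationContext: no Invoke needed.
                packetListBox.Items.Add(packet);
            }
        }

        // Stand-in for the blocking Poll/Receive on the listen socket.
        private string ListenForPacket()
        {
            System.Threading.Thread.Sleep(1000);
            return $"packet at {DateTime.Now:T}";
        }
    }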
Firstly, if you're worried about scalability you should probably start off with an approach that scales easily. The ThreadPool would work nicely. Tasks are based on the ThreadPool as well, and they allow for more complex situations like tasks/threads firing in a sequence (also based on conditions), synchronization, etc. In your case (server and client) this seems unneeded.
Secondly, it looks to me like you are worried about a bottleneck scenario where more than one thread will try to access a common resource like the UI or a DB. With DBs, don't worry: they can handle multiple access well. But in the case of the UI or another not-multithread-friendly resource, you have to manage parallel access yourself. I would suggest something like a BlockingCollection, which is a nice way to implement the "many producers, one consumer" pattern. This way you could have multiple threads adding stuff and just one thread reading it from the collection and passing it on to the single-threaded resource like the UI.
BTW, Tasks can also be long-running, i.e., run loops. Check this documentation.

How to marshal calls to specific threads in C# using TPL

I have a situation where I have a polling thread for a TcpClient (is that the best plan for a discrete TCP device?) which aggregates messages and occasionally responds to those messages by firing off events. The event producer really doesn't care much if the thread is blocked for a long time, but the consumer's design is such that I'd prefer to have it invoke the handlers on a single worker thread that I've got for handling a state machine.
The question then is this. How should I best manage the creation, configuration (thread name, is background, etc.) lifetime, and marshaling of calls for these threads using the Task library? I'm somewhat familiar with doing this explicitly using the Thread type, but when at all possible my company prefers to do what we can just through the use of Task.
Edit: I believe what I need here will be based around a SynchronizationContext on the consumer's type that ensures that tasks are scheduled on a single thread tied to that context.
The question then is this. How should I best manage the creation, configuration (thread name, is background, etc.) lifetime, and marshaling of calls for these threads using the Task library?
This sounds like a perfect use case for BlockingCollection<T>. This class is designed specifically for producer/consumer scenarios, and allows you to have any threads add to the collection (which acts like a thread safe queue), and one (or more) thread or task call blockingCollection.GetConsumingEnumerable() to "consume" the items.
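A minimal sketch of that pattern wired to a single, explicitly configured worker thread; the class name and Post method are mine for illustration, not from any library:

    using System;
    using System.Collections.Concurrent;
    using System.Threading;

    // Callers post delegates; one dedicated thread executes them in order,
    // which marshals all handler calls onto the state-machine thread.
    class WorkerThreadQueue : IDisposable
    {
        private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
        private readonly Thread _thread;

        public WorkerThreadQueue(string name)
        {
            _thread = new Thread(() =>
            {
                foreach (var action in _work.GetConsumingEnumerable())
                    action();                     // every handler runs on this thread
            })
            { Name = name, IsBackground = true }; // the configuration the question asks about
            _thread.Start();
        }

        public void Post(Action action) => _work.Add(action);

        public void Dispose()
        {
            _work.CompleteAdding();   // drain remaining work, then exit
            _thread.Join();
        }
    }

The TCP polling thread then calls Post(() => handler(message)) and continues immediately, while the consumer's handlers all run on the one named worker thread.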
You could consider using TPL Dataflow, where you set up an ActionBlock<T> that you push messages into from your TCP thread; TPL Dataflow will then take care of the rest by scaling out the processing of the actions as much as your hardware can handle. You can also control exactly how much processing happens by configuring the ActionBlock<T> with a MaxDegreeOfParallelism.
Since processing sometimes can't keep up with the flow of incoming data, you might want to consider "linking" a BufferBlock<T> in front of the ActionBlock<T> to ensure that the TCP processing thread doesn't get too far ahead of what you can actually process. This would have the same effect as using BlockingCollection<T> with a bounded capacity.
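A minimal sketch of that linked arrangement; the bounds and the byte[] payload are illustrative assumptions:

    using System;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    class TcpPipeline
    {
        static async Task Main()
        {
            // Buffer between the TCP reader and the processor.
            var buffer = new BufferBlock<byte[]>(
                new DataflowBlockOptions { BoundedCapacity = 500 });

            var processor = new ActionBlock<byte[]>(
                msg => Console.WriteLine($"Handling {msg.Length} bytes"),
                new ExecutionDataflowBlockOptions
                {
                    MaxDegreeOfParallelism = 1,  // single worker, per the question
                    BoundedCapacity = 1
                });

            // PropagateCompletion flows Complete() and faults downstream.
            buffer.LinkTo(processor, new DataflowLinkOptions { PropagateCompletion = true });

            // The TCP polling thread would post here; SendAsync waits when full.
            for (int i = 0; i < 10; i++)
                await buffer.SendAsync(new byte[i + 1]);

            buffer.Complete();
            await processor.Completion;
        }
    }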
Finally, note that I'm linking to the .NET 4.5 documentation because it's easiest, but TPL Dataflow is also available for .NET 4.0 via a separate download (it has since been published on NuGet as System.Threading.Tasks.Dataflow).

Alternative to Threads

I've read that threads are very problematic. What alternatives are available? Something that handles blocking and stuff automatically?
A lot of people recommend the background worker, but I've no idea why.
Anyone care to explain "easy" alternatives? The user will be able to select the number of threads to use (depending on their speed needs and computer power).
Any ideas?
To summarize the problems with threads:
if threads share memory, you can get race conditions
if you avoid races by liberally using locks, you can get deadlocks (see the dining philosophers problem)
An example of a race: suppose two threads share access to some memory where a number is stored. Thread 1 reads from the memory address and stores it in a CPU register. Thread 2 does the same. Now thread 1 increments the number and writes it back to memory. Thread 2 then does the same. End result: the number was only incremented by 1, while both threads tried to increment it. The outcome of such interactions depends on timing. Worse, your code may seem to work bug-free, but once in a blue moon the timing is wrong and bad things happen.
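This lost-update race takes only a few lines to reproduce; shown alongside is one standard fix, an atomic increment via Interlocked (my addition, not part of the answer above):

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class RaceDemo
    {
        static int _unsafeCounter;
        static int _safeCounter;

        static void Main()
        {
            // Two tasks each increment both counters a million times.
            Parallel.Invoke(Increment, Increment);

            // _unsafeCounter usually ends up below 2,000,000: increments were
            // lost exactly as described above. _safeCounter is always 2,000,000.
            Console.WriteLine($"unsafe: {_unsafeCounter}, safe: {_safeCounter}");
        }

        static void Increment()
        {
            for (int i = 0; i < 1_000_000; i++)
            {
                _unsafeCounter++;                         // read-modify-write race
                Interlocked.Increment(ref _safeCounter);  // atomic, no lost updates
            }
        }
    }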
To avoid these problems, the answer is simple: avoid sharing writable memory. Instead, use message passing to communicate between threads. An extreme example is to put the threads in separate processes and communicate via TCP/IP connections or named pipes.
Another approach is to share only read-only data structures, which is why functional programming languages can work so well with multiple threads.
This is a bit higher-level answer, but it may be useful if you want to consider other alternatives to threads. Anyway, most of the answers discussed solutions based on threads (or thread pools) or maybe tasks from .NET 4.0, but there is one more alternative, called message-passing. This has been successfully used in Erlang (a functional language used by Ericsson). Since functional programming is becoming more mainstream these days (e.g. F#), I thought I could mention it. In general:
Threads (or thread pools) can usually be used when you have some relatively long-running computation. When it needs to share state with other threads, it gets tricky (you have to correctly use locks or other synchronization primitives).
Tasks (available in the TPL in .NET 4.0) are very lightweight: you can split your program into thousands of tasks and then let the runtime run them (it will use an optimal number of threads). If you can write your algorithm using tasks instead of threads, it sounds like a good idea; you can avoid some synchronization when you run the computation in smaller steps.
Declarative approaches (PLINQ in .NET 4.0 is a great option): if you have some higher-level data-processing operation that can be encoded using LINQ primitives, you can use this technique (see the PLINQ sketch after this list). The runtime will automatically parallelize your code, because LINQ doesn't specify how exactly it should evaluate the results (you just say what results you want to get).
Message-passing allows you to write a program as concurrently running processes that perform some (relatively simple) tasks and communicate by sending messages to each other. This is great because you can share some state (by sending messages) without the usual synchronization issues (you just send a message, then do other things or wait for messages). Here is a good introduction to message-passing in F# from Robert Pickering.
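For instance, a tiny PLINQ illustration of the declarative approach; the computation itself is an arbitrary placeholder:

    using System;
    using System.Linq;

    class PlinqDemo
    {
        static void Main()
        {
            // Declarative parallelism: AsParallel() lets the runtime decide how
            // to partition the work across cores; we only state what we want.
            long sumOfEvenSquares = Enumerable.Range(1, 1_000_000)
                .AsParallel()
                .Where(n => n % 2 == 0)
                .Select(n => (long)n * n)
                .Sum();

            Console.WriteLine(sumOfEvenSquares);
        }
    }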
Note that the last three techniques are quite related to functional programming: in functional programming, you design programs differently, as computations that return a result (which makes it easier to use Tasks), and you often write declarative and higher-level code (which makes it easier to use declarative approaches).
When it comes to actual implementation, F# has a wonderful message-passing library right in the core libraries. In C#, you can use Concurrency & Coordination Runtime, which feels a bit "hacky", but is probably quite powerful too (but may look too complicated).
Won't the parallel programming options in .Net 4 be an "easy" way to use threads? I'm not sure what I'd suggest for .Net 3.5 and earlier...
This MSDN link to the Parallel Computing Developer Center has links to lots of info on parallel programming, including videos, etc.
I can recommend this project. Smart Thread Pool
Project Description
Smart Thread Pool is a thread pool written in C#. It is far more advanced than the .NET built-in thread pool.
Here is a list of the thread pool features:
The number of threads dynamically changes according to the workload on the threads in the pool.
Work items can return a value.
A work item can be cancelled.
The caller thread's context is used when the work item is executed (limited).
Usage of minimum number of Win32 event handles, so the handle count of the application won't explode.
The caller can wait for multiple or all the work items to complete.
A work item can have a PostExecute callback, which is called as soon as the work item is completed.
The state object, that accompanies the work item, can be disposed automatically.
Work item exceptions are sent back to the caller.
Work items have priority.
Work item groups.
The caller can suspend the start of a thread pool and work items group.
Threads have priority.
Can run COM objects that have a single-threaded apartment.
Supports Action and Func delegates.
Support for Windows CE (limited).
The MaxThreads and MinThreads can be changed at run time.
Cancel behavior is improved.
"Problematic" is not the word I would use to describe working with threads. "Tedious" is a more appropriate description.
If you are new to threaded programming, I would suggest reading this thread as a starting point. It is by no means exhaustive but has some good introductory information. From there, I would continue to scour this website and other programming sites for information related to specific threading questions you may have.
As for specific threading options in C#, here are some suggestions on when to use each one.
Use BackgroundWorker if you have a single task that runs in the background and needs to interact with the UI; the task of marshalling data and method calls to the UI thread is handled automatically through its event-based model (a minimal sketch follows these guidelines). Avoid BackgroundWorker if (1) your assembly does not already reference the System.Windows.Forms assembly, (2) you need the thread to be a foreground thread, or (3) you need to manipulate the thread priority.
Use a ThreadPool thread when efficiency is desired. The ThreadPool helps avoid the overhead associated with creating, starting, and stopping threads. Avoid using the ThreadPool if (1) the task runs for the lifetime of your application, (2) you need the thread to be a foreground thread, (3) you need to manipulate the thread priority, or (4) you need the thread to have a fixed identity (aborting, suspending, discovering).
Use the Thread class for long-running tasks and when you require features offered by a formal threading model, e.g., choosing between foreground and background threads, tweaking the thread priority, fine-grained control over thread execution, etc.
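A minimal BackgroundWorker sketch of the first guideline; the two-second workload is a placeholder for real background work:

    using System;
    using System.ComponentModel;
    using System.Windows.Forms;

    public class WorkerForm : Form
    {
        private readonly BackgroundWorker _worker = new BackgroundWorker();
        private readonly Label _status = new Label { Dock = DockStyle.Fill, Text = "working..." };

        public WorkerForm()
        {
            Controls.Add(_status);

            // DoWork runs on a thread-pool thread...
            _worker.DoWork += (s, e) =>
            {
                System.Threading.Thread.Sleep(2000);  // stand-in for real work
                e.Result = "done";
            };

            // ...RunWorkerCompleted is marshalled back to the UI thread for us,
            // so touching the Label here is safe without Invoke.
            _worker.RunWorkerCompleted += (s, e) => _status.Text = (string)e.Result;

            Load += (s, e) => _worker.RunWorkerAsync();
        }
    }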
Any time you introduce multiple threads, each running at once, you open up the potential for race conditions. To avoid these, you tend to need to add synchronization, which adds complexity, as well as the potential for deadlocks.
Many tools make this easier. .NET has quite a few classes specifically meant to ease the pain of dealing with multiple threads, including the BackgroundWorker class, which makes running background work and interacting with a user interface much simpler.
.NET 4 is going to do a lot to ease this even more. The Task Parallel Library and PLINQ dramatically ease working with multiple threads.
As for your last comment:
The user will be able to select the number of threads to use (depending on their speed needs and computer power).
Most of the routines in .NET are built upon the ThreadPool. In .NET 4, when using the TPL, the workload will actually scale at runtime for you, eliminating the burden of having to specify the number of threads to use. However, there are ways to do this now.
Currently, you can use ThreadPool.SetMaxThreads to help limit the number of threads generated. In the TPL, you can specify ParallelOptions.MaxDegreeOfParallelism and pass an instance of ParallelOptions into your routine to control this (a sketch follows). The default behavior scales up with more threads as you add more processing cores, which is usually the best behavior in any case.
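A minimal sketch; the cap of 4 stands in for whatever number the user selects:

    using System;
    using System.Threading.Tasks;

    class DegreeDemo
    {
        static void Main()
        {
            // Feed the user-chosen thread cap into the TPL instead of
            // spinning up threads by hand.
            var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

            Parallel.For(0, 100, options, i =>
            {
                Console.WriteLine($"item {i} on thread {Environment.CurrentManagedThreadId}");
            });
        }
    }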
Threads are not problematic if you understand what causes problems with them.
For example, if you avoid statics and know which APIs to use (e.g. synchronized streams), you will avoid many of the issues that come up from bad utilization.
If threading is a problem (this can happen if you have unsafe/unmanaged third-party DLLs that cannot support multithreading), an option is to create a mechanism to queue the operations, i.e., store the parameters of the action in a database and just run through them one at a time. This can be done in a Windows service. Obviously this will take longer, but in some cases it is the only option.
Threads are indispensable tools for solving many problems, and it behooves the maturing developer to know how to effectively use them. But like many tools, they can cause some very difficult-to-find bugs.
Don't shy away from something so useful just because it can cause problems; instead, study and practice until you become the go-to guy for multi-threaded apps.
A great place to start is Joe Albahari's article: http://www.albahari.com/threading/.
