I am trying to set up a concurrent queue that will enqueue data objects coming in from one thread while another thread dequeues the data objects and processes them. I have used a BlockingCollection<T> and used the GetConsumingEnumerable() method to create a solution that works pretty well in simple usage. My problem lies in the facts that:
the data is coming in quickly, data items being enqueued approximately every 50ms
processing each item will likely take significantly longer than 50ms
I must maintain the order of the data items while processing as some of the data items represent events that must be fired in the proper order.
On my development machine, which is a pretty powerful setup, it seems the cutoff is about 60ms of processing time for getting things to work right. Beyond that, I have problems either with having the queue grow continuously (not dequeuing fast enough) or having the data items processed in the wrong order depending on how I set up the processing with regard to whether/how much/where I parallelize. Does anyone have any tips/tricks/solutions or can point me to such that will help me here?
Edit: As pointed out below, my issue is most likely not with the queuing structure itself so much as it is with trying to dequeue and process the items faster. Are there trick/tips/etc. for portioning out the processing work so that I can keep dequeuing quickly while still maintaining the order of the incoming data items.
Edit (again): Thanks for all your replies! It's obvious I need to put some more work into this. This is all great input, though and I think it will help point me in the right direction! I will reply again either with a solution that I came up with or a more detailed question and code sample! Thanks again.
Update: In the end, we went with a BlockingCollection backed by a ConcurrentQueue. The queue worked perfectly for what we wanted. In the end, as many mentioned, the key was making the processing side as fast and efficient as possible. There is really no way around that. We used parallelization where we found it helped (in some cases it actually hurt performance), cached data in certain areas, and tried to avoid locking scenarios. We did manage to get something working that performs well enough that the processing side can keep up with the data updates. Thanks again to everyone who kicked in a response!
If you are using TPL on .NET 4.0, you can investigate the TPL Dataflow library simple usage, as this library (it's not a third party, it's a library from Microsoft being distributed via NuGet) provide the logic which saves the order of data being processed in your system.
As I understand, you got some data which will come in order, which you have to mantain after some work at each of data item. You can use for this TransformBlock class or BufferBlock linked with ActionBlock: simply put the data on it's input, set up the action you need to be run on each item, and link this block with classes you need (you even can make it IObservable to create a responding UI.
As I said, TPL Dataflow blocks are incapsulating FIFO queue logic, and they are saving the order for results on their action. And the code you can write with them is multithreading-oriented (see more about maximum degree of parallelizm in TPL Dataflow).
I think that you are okay with the blocking queue. I enqueue thousands of messages per second in a BlockingCollection and the overhead is very small.I think you should do the following:
Add a synchronized sequence number when enqueuing the messages
Use multiple consumers to try to overload the queue
In general focus on the processing time. The default collection type for BlockingCollection is ConcurrentQueue, so the default is that the it is a FIFO (First in, first out) queue, so something else seems to be wrong.
some of the data items represent events that must be fired in the
proper order.
Then you may differentiate dependent items and process them in order while processing other items in parallel. Maybe you can build 2 separate queues, one for items to be processed in order, dequeued an processed with a single thread and another dequeued by multiple threads.
We need to know more about input and expected processing.
Related
In this scenerio, I have to Poll AWS SQS messages from a queue, each async request can fetch upto 10 sqs items/messages. Once I Poll the items, Then I have to process those items on a kubernetes pod. Item processing includes getting response from few API calls, it may take some time & then saving the item to DB & S3.
I did some R&D & reach on following conclusion
To use consumer producer model, 1 thread will poll items & another thread will process the item or to use multi-threading for item processing
Maintain a data structure that will containes sqs polled items ready for processing, DS could be Blocking collection or Concurrent queue
Using Task Parellel Library for threadpooling & in item processing.
Channels can be used
My Queries
What would be best approach to achieve best performance or increase TPS.
Can/Should I use data flow TPL
Multi threaded or single threaded with asyn tasks
This is very dependant on the specifics of your use-case and how much effort would you want to put in.
I will, however, explain the thought process I would use when making such a decision.
The naive solution to handle SQS messages would be to do it one at a time sequentially (i.e. without concurrency). It doesn't mean that you're limited to a single message at a time since you can add more pods to the cluster.
So even in that naive solution you have one concurrency point you can utilize but it has a lot of overhead. The way to reduce overhead is usually to utilize the same overhead but process more messages with it. That's why, for example, SQS allows you to get 1-10 messages in a single call and not just one. It spreads the call overhead over 10 messages. In the naive solution the overhead is the cost of starting a whole process. Using the process for more messages means concurrent processing.
I've found that for stable and flexible concurrency you want many points of concurrency, but have each of them capped at some configurable degree of parallelism (whether hardcoded or actual configuration). That way you can tweak each of them to achieve optimal output (increase when you have free CPU and memory and decrease otherwise).
So, where can the additional concurrency be introduced? This is a progression where each step utilizes resources better but requires more effort.
Fetch 10 messages instead of one for every SQS API call and process them concurrently. That way you have 2 points of concurrency you can control: Number of pods, number of messages (up to 10) concurrently.
Have a few tasks each fetching 1-10 tasks and processing them concurrently. That's 3 concurrency points: Pods, tasks and messages per task. Both these solutions suffer from messages with varying processing time, meaning that a single long running message will "hold up" all the other 1-9 "slots" of work effectively reducing the concurrency to lower than configured.
Set up a TPL Dataflow block to process the messages concurrently and a task (or few) continuously fetching messages and pumping into the block. Keep in mind that SQS messages need to be explicitly deleted so the block needs to receive the message handle too so the message can be deleted after processing.
TPL Dataflow "pipe" consisting of a few blocks where each has it's own concurrency degree. That's useful when you have different steps of processing of the message where each step has different limitations (e.g. different APIs with different throttling configurations).
I personally am very fond of, and comfortable with, the Dataflow library so I would go straight to it. But simpler solutions are also valid when performance is less of an issue.
I'm not familiar with Kubernetes but there are many things to consider when maximising throughput.
All the things which you have mentioned is IO bound not CPU bound. So, using TPL is overcomplicating the design for marginal benefit. See: https://learn.microsoft.com/en-us/dotnet/csharp/async#recognize-cpu-bound-and-io-bound-work
Your Kubernetes pods are likely to have network limitations. For example, with Azure Function Apps on Consumption Plans is limited to 1,200 outbound connections. Other services will have some defined limits, too. https://learn.microsoft.com/en-us/azure/azure-functions/manage-connections?tabs=csharp#connection-limit. Due to the nature of your work, it is likely that you will reach these limits before you need to process IO work on multiple threads.
You may also need to consider limits of the services which you are dependent on and ensure they are able to handle the throughput.
You may want to consider using Semaphores to limit the number of active connections to satisfy both your infrastructure and external dependency limits https://learn.microsoft.com/en-us/dotnet/api/system.threading.semaphoreslim?view=net-5.0
That being said, 500 messages per second is a realistic amount. To improve it further, you can look at having multiple processes with independent resource limitations processing the queue.
Not familiar with your use case, or specifically with the tech you are using, but this sounds like a very common message handling scenario.
Few guidelines:
First, these are guidelines, your usecase might be very different then what the ones commenting here are used to.
Whenever you want to increase your throughput you need to identify
your bottlenecks, and thrive towards CPU bottleneck, making sure you
fully utilize it. CPU load is usually the most expensive, and
generally makes for a more reliable metric for autoscaling. Obviously, depending on your remote api calls and your DB you might reach other bottlenecks - SQS queue size also makes for a good autoscaling metric, but keep in mind that autoscalling isn't guaranteed to increase you throughput if your bottleneck is DB or API related.
I would not go for a fancy solution with complex data structures, again, not familiar with your usecase, so I might be wrong - but keep it simple. There should be one thread that is responsible for polling the queue, and when it finds new messages it should create a Task that processes a batch. There should generally be one Task per processing batch - let the ThreadPool handle the number of threads.
Not familiar with .net SQS library. However, I am familiar with other libraries for very similar solutions. Most Libraries for queues out there already do it all for you, and you don't really have to worry about it. You should probably just have a callback function that is called when the highly optimized library already finds new messages. Those libraries probably already create a new task for each of those batches - you just need to register to their callback, and make sure you await any I/O bound code.
Edit: The solution I am proposing does have a limitation in that a single message can block an entire batch, this is not necessarily a bad thing - if your solution requires different processing for different messages, and you don't want to create this inner batch dependency, a TPL DataFlow could definitely be a good solution for your usecase.
Yeah, this sounds very much like the task for TPL Dataflow, it is very versatile yet powerful instrument. Your first chain link would acquire messages from the queue (not neccessarily one-threaded-ly, you just pass some delegates in). You will also be in control of how many items are "queued" locally this way.
Then you "subscribe" your workers in any way you desire – you can even customize it so that "faulted" processings would be put back into your queue — and it woudn't even matter if your processing is IO bound or not. If it is — well, nice, TPL dataflow is asyncronous, if not — well, not a problem, TPL dataflow can also be syncronous. Or you can fire up some thread pool threads, no biggie.
I have a series of calculations that need to be processed - the calculations and the order they run are all defined by the user on the UI.
If they just ran one after each other, it wouldn't be too hard. However, some of the calculations need to be processed concurrently and all calculations must have the ability to be individually paused at any time. I also need to be able to re-arrange orders or add new calculations to be processed at any time. So whatever I do must be flexible enough to handle this.
On the UI, imagine a listbox (a queue, if you like) of usercontrols - with each usercontrol displaying the name of the calculation and a pause button. And I can add calculations to this list at any time during processing.
What is the best way to do this?
Should I be running each calculation in its own thread? If so, how should I store the list of running processes? How will I pass the queue to the calculation processor? How will I be able to ensure that every time the queue changes (new ordering or new calculation) the calculation processor will be made aware of this?
My initial thoughts were to have:
CalcProcessor class
CalcCalculation class
In CalcProcessor have 2 Lists of CalcCalculations. One being the "queue" as shown on the UI (perhaps a pointer to it? Or some other way to ensure it updates live), and the other being the list of currently running calculations.
Somehow I need to get the CalcCalculation to be running in its own thread to process the calculation, and be able to handle any pause events. So I need some way to transmit the info of the Pause button being pressed in the UI to the CalcProcessor object, and then into the correct CalcCalculation.
Edit in response to David Hope:
Thanks for your reply.
Yes, there are n calculations but this could change at any time due to being able to add more calculations to process on the UI.
They do not need to share data in anyway. There will be a setting in the application to specify how many should run concurrently (ie. 10 at any given time, the first 10 in the queue for example - and when 1 finishes the next calculation in the queue will start processing).
The calculation will involve taking data from some data source - it could be a database or a file, and analysing it and performing some calculations on that data. When I say the calculation needs to be paused, I don't mean pausing the thread... I just mean (for example, as I haven't written this part of the application yet) if it is reading row by row from a database and doing some live calculations pausing at the completion of processing the current row... and continuing on when the pause button is unclicked on the UI - which could be done with something as primitive as a while(notPaused) loop providing I can get the Pause information from the UI into the thread.
There are several questions here:
How to synchronize the UI and the model?
I think you got this one backwards. Your model shouldn't have a “pointer” to the queue you're showing in the UI. Instead, the queue should be in your model and you should use databinding together with INotifyPropertyChange and ObservableCollection to show the queue on the UI. (At least that's how it's done in WPF.)
This way, you can manipulate your queue directly from your model, and it will automatically show on the UI.
How to start and monitor calculations?
I think Tasks are ideal for this. You can start a Task using Task.Factory.StartNew(). Since it seems your Tasks will take long to execute, you might consider using TaskCreationOptions.LongRunning. You can also use the Task to find out when is the calculation complete (or if it failed with an exception).
How to pause running calculations?
You can use ManualReserEventSlim for that. Normally, it would be set, but if you wanted to pause a running Task, you would Reset() it. The calculation will need to periodically call Wait() on that event. It's not possible to reasonably pause a running thread without cooperation from the calculation on that thread.
If you were using C# 5.0, a better approach would be to use something like PauseToken.
In Framework 4.5, the answer here is the Async API, which removes the need to manage threads. For details, look at the async/await keywords.
From a broader perspective, a "CalcProcessor" class is a good idea, but I think the Task object will suffice to replace your "CalcCalculation" class. The Processor can simply have an Enumerable of Tasks. The Processor can expose methods for managing the queue, if needed, as well as returning information about its status. When your application finally reaches a state where it must have the results, you can use the AwaitAll method to block the CalcProcessor's thread until all of the tasks complete.
Without more information about the actual goal here, it's hard to give better advice.
You can use Observer Pattern to display results on UI and order changes back in to Processor. State and Command patterns will help you to start, pause, cancel the calculations. These patterns have great answers to your questions in design way. Concurrency is still a problem, they do not answer multi-threading problems but they open an easier road to manage threads.
I suggest that you haven't broken the problem down far enough, which is the reason you are frustrated.
You need to start small and build up from there. You mention, but don't define your actual requirements, but they seem to be...
Need to be able to run ?N? calculations
Some need to be run concurrently (does this imply that they share data, if so how are you going to share the data)
Must be able to pause the calculation (don't use Thread.Suspend, as it potentially leaves a thread in an unstable state, particularly bad if you are sharing data), so you will need to build pause points into each calculation. Also need to consider how you are going to communicate the pause/unpause to the calculation
As far as methods, there are several to consider...
Threads are an obvious choice, but require careful tending too (starting, pausing, stopping, etc...)
You could also use BackGroundWorker or possibly Parallel.ForEach
BackGroundWorker contains the framework for cancelling the worker and providing progress (which can be useful).
My recommendation to start would be to go with BackGroundWorker, potentially subclass it to add the Pause/Resume functionality you need. Determine how you are going to manage data sharing (at least use lock to protect against simultaneous access).
You may find BackGroundWorker too restrictive and need to go with Threads, but I'm usually able to avoid it.
If you post more clear requirements, or samples of what you've tried and didn't work, I'll be happy to help more.
For queue you can use heap data structure (priority queue). This will help prioritize yours tasks. Also you should use Thread Pool for effectively calculations. And try to split you tasks to little parts.
Blocking Collections are getting more pile up than Normal Queue. In Following Scenario,
I have a dedicated Thread as a Consumer.
Three or more dedicated Threads as Producer.
I have checked with Normal Queue (Monitor.Enter...) as well as Blocking Collection.
Results:
Both Queues are getting pile up (Obviously , Consumers < Producers)
Normal Queues are automatically cleared at some point & not keep on increasing after 20000 or 30000.
But Blocking Collection are keep on increasing more than hundreds of thousands and Obviously we have no clear option, at the same time i dont want to restrict the producer
Can any one Shed some light ..
This is a suggestion I keep making - try ZeroMQ out. The producer/consumers pattern is well supported (use PUSH and PULL sockets), and it will be blindingly fast. Since you're using the same process, you have no message loss to worry about.
I have never written multi-threaded code before (barring a few basic backgroundworker tricks) and am hoping for some guidance about how I would approach my problem.
I have an XML file which is a serialized List<Stock>. For each one of these stock items I need to perform a webservice call called UpdatePrice().
What I want to do is take each one of these items, create a threadpool (who's size depends on the amount of rows I will need to process) and begin making webservice calls.
I am not asking for a complete solution (obviously) but would really appreciate some guidance about how one would typically solve this problem.
The biggest issue that I see arising is how I would designate which threads would work on which objects. Do I simply take the list divide it by the number of threads I make and split the work? Or am I better off allowing each thread to arbitrarily pick an item from the list to process? (Then I have locking issues but as a plus can ensure no thread is idle)
As I said before I am not looking for a complete solution but just some basic guidance on where to start because honestly I am lost on this one and haven't written a single line of code.
PS: Also are autogenerated webservice proxies in .NET threadsafe?
I would suggest looking into TPL and PLINQ for a solution. A simple example solution using Parallel.ForEach() could look like this (parallel calls limited to 5 in the example).
List<Stock> stocks;
Parallel.ForEach(stocks,
new ParallelOptions() { MaxDegreeOfParallelism = 5 },
(stock) =>
{
float newPrice = UpdatePrice(stock.TickerSymbol); //web service call
stock.Price = newPrice;
});
i would:
First read the whole XML data synchronously.
Then, i would put each element to be processed in a single queue.
Then, you can spawn N processing threads, in which at the beginning of each one, it would "pop" an element of your queue, wrapping this specific piece of code in a mutex / semaphore (Google C# mutex, or concurrent access, or anything related). This is easily done in C# with the "lock" keyword on an arbitrary object.
Hope this helps.
Pierre.
There's no point in using threads here. A thread can only give you one resource: more cpu cycles, provided that you have a CPU with multiple cores. That is not the resource that you need to speed up your program. You need a faster Internet connection.
If you have an UI you don't want frozen then the BackgroundWorker tricks will work just fine.
I was googling for some advise about this and I found some links. The most obvious was this one but in the end what im wondering is how well my code is implemented.
I have basically two classes. One is the Converter and the other is ConverterThread
I create an instance of this Converter class that has a property ThreadNumber that tells me how many threads should be run at the same time (this is read from user) since this application will be used on multi-cpu systems (physically, like 8 cpu) so it is suppossed that this will speed up the import
The Converter instance reads a file that can range from 100mb to 800mb and each line of this file is a tab-delimitted value record that is imported to another destination like a database.
The ConverterThread class simply runs inside the thread (new Thread(ConverterThread.StartThread)) and has event notification so when its work is done it can notify the Converter class and then I can sum up the progress for all these threads and notify the user (in the GUI for example) about how many of these records have been imported and how many bytes have been read.
It seems, however that I'm having some trouble because I get random errors about the file not being able to be read or that the sum of the progress (percentage) went above 100% which is not possible and I think that happens because threads are not being well managed and probably the information returned by the event is malformed (since it "travels" from one thread to another)
Do you have any advise on better practices of implementation of threads so I can accomplish this?
Thanks in advance.
I read very large files in some of my own code and, I have to tell you, I am skeptical of any claim that adding threads to a read operation would actually improve the overall read performance. In fact, adding threads might actually reduce performance by causing head seeks. It is highly likely that any file operations of this type would be I/O bound, not CPU bound.
Given that the author of the post you referenced never actually provided the 'real' code, his claims that multiple threads will speed up I/O remain untestable by others. Any attempt to improve hard disk read/write performance by adding threads would most certainly be I/O bound, unless he is doing some serious number crunching between reads, or has stumbled upon some happy coincidence having to do with the disk cache, in which case the performance improvement might be unreproduceable on another machine with different hardware characteristics.
Generally, when files of this size are involved, an additional 20% or 30% improvement in performance is not going to matter much, even if it is possible utilizing threads, because such a task would most certainly be considered a background task (not real-time). I use multiple threads for this kind of work, not because it improves read performance on one file, but because multiple files can be processed simultaneously in the background.
Before using threads to do this, I carefully benchmarked the software to see if threads would actually improve overall throughput. The results of the tests (on my development machine) were that using the same number of threads as the number of processor cores produced the maximum possible throughput. But that was processing ONE file per thread.
Multiple threads reading a file at a time is asking for trouble. I would set up a producer consumer model such that the producer read the lines in the file, perhaps into a buffer, and then handed them out to the consumer threads when they complete processing their current work load. It does mean you have a blocking point where the lines are handed out but if processing takes much longer than reading then it shouldn't be that big of a deal. If reading is the slow part then you really don't need multiple consumers anyway.
You should try to just have one thread read the file, since multiple threads will likely be bound by the I/O anyway. Then you can feed the lines into a thread-safe queue from which multiple threads can dequeue lines to parse.
You won't be able to tell the progress of any one thread because that thread has no defined amount of work. However, you should be able to track approximate progress by keeping track of how many items (total) have been added to the queue and how many have been taken out. Obviously as your file reader thread puts more lines into the queue your progress will appear to decrease because more lines are available, but presumably you should be able to fill the queue faster than workers can process the lines.