We have the following scenario using the .NET RabbitMQ library:
A worker thread picks up 'request' messages from a queue and dispatches them on multiple worker threads for processing. When complete, each worker thread then sends another message.
My question is: what 'pattern' do people recommend for the sender in order to get the best throughput and stability? For example:
1) A singleton 'publisher' instance that is accessed by all worker threads with a single Connection and IModel (using a 'lock' to synchronize access to the IModel)
2) A singleton 'publisher' instance that is accessed by all worker threads with a single Connection and which creates a new IModel for each send request.
or something else?
According to RabbitMQ user guide "IModel instances should not be used by more than one thread simultaneously: application code should maintain a clear notion of thread ownership for IModel instances"
If an IModel is shared, a lock should be used, as you said, but in my opinion this leads to more complex code, because the connection and model must be kept alive in case of disconnection.
You don't mention whether transactions are required, but to achieve more reliable message delivery the channel must be put into transaction mode, and you may need a transaction per delivery.
Using a new model per delivery makes error handling simpler, but it will obviously reduce throughput (and the same goes for using transactions).
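As a rough illustration, here is a minimal sketch of option 2 with optional transactions, using the RabbitMQ .NET client (the exchange and routing key names are made up for the example):

using RabbitMQ.Client;

// Sketch only: one long-lived connection shared by all workers,
// with a short-lived IModel per send, optionally wrapped in a transaction.
public static class Publisher
{
    public static void Send(IConnection connection, byte[] body)
    {
        // IModel is not thread-safe, so each send gets its own channel.
        using (IModel channel = connection.CreateModel())
        {
            channel.TxSelect(); // put the channel in transaction mode (optional)
            channel.BasicPublish(exchange: "app.exchange", // assumed name
                                 routingKey: "responses",  // assumed name
                                 basicProperties: null,
                                 body: body);
            channel.TxCommit(); // commit the delivery
        }
    }
}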
You could also, depending on your requirements, use non-durable queues and direct or fanout exchanges, which provide better throughput.
As an example: on a development machine, a Windows service that consumes messages from a queue on a single thread (deserializing, applying some business logic, and finally sending a new serialized message using transactions, opening and closing the connection and model each time) can process about 3,000 messages per second.
(Serialization was done via XmlSerializer, which is slower than DataContractSerializer.)
In NServiceBus 7 you can set the concurrency, which means you can decide how many messages from the queue your software processes in parallel.
This can be done at the NServiceBus endpoint level.
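For reference, a minimal sketch of how that is typically configured (the endpoint name is made up):

// Limit how many messages this endpoint processes in parallel.
var endpointConfiguration = new EndpointConfiguration("Sales"); // assumed name
endpointConfiguration.LimitMessageProcessingConcurrencyTo(10);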
I have a few doubts about this concept:
Is the concurrency per queue, not per message type?
If I use satellites, which means I'll have N different queues (for example, one per message type), will the concurrency still be per queue?
For example:
I have configured 1 endpoint (so 1 queue) and set the concurrency level to 10. I handle 5 different commands (handlers). All the commands are stored in the same queue, mixed together. In this case the endpoint is able to take 10 commands at a time from the queue without considering the type, correct?
In a second scenario I have 5 satellites which handle the 5 message types, with 1 dedicated queue per type. In this case, is each satellite able to take 10 messages at a time from its queue?
Satellites are an advanced feature for the raw processing of messages without all the benefits of the NServiceBus message processing pipeline. It's not normal to use them—they're most often used when implementing a message transport. For example, the RabbitMQ transport uses a Satellite for a feature that makes an endpoint instance individually addressable, so you have a QueueName queue plus a QueueName-InstanceName queue on the broker, so that another endpoint can do context.Reply() and have the reply go to the specific server that sent the original command. In any case, each satellite manages its concurrency separately, as it's a more low-level construct.
So yes, the concurrency through the main queue is for the endpoint instance, not per message type, because there's a 1:1 relationship between endpoint and queue, and you can't selectively pull messages off the queue by type.
As a result, the endpoint is your unit of scalability, both scaling up (by increasing the concurrency) and scaling out (by adding more endpoint instances on different servers).
This means you should be careful about what message types you process in the same endpoint. They should generally have the same SLA. You wouldn't want a bunch of messages that take 50ms to process held up behind a glut of messages that process for 20 seconds.
Some people will take this to an extreme and go with one endpoint per message type. This level of complexity is usually not necessary, but it does give you the ultimate control over scalability for every single message type.
I have multiple queues that multiple clients insert messages into.
On the server side, I have multiple micro-services that access the queues and handle those messages. I want to lock a queue whenever a service is working on it, so that other services won't be able to work on that queue.
Meaning that if service A is processing a message from queue X, no other service can process a message from that queue, until service A has finished processing the message. Other services can process messages from any queue other than X.
Does anyone have an idea on how to lock a queue and prevent others from accessing it? Preferably the other services would receive an exception or something, so that they'll try again on a different queue.
UPDATE
Another way could be to assign the queues to the services, so that whenever a service is working on a queue, no other service is assigned to that queue until the work item has been processed. This also isn't easy to achieve.
There are several built-in ways of doing this. If you only have a single worker, you can set MessageOptions.MaxConcurrentCalls = 1.
If you have multiple, you can use the Singleton attribute. This gives you the option of setting it in Listener mode or Function mode. The former gives the behavior you're asking for, a serially-processed FIFO queue. The latter lets you lock more granularly, so you can specifically lock around critical sections, ensuring consistency while allowing greater throughput, but doesn't necessarily preserve order.
My guess is they'd have implemented the Singleton attribute similarly to your Redis approach, so performance should be equivalent. I've done no testing with that, though.
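For illustration, a minimal sketch of Listener mode on a WebJobs queue function (the queue name is an assumption):

using Microsoft.Azure.WebJobs;

public static class QueueFunctions
{
    // Listener mode: only one instance listens at a time across hosts,
    // giving the serially-processed FIFO behavior described above.
    [Singleton(Mode = SingletonMode.Listener)]
    public static void ProcessSerially(
        [QueueTrigger("work-items")] string message) // assumed queue name
    {
        // handle the message
    }
}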
You can achieve this by using Azure Service Bus message sessions.
All messages in your queue must be tagged with the same SessionId. In that case, when a client receives a message, it locks not only that message but all messages with the same SessionId (effectively the whole queue).
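A minimal sketch with the Azure.Messaging.ServiceBus client (connection string, queue, and session names are assumptions):

using Azure.Messaging.ServiceBus;

// Sender side: tag every message for queue X with the same SessionId.
await using var client = new ServiceBusClient("<connection-string>");
ServiceBusSender sender = client.CreateSender("queue-x");
await sender.SendMessageAsync(
    new ServiceBusMessage("payload") { SessionId = "queue-x" });

// Receiver side: accepting the session locks it for this client;
// other receivers will be handed other sessions instead.
ServiceBusSessionReceiver receiver =
    await client.AcceptNextSessionAsync("queue-x");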
The solution was to use Azure's Redis cache to store the locks in memory, with micro-services that manage those locks using the Redis store.
The lock() and unlock() operations are atomic and the lock has a TTL, so that a queue won't be locked indefinitely.
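For illustration, a sketch of that kind of lock with StackExchange.Redis (the key name and TTL are assumptions):

using System;
using StackExchange.Redis;

IDatabase db = ConnectionMultiplexer.Connect("localhost").GetDatabase();

// Try to take the lock for queue X; the TTL guarantees it expires
// even if the holder crashes before releasing it.
string token = Guid.NewGuid().ToString();
if (db.LockTake("lock:queue-x", token, TimeSpan.FromSeconds(30)))
{
    try
    {
        // process one message from queue X
    }
    finally
    {
        db.LockRelease("lock:queue-x", token); // atomic; token must match
    }
}
else
{
    // another service holds the lock; try a different queue
}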
Azure Service Bus is a broker with competing consumers. You can't get what you're asking for with a general queue that all instances of your service consume from.
Put the work items into a relational database. You can still use queues to push work to workers, but the queue items can now be empty: when a worker receives an item, it knows to look in the database instead. The content of the message is disregarded.
That way messages are independent and idempotent. For queueing to work, these two properties usually must hold.
That way you can more easily sequence actions that actually are sequential. You can use transactions as well.
Maybe you don't need queues at all. Maybe it is enough to have a fixed number of workers polling the database for work. This loses auto-scaling with queues, though.
I am currently writing a TCP listener that many client applications send short messages to. The TCP listener is a C# WinForms app, and I need to process these logs in batches to avoid hitting the database for every message received. Currently, for every message the listener receives, I Enqueue it with the C# Queue class.
A separate thread executes every 5 minutes to check this Queue and start processing it if there are any queued items. There seems to be a concurrency/race-condition issue with this design: when the 5-minute thread kicks off, newly received messages can no longer access the Queue, since I hold a lock on it during Dequeue, so these new messages get lost. It seems to happen only when large volumes of messages are being sent to the TCP listener.
Does anyone think this is a flawed design on my part, or is there a much better solution for this? I am not allowed to use MSMQ or WCF, due to restrictions from the client applications that send the messages.
So you have a producer-consumer scenario, with multiple producers and one (buffered) consumer. You may want to take a look at Reactive Extensions (they have a version for .NET 3.5). At the very least, you could leverage their backport of BlockingCollection<T>.
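For example, a minimal sketch with BlockingCollection<T> (built into .NET 4, or via the backport on 3.5):

using System.Collections.Concurrent;
using System.Threading.Tasks;

var queue = new BlockingCollection<string>();

// Producer side: called by the TCP listener for each incoming message.
// Add is thread-safe, so enqueues never race with the consumer.
void OnMessageReceived(string message) => queue.Add(message);

// Consumer side: a single long-running task drains the collection;
// GetConsumingEnumerable blocks while the collection is empty.
Task.Run(() =>
{
    foreach (string message in queue.GetConsumingEnumerable())
    {
        // batch messages here and flush to the database periodically
    }
});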
I'm currently in the process of building an application that receives thousands of small messages through a basic web service; the messages are published to a message queue (RabbitMQ). The application makes use of Dependency Injection, using StructureMap as its container.
I have a separate worker application that consumes the message queue and persists the messages to a database (SQL Server).
I have implemented the SQL Connection and RabbitMQ connections as singletons (Thread Local).
In an ideal world this all works fine, but if the SQL Server or RabbitMQ connection is broken I need to reopen it, or potentially dispose of and recreate/reconnect the resources.
I wrote a basic class to act as a factory that, before returning a resource, checks that it is connected/open/working and, if not, disposes of it and recreates it. I'm not sure if this is "best practice" or if I'm trying to solve a problem that has already been solved.
Can anyone offer suggestions on how I could implement long-running tasks that do a lot of small tasks (in my case a single INSERT statement) that don't require object instantiation for each task, but can gracefully recover from errors such as dropped connections?
RabbitMQ connections seem to be expensive and during high work loads I can quickly run out of handles so I'd like to reuse the same connection (per thread).
The Enterprise Library 5.0 Integration Pack for Windows Azure contains a block for transient fault handling. It allows you to specify retry behavior in case of errors.
It was designed with Windows Azure in mind but I'm sure it would be easy to write a custom policy based on what it offers you.
You can make a connection factory for RabbitMQ that maintains a connection pool and is responsible for handing out connections to tasks. It should check that each connection is OK before handing it out; if not, it starts a new thread that closes and cleans up the connection and then returns it to the pool, while handing a functioning connection to the caller.
It sounds complicated, but it's the usual pattern for working with hard-to-initialize resources.
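A rough sketch of such a factory for the RabbitMQ side, caching one connection per thread as the question describes (the host name is an assumption):

using RabbitMQ.Client;

// Sketch: hands out a per-thread RabbitMQ connection, recreating it
// whenever the cached one has dropped.
public class ResilientConnectionFactory
{
    private readonly ConnectionFactory factory =
        new ConnectionFactory { HostName = "localhost" }; // assumed host

    [ThreadStatic]
    private static IConnection connection;

    public IConnection GetConnection()
    {
        if (connection == null || !connection.IsOpen)
        {
            try { connection?.Dispose(); } catch { /* already broken */ }
            connection = factory.CreateConnection();
        }
        return connection;
    }
}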
We're creating a c# app that needs to communicate with one other system via TCP/IP sockets. We expect to receive about 1-2 incoming transactions per second, each message averaging around 10k in size (text and 1 image).
We'll then do some processing (could take anywhere from 100 milliseconds to 3 seconds, depending on a few variables), and then send a response back of around 1k.
In the examples I've looked at, some are multi-threaded. For this application, would it be better to make it single or multi-threaded? If multi-threaded is recommended, roughly what would the different threads do?
(not specific to C#)
Having done it both ways (extreme performance was not the deciding factor), I much prefer the 1-thread-per-connection approach.
Listener Thread
The job of this thread is to listen on the socket for incoming connections, accept them, and spawn a new connection thread (giving it the connected socket).
Connection Threads
These threads (one per connection) handle all of the communication with the connected socket. They may also handle the processing of requests if it is synchronous (you will need to look into that for your specific app).
When the connection dies, this thread dies as well.
Management Threads
If cleanup or periodic maintenance needs to be performed, these tasks can each run in their own thread.
Just keep in mind locking (obviously):
How much data do connections need to share? Make sure all of your resources are correctly locked when accessing, and that you do not have any deadlocks or race conditions. That is more of a "general threading" topic however.
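To make the layout concrete, here is a minimal sketch of the listener and connection threads (the port number is an assumption):

using System.Net;
using System.Net.Sockets;
using System.Threading;

class Server
{
    static void Main()
    {
        var listener = new TcpListener(IPAddress.Any, 9000); // assumed port
        listener.Start();

        // Listener thread: accept each connection and give it its own thread.
        while (true)
        {
            TcpClient client = listener.AcceptTcpClient(); // blocks until a client connects
            new Thread(() => Handle(client)) { IsBackground = true }.Start();
        }
    }

    // Connection thread: owns all I/O on its socket; exits when the connection dies.
    static void Handle(TcpClient client)
    {
        using (client)
        using (NetworkStream stream = client.GetStream())
        {
            // read requests, process them (if synchronous), write responses
        }
    }
}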
If you're expecting multiple connections you'll need multiple threads. Each thread will be given streams for a particular client which it will need to handle separately.
I think the Silverlight policy server is a great first time example of a multithreaded server app. Though, it uses the Socket class instead of the TcpListener.
I would accept sockets and use async calls. This allows you to accept multiple connections and avoids creating a thread for every connection.
Basically create a socket with the listener
Socket socket = tcpListener.AcceptSocket();
and call Socket.BeginReceive to start receiving data.
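A sketch of that accept-then-BeginReceive flow for a single connection (the port and buffer size are assumptions):

using System;
using System.Net;
using System.Net.Sockets;

class AsyncReceiver
{
    static readonly byte[] buffer = new byte[8192];

    static void Main()
    {
        var tcpListener = new TcpListener(IPAddress.Any, 9000); // assumed port
        tcpListener.Start();

        Socket socket = tcpListener.AcceptSocket();
        // Kick off the first asynchronous read; OnReceive re-arms itself.
        socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None, OnReceive, socket);

        Console.ReadLine(); // keep the process alive for the sketch
    }

    static void OnReceive(IAsyncResult ar)
    {
        var socket = (Socket)ar.AsyncState;
        int bytesRead = socket.EndReceive(ar);
        if (bytesRead > 0)
        {
            // process buffer[0..bytesRead), then continue receiving
            socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None, OnReceive, socket);
        }
    }
}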
I think it's important to define what is meant by the word 'connection.'
If the other system creates a new connection to your TcpListener each time a transaction is sent, then this would be considered multiple connections, and it would make sense to have a dedicated thread to process these incoming connection requests. If this is the case, ignore everything beyond this point and use gahooa's suggested solution.
On the other hand, if the other system simply establishes a connection and sends all transactions over that same connection, then there's really no point in processing the connection request in a separate thread (since there is only one connection). If this is the case, then I would suggest accepting the incoming request once and reading the socket asynchronously (as opposed to polling the socket for data). Each time you receive a complete transaction, throw it "over the wall" to a transaction processing thread. When the processing is complete and the "answer" is computed, throw it "over the wall" to a response thread that sends the result back to the other system. In this scenario, there are basically four threads:
Main thread
Read thread
Processing thread
Write thread
The Main thread creates the TcpListener and waits until the connection is established. At that point, it simply initiates the asynchronous read and waits until the program ends (ManualResetEvent.WaitOne()).
The Read thread is the asynchronous thread that services the reading from the NetworkStream.
The Processing thread receives transactions from the Read thread and does whatever processing is necessary.
The Write thread takes whatever responses are generated by the Processing thread and writes them to the NetworkStream.
According to this link, you do not have to synchronize the reading and writing from the same NetworkStream as long as the reading and writing are done in separate threads. You can use a generic List or Queue to move data between the threads. I'd create one for the Read-to-Processing data and one for the Processing-to-Write data. Just be sure to synchronize access to them using the SyncRoot property.
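As a small sketch of one of those hand-off queues (using a dedicated lock object, since the generic Queue<T> only exposes SyncRoot through ICollection):

using System.Collections.Generic;
using System.Threading;

// Shared hand-off queue between the Read and Processing threads.
static class ReadToProcessing
{
    static readonly Queue<byte[]> queue = new Queue<byte[]>();
    static readonly object sync = new object(); // stands in for SyncRoot

    // Called by the Read thread whenever a complete transaction arrives.
    public static void Enqueue(byte[] transaction)
    {
        lock (sync)
        {
            queue.Enqueue(transaction);
            Monitor.Pulse(sync); // wake the Processing thread
        }
    }

    // Called in a loop by the Processing thread.
    public static byte[] Dequeue()
    {
        lock (sync)
        {
            while (queue.Count == 0)
                Monitor.Wait(sync); // sleep until the Read thread pulses
            return queue.Dequeue();
        }
    }
}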