Multithreaded Batch Queue - C#

I am currently writing a TCP listener that many client applications send short messages to. The listener is a C# WinForms application, and I need to process these logs in batches to avoid hitting the database for every message I receive. Currently, for every message the listener receives, I Enqueue it with the C# Queue class.
A separate thread executes every 5 minutes to check this Queue and start processing it if there are any queued items. There seems to be a concurrency/race condition issue with this design: when the 5-minute thread kicks off, newly received messages can no longer access the Queue, since I hold a lock on it during Dequeue, and those new messages get lost. It only seems to happen when large amounts of messages are being sent to the TCP listener.
Does anyone think this is a flawed design on my part, or is there a much better solution for this? I am not allowed to use MSMQ or WCF due to restrictions from the client applications that are sending the messages.

So you have a producer-consumer scenario with multiple producers and one (buffered) consumer. You may want to take a look at Reactive Extensions (there is a version for .NET 3.5). At the very least, you could leverage their backport of BlockingCollection<T>.
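As a rough illustration, here is a minimal producer-consumer sketch built on BlockingCollection<T> (System.Collections.Concurrent in .NET 4, or the Rx backport on 3.5). The message type and ProcessBatch method are placeholders, not from the original post; the point is that producers are never blocked for the whole duration of a batch drain.

using System;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Threading;

class BatchProcessor
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();

    // Called from the TCP listener threads; thread-safe, no explicit lock needed.
    public void Enqueue(string message)
    {
        _queue.Add(message);
    }

    // Runs on a dedicated consumer thread.
    public void ConsumeLoop()
    {
        while (true)
        {
            Thread.Sleep(TimeSpan.FromMinutes(5));

            // Drain whatever is queued right now; producers keep adding concurrently.
            var batch = new List<string>();
            string message;
            while (_queue.TryTake(out message))
                batch.Add(message);

            if (batch.Count > 0)
                ProcessBatch(batch); // placeholder: one database round-trip per batch
        }
    }

    private void ProcessBatch(List<string> batch)
    {
        // bulk insert into the database here
    }
}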

Related

Lock a Service-Bus Queue and prevent others from accessing it

I have multiple queues that multiple clients insert messages into.
On the server side, I have multiple micro-services that access the queues and handle those messages. I want to lock a queue whenever a service is working on it, so that other services won't be able to work on that queue.
Meaning that if service A is processing a message from queue X, no other service can process a message from that queue until service A has finished processing the message. Other services can process messages from any queue other than X.
Does anyone have an idea how to lock a queue and prevent others from accessing it? Preferably, the other services would receive an exception or something so that they'll try again on a different queue.
UPDATE
Another way could be to assign the queues to the services, so that whenever a service is working on a queue, no other service is assigned to that queue until the work item has been processed. This also isn't easy to achieve.
There are several built-in ways of doing this. If you only have a single worker, you can set MessageOptions.MaxConcurrentCalls = 1.
If you have multiple, you can use the Singleton attribute. This gives you the option of setting it in Listener mode or Function mode. The former gives the behavior you're asking for, a serially-processed FIFO queue. The latter lets you lock more granularly, so you can specifically lock around critical sections, ensuring consistency while allowing greater throughput, but doesn't necessarily preserve order.
My guess is that they've implemented the Singleton attribute similarly to your Redis approach, so performance should be equivalent. I've done no testing with that, though.
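For reference, a hedged sketch of what Listener mode looks like on an Azure WebJobs/Functions Service Bus trigger; the queue name and function body are made up for illustration:

using Microsoft.Azure.WebJobs;

public static class QueueXProcessor
{
    // Listener mode: only one listener across all hosts receives from the queue,
    // which gives the serially-processed FIFO behavior described above.
    [Singleton(Mode = SingletonMode.Listener)]
    public static void ProcessMessage(
        [ServiceBusTrigger("queue-x")] string message) // "queue-x" is a placeholder
    {
        // handle the message; no other instance is working on queue-x right now
    }
}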
You can achieve this by using Azure Service Bus message sessions.
All messages in your queue must be tagged with the same SessionId. In that case, when a client receives a message, it locks not only that message but all messages with the same SessionId (effectively the whole queue).
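A minimal sketch with the Microsoft.Azure.ServiceBus client, assuming a sessions-enabled queue; the connection string, queue name, and session id are placeholders:

using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

class SessionExample
{
    static async Task SendAsync()
    {
        var client = new QueueClient("<connection-string>", "queue-x");
        var message = new Message(Encoding.UTF8.GetBytes("work item"))
        {
            SessionId = "queue-x" // same id on every message = one exclusive stream
        };
        await client.SendAsync(message);
    }

    static void Receive()
    {
        var client = new QueueClient("<connection-string>", "queue-x");
        // The handler holds an exclusive lock on the whole session while
        // processing, so no other service can take messages from it.
        client.RegisterSessionHandler(
            async (session, message, token) =>
            {
                // process the message...
                await session.CompleteAsync(message.SystemProperties.LockToken);
            },
            new SessionHandlerOptions(args => Task.CompletedTask) { AutoComplete = false });
    }
}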
The solution was to use Azure's redis to store the locks in-memory and have micro-services that manage those locks using the redis store.
The lock() and unlock() operations are atomic, and the lock has a TTL so that a queue won't stay locked indefinitely.
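A hedged sketch of that approach using StackExchange.Redis, whose LockTake/LockRelease helpers are atomic and take an expiry; the key prefix and TTL are illustrative:

using System;
using StackExchange.Redis;

class QueueLocks
{
    private readonly IDatabase _db;

    public QueueLocks(IDatabase db)
    {
        _db = db;
    }

    // Returns true if this service instance acquired the lock for the queue.
    // The owner token identifies the holder, so only it can release the lock.
    public bool TryLock(string queueName, string ownerToken)
    {
        // The TTL guards against a crashed owner locking the queue forever.
        return _db.LockTake("lock:" + queueName, ownerToken, TimeSpan.FromSeconds(30));
    }

    public void Unlock(string queueName, string ownerToken)
    {
        _db.LockRelease("lock:" + queueName, ownerToken);
    }
}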
Azure Service Bus is a broker with competing consumers. You can't get what you're asking for with a general queue that all instances of your service are using.
Put the work items into a relational database. You can still use queues to push work to workers, but the queue items can now be empty: when a worker receives an item, it knows to look in the database instead. The content of the message is disregarded.
That way, messages are independent and idempotent. For queueing to work, these two properties usually must hold.
That way you can more easily sequence actions that actually are sequential. You can use transactions as well.
Maybe you don't need queues at all. Maybe it is enough to have a fixed number of workers polling the database for work. This loses the auto-scaling you get with queues, though.
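One way to claim work atomically in SQL Server, sketched against a hypothetical WorkItems table; the READPAST hint lets concurrent workers skip rows another worker has already locked rather than block on them:

using System.Data.SqlClient;

class WorkItemClaimer
{
    // Atomically claims one pending work item and returns its payload, or null.
    public static string TryClaim(string connectionString)
    {
        const string claimSql = @"
            UPDATE TOP (1) dbo.WorkItems WITH (ROWLOCK, READPAST)
            SET Status = 'Processing'
            OUTPUT inserted.Payload
            WHERE Status = 'Pending';";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(claimSql, connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                // One row means we claimed an item; mark it Done in a follow-up UPDATE.
                return reader.Read() ? reader.GetString(0) : null;
            }
        }
    }
}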

.Net Socket (UDP) Sending, Receiving and Scheduling

I am currently on a personal project for learning purposes. I want to make a connection over UDP, for applications such as games. Each datagram sent has a specific header that indicates which "logic" channel it belongs to - for example, channel 0 is just like UDP with the extra header overhead, and channel 1 uses more headers to bring some extra reliability. The channels' objective is to "automatically" separate messages into logical groups, up to a specific amount.
In my current code, there is a simple loop in a separate thread that handles sending and receiving:
// Simplified from the real code: drain all pending datagrams, then flush the send queue.
public void Tick()
{
    // Poll returns true when at least one datagram is ready to read.
    if (_socket.Poll(0, SelectMode.SelectRead))
    {
        do
        {
            ReadMessage();
        } while (_socket.Available > 0);
    }
    SendQueuedOutgoingMessages();
}
Though this works in an ideal world, I have the feeling that this logic fails when there are too many incoming or outgoing messages. Is it possible to use the same socket to simultaneously send and receive messages (i.e., send and receive asynchronously or on different threads)? Even if it is possible, would it be better to simply use two or more UDP sockets (or mix TCP and UDP sockets if I need reliability), especially with maintainability in mind?
The most direct alternative I can think of would be to use a scheduling algorithm to control how many messages to read and send, based on queue sizes or other factors, but this feels poor and inflexible for this situation.
Edit: Adding more information about the code.
The Tick() method is set to be called a specific number of times per second if it returns immediately - for example, 30 times per second if there are no new incoming or outgoing messages, and less often if it needs some time to send or receive data. I've used the blocking ReceiveFrom and SendTo methods to avoid busy waiting or calls such as Sleep(0).
Though I handle incoming messages immediately, I use an outgoing message queue to support the channels idea - each channel has its own priority, down to 'no priority', affecting its share of the bandwidth in both quiet and busy moments.
Whether to use 2 sockets for receiving and sending separately, or just a single one, depends on the situation. If you are going to send a lot of messages, then even on a dedicated thread the socket might block if the outgoing queue used by the socket becomes full.
There are several solutions to this problem. Using 2 sockets and 2 distinct threads is one of them, using select in combination with asynchronous sockets is another. The point is that you don't want to stop receiving just because a send might block.
Each of these possible solutions has its own complexity.
The select API is meant to check whether there is something to receive on a certain socket, but you can also use it to detect when a socket becomes writable again. You need a socket option to put the socket into a non-blocking state, and you need to check for EWOULDBLOCK return codes (SocketError.WouldBlock in C#) on each send. If you get one, the send has failed and you have to queue the message yourself.
You don't really send and receive at the same time; it happens sequentially. You use select to check whether a socket is writable and readable by using 2 bitmasks manipulated with the fd_set API. You can even use select on multiple sockets at once. Then, when select (which is a blocking call) returns, you can check each individual bit to see which actions need to be performed.
The reason a send can block, if the socket is not in a non-blocking state, is that the socket's output queue can be full. A blocking socket would simply wait for the queue to have room again, but during that wait you cannot receive anything on that same socket. This is the very reason why you need non-blocking sockets and the select API, in combination with a queueing mechanism of your own.
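A rough C# sketch of that pattern using the static Socket.Select method and a non-blocking socket; the port number, buffer size, and application-level outgoing queue are assumptions for illustration:

using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

var socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
socket.Bind(new IPEndPoint(IPAddress.Any, 9000)); // placeholder port
socket.Blocking = false;

var outgoing = new Queue<KeyValuePair<byte[], EndPoint>>(); // app-level send queue
var buffer = new byte[64 * 1024];

while (true)
{
    var readList = new List<Socket> { socket };
    // Only ask about writability when there is actually something to send.
    var writeList = outgoing.Count > 0 ? new List<Socket> { socket } : null;

    // Blocks up to 100 ms; on return the lists contain only the ready sockets.
    Socket.Select(readList, writeList, null, 100000);

    if (readList.Contains(socket))
    {
        EndPoint sender = new IPEndPoint(IPAddress.Any, 0);
        int received = socket.ReceiveFrom(buffer, ref sender);
        // handle the datagram...
    }

    if (writeList != null && writeList.Contains(socket) && outgoing.Count > 0)
    {
        var next = outgoing.Peek();
        try
        {
            socket.SendTo(next.Key, next.Value);
            outgoing.Dequeue(); // only drop the message once the send succeeded
        }
        catch (SocketException ex) when (ex.SocketErrorCode == SocketError.WouldBlock)
        {
            // The output queue filled up again; keep the message and retry next loop.
        }
    }
}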
Why not simply use the standard read loop setup?
while (true)
    ReadMessage();
There is no scheduling or throttling necessary, and it is not necessary to know whether a packet is ready or not.
You can read and write simultaneously on the same UDP socket.
There is no need for an outgoing queue, either. Just send.
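For illustration, a minimal sketch of that setup with a blocking UDP socket: a dedicated thread loops on ReceiveFrom, while sends go out directly from whatever thread produces them, which the answer notes is safe on the same socket. The port number is a placeholder:

using System.Net;
using System.Net.Sockets;
using System.Threading;

class UdpPeer
{
    private readonly Socket _socket =
        new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);

    public void Start()
    {
        _socket.Bind(new IPEndPoint(IPAddress.Any, 9000)); // placeholder port

        // Dedicated receive thread: blocking ReceiveFrom, no polling or scheduling.
        new Thread(ReceiveLoop) { IsBackground = true }.Start();
    }

    private void ReceiveLoop()
    {
        var buffer = new byte[64 * 1024];
        while (true)
        {
            EndPoint sender = new IPEndPoint(IPAddress.Any, 0);
            int received = _socket.ReceiveFrom(buffer, ref sender);
            // handle the datagram...
        }
    }

    // Safe to call from any thread while the receive loop runs; no queue needed.
    public void Send(byte[] payload, EndPoint target)
    {
        _socket.SendTo(payload, target);
    }
}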

Request-response pattern with worker queues in RabbitMQ

I'm using RabbitMQ for the following scenario: when a user uses a premium search feature, I send a message via RabbitMQ to one of a few server instances. They run the same routine (DB queries and billing). I want to make sure I don't process the same message more than once.
I've come across this great tutorial, but the exchange type presented in it is "Topic", which does not work for me, because with it the same message ends up being processed more than once.
How can I implement the request-response pattern with worker queues in RabbitMQ so that each message is handled only once and there's load balancing?
Anton Gogolev's comment above is correct: you cannot guarantee a message will be processed only once, for many reasons. But this is often a requirement of systems - to produce the desired result only once.
The way to do that is through idempotence - the idea that no matter how many times a given message is processed, it will only make the desired change once.
There are a lot of ways to do this. One simple example is to use a shared database that tracks which messages have been processed. When you receive a message, you check to see if it has been processed already. If not, you process it. If it has, you just ignore it and move on.
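A minimal sketch of that check, assuming each message carries a unique message id and a hypothetical ProcessedMessages table with a unique key on that id:

using System.Data.SqlClient;

class Deduplicator
{
    // Returns true if this message id was recorded for the first time.
    // The unique constraint makes the insert race-safe: a duplicate
    // insert fails, so the duplicate message is simply skipped.
    public static bool TryMarkProcessed(SqlConnection connection, string messageId)
    {
        const string sql =
            "INSERT INTO dbo.ProcessedMessages (MessageId) VALUES (@id);"; // hypothetical table

        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@id", messageId);
            try
            {
                command.ExecuteNonQuery();
                return true; // first time we've seen it: process the message
            }
            catch (SqlException ex) when (ex.Number == 2627) // unique key violation
            {
                return false; // already processed: ignore it and move on
            }
        }
    }
}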
In your case, if you are doing request/response and want load balancing, you probably want multiple consumers on the same queue. You could have 2 or 10 or 300 instances of your request handler listening to the same queue, and you won't have to worry much about duplicate processing.
RabbitMQ will send a given message to a single consumer. It will wait for that consumer to say it is done processing, or if the consumer crashes or rejects the message, it will requeue the message for another consumer to try again.
In this way, you will generally have only 1 request handler per request. But it will always be possible for more than one to handle the same message, which is why idempotence is important.
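Sketched with the RabbitMQ .NET client, a competing consumer with manual acknowledgment looks roughly like this; the queue name and processing step are placeholders:

using System;
using System.Text;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

channel.QueueDeclare("premium-search", durable: true, exclusive: false, autoDelete: false);

// Deliver at most one unacked message to this consumer at a time,
// so work is load-balanced across all consumers on the queue.
channel.BasicQos(prefetchSize: 0, prefetchCount: 1, global: false);

var consumer = new EventingBasicConsumer(channel);
consumer.Received += (sender, ea) =>
{
    var body = Encoding.UTF8.GetString(ea.Body.ToArray());
    // run the DB queries and billing for this request...

    // Ack on success; if this process crashes before acking, the broker
    // requeues the message for another consumer to try again.
    channel.BasicAck(ea.DeliveryTag, multiple: false);
};
channel.BasicConsume("premium-search", autoAck: false, consumer);

Console.ReadLine(); // keep the consumer alive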
Regarding the use of a topic exchange vs any other type of exchange - it doesn't make much difference. There will always be the possibility of more than one queue receiving the message that you are sending, because you can have multiple queues bound to the same exchange with the same binding keys.

How do I send lots of messages over NServiceBus without locking the Queue?

So I was doing some performance evaluations of NServiceBus, and I realized that it behaves very oddly if you try to send, say, 1000 messages all at the same time... It actually sends them all asynchronously (which is fine), but it locks the queue from the handler. The result is that the handler cannot process any messages until the sender has completed sending all of them.
The behavior shows up in two slightly different ways.
Inside a handler, if you do a lot of sending, it looks like the receiving queue is locked until the handler completes (so even if you add a thread sleep between each send, the receiver won't start handling messages until the handler completes).
If I just send the messages from a newed-up Bus, then a small sleep breaks the relationship; but if I send, say, 1000 messages all at "once", the handler won't get the first one until after the last one is written, even though each one (at that point) should be a separate call.
Is there an undocumented strategy here for batch sending, or is something else going on? I understand you wouldn't "want" to do this normally, but understanding what happens during a Send from a handler, or a batch send from a normal Bus, is pretty important to know ;-).
NServiceBus message handlers, by default, run wrapped in a TransactionScope. The processing of a message, any updates you do to your business data and any send of new messages will either complete or roll back together. This is what transactional messaging is all about.
If you send 1000 messages in a message handler, then it will not complete until the underlying messaging infrastructure has received all of them successfully. This can take some time, depending on your hardware.
If you want to opt out of this safe-by-default approach, there are several things you can do. You can disable transactional handling for your NServiceBus endpoint, or you can just suppress the ambient transaction scope when sending the messages. Note, however, that you then no longer have any transactional guarantees: if you get an exception after sending 500 of those 1000 messages, those 500 will be sent while the other 500 will not.
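A hedged sketch of the second option, suppressing the ambient transaction around the sends; the bus variable and message type stand in for your own:

using System.Transactions;

// Inside a message handler: sends made in the suppressed scope go out
// immediately instead of waiting for the handler's transaction to commit.
// Caution: these sends will not roll back if the handler later fails.
using (new TransactionScope(TransactionScopeOption.Suppress))
{
    for (int i = 0; i < 1000; i++)
    {
        bus.Send(new WorkMessage { Index = i }); // hypothetical message type
    }
}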
One of my team's strategies for this is to break a large batch down into smaller batches, and then have a handler that receives those smaller batches and pushes out an individual event for each item.
Scenario: We have an endpoint that reads a database log file and pushes out a "TransactionOccurred" event for each line of the log file. We then read the log file again after a 10 second timeout and push out another batch of messages.
So, instead of pushing out 5K messages in one handler, we broke it down into 5 messages of 1K apiece and sent each of those as a command. Then we had a handler that received a 1K batch message, looped through it, and published an individual event for each item (sketched below).
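A rough sketch of that splitting handler, assuming hypothetical TransactionBatch/TransactionOccurred message types and current NServiceBus handler style:

using System.Threading.Tasks;
using NServiceBus;

// Hypothetical messages: a batch command of log lines and a per-line event.
public class TransactionBatch : ICommand
{
    public string[] Lines { get; set; }
}

public class TransactionOccurred : IEvent
{
    public string Line { get; set; }
}

// Receives one small batch and fans it out as individual events.
public class TransactionBatchHandler : IHandleMessages<TransactionBatch>
{
    public async Task Handle(TransactionBatch message, IMessageHandlerContext context)
    {
        foreach (var line in message.Lines)
        {
            await context.Publish(new TransactionOccurred { Line = line });
        }
    }
}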
The issue arose around doing a "publish" of 5K messages, because several events were being published and each one had a different set of subscribers, with queues on the same server and on remote servers, which slowed the system down.
With this strategy we were also able to turn MaximumConcurrencyLevel up a little to process multiple messages at a time, and we got higher throughput.
We have done this on a handful of endpoints, and each one is a little different regarding the batch size and the MaximumConcurrencyLevel value. I'd recommend taking a control set of 50-100K messages and varying these values a little to find what is optimal for your situation.

Message broker consumer/producer with reassignment when a client goes down?

I am looking for a message broker API to use with C#.
Normally, things are quite simple: I have a server that knows what jobs are to be done, and I have some clients that need to get these jobs.
And here are the special requirements I have:
If a client gets a job but fails to answer within a specific time, another client should do the work.
More than one queue, and priorities
If possible, it needs to work with big message queues (this way I could just load all the jobs once a month and forget about them).
Secured communication would be good.
An API for talking to the broker from C#. How much of the work is already done? What is still left to do?
The ability to delete some jobs...
If available, replication to another broker would be good.
The broker needs to run on Windows.
What is not an issue:
low latency (it is no problem if a message takes minutes)
Do you know of such a message broker that is free to use?
RabbitMQ and several other AMQP implementations satisfy most of (if not all of) these requirements.
RabbitMQ allows clients to acknowledge receipt and/or processing of messages. As per http://www.rabbitmq.com/tutorials/amqp-concepts.html#message-acknowledge:
"If a consumer dies without sending an acknowledgement the AMQP broker will redeliver it to another consumer or, if none are available at the time, the broker will wait until at least one consumer is registered for the same queue before attempting redelivery."
Many queues (and in fact many brokers) are supported, in a variety of different configurations
It scales particularly well, even for very large message queues: http://www.rabbitmq.com/faq.html#performance
Encryption is supported: http://www.rabbitmq.com/faq.html#channel-encryption
There is a .NET Client Users Guide and API docs: http://www.rabbitmq.com/documentation.html
There is live failover if a broker dies: http://www.rabbitmq.com/clustering.html
It runs on Windows, Linux, and probably anything else that has an Erlang implementation
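On the priorities requirement: RabbitMQ supports per-queue message priorities (in version 3.5 and later). A hedged sketch with the .NET client; the queue name, priority bound, and payload are placeholders:

using System.Collections.Generic;
using System.Text;
using RabbitMQ.Client;

var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

// Declare a priority queue: x-max-priority caps the priority levels (here 0-10).
channel.QueueDeclare(
    queue: "jobs", // placeholder name
    durable: true,
    exclusive: false,
    autoDelete: false,
    arguments: new Dictionary<string, object> { { "x-max-priority", 10 } });

// Publish a high-priority job; higher numbers are delivered first.
var props = channel.CreateBasicProperties();
props.Priority = 9;
props.Persistent = true; // survive a broker restart
channel.BasicPublish(
    exchange: "",
    routingKey: "jobs",
    basicProperties: props,
    body: Encoding.UTF8.GetBytes("urgent job"));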
