I'm writing an ASP.NET Core application that uses the Google Pub/Sub emulator, and I can both publish and subscribe to a topic. However, when I publish a "large" number of messages (1,000+), I would like to pull as many as possible per request.
I use the Google.Cloud.PubSub.V1 library, which provides SubscriberServiceApiClient to interact with the API. I pull asynchronously with the PullAsync method, which takes a maxMessages parameter. According to the documentation this sets the maximum number of messages that can be returned per request, though fewer may be returned. Providing a maxMessages value above 100 makes no difference: the maximum number of messages I receive per request is always 100, which seems low. I've also tried pulling through the REST API, which is likewise limited to 100 messages per pull.
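For reference, this is roughly how I'm pulling (a sketch; the project and subscription IDs are placeholders):

```csharp
using System.Linq;
using Google.Cloud.PubSub.V1;

var client = await SubscriberServiceApiClient.CreateAsync();
var subscription = new SubscriptionName("my-project", "my-subscription");

// Asking for up to 1,000 makes no difference: each response contains at most 100 messages.
PullResponse response = await client.PullAsync(subscription, returnImmediately: false, maxMessages: 1000);

foreach (ReceivedMessage received in response.ReceivedMessages)
{
    // ... process received.Message ...
}

await client.AcknowledgeAsync(subscription, response.ReceivedMessages.Select(m => m.AckId));
```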
I'm unsure whether it is due to some limit or if I'm doing something wrong. I have tried searching in their documentation and elsewhere, but without luck.
In general, Google Cloud Pub/Sub cannot return more than 1,000 messages to a single PullAsync call, and the limit may be even smaller when running against the emulator. The value of returnImmediately also affects how many messages are returned: if you want to maximize the number of messages per response, set returnImmediately to false. Even then you won't necessarily get maxMessages in each response; Cloud Pub/Sub tries to balance returning fuller responses against minimizing end-to-end latency, so it won't wait too long to fill a response.
In general, to maximize throughput, you'll need to have multiple PullAsync calls active at once. However, even better is to use SubscriberClient, which handles the underlying requests behind the scenes for you and delivers messages to the function you specify as they arrive.
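For example, a minimal SubscriberClient sketch (the subscription name is a placeholder, and the exact creation API varies slightly by library version):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Google.Cloud.PubSub.V1;

var subscription = new SubscriptionName("my-project", "my-subscription");
SubscriberClient subscriber = await SubscriberClient.CreateAsync(subscription);

// The handler is invoked concurrently as messages arrive; the client maintains
// the underlying streaming pull requests for you.
await subscriber.StartAsync((PubsubMessage message, CancellationToken cancel) =>
{
    Console.WriteLine($"Received {message.MessageId}: {message.Data.ToStringUtf8()}");
    return Task.FromResult(SubscriberClient.Reply.Ack);
});
// Call subscriber.StopAsync(...) from elsewhere to shut the subscriber down.
```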
maxMessages is still capped at 1,000 as of November 2019; Pub/Sub does not allow retrieving more messages per request. As seen in the picture below, I pulled messages in a loop, 1,000 at a time. In about half of the requests the response contained far fewer than the maximum. I managed to pull around 50,000 messages within the 9-minute maximum runtime of a Cloud Function.
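A rough sketch of that loop in C# (project and subscription IDs are placeholders):

```csharp
using System.Linq;
using Google.Cloud.PubSub.V1;

var client = SubscriberServiceApiClient.Create();
var subscription = new SubscriptionName("my-project", "my-subscription");
long total = 0;

while (total < 50_000)
{
    // Ask for 1,000 each time; roughly half of the responses contain far fewer.
    PullResponse response = client.Pull(subscription, returnImmediately: false, maxMessages: 1000);
    if (response.ReceivedMessages.Count == 0)
        break;

    client.Acknowledge(subscription, response.ReceivedMessages.Select(m => m.AckId));
    total += response.ReceivedMessages.Count;
}
```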
An alternative solution is subscribing asynchronously to a Pub/Sub topic with google.cloud.pubsub_v1.SubscriberClient.subscribe(). However, that approach is better suited to a long-running process, which you could describe as a sort of collector sitting on a server.
Related
I have seen some articles, like the one quoted below, suggesting how to calculate the PrefetchCount.
When using the default lock expiration of 60 seconds, a good value for SubscriptionClient.PrefetchCount is 20 times the maximum processing rates of all receivers of the factory. For example, a factory creates 3 receivers, and each receiver can process up to 10 messages per second. The prefetch count should not exceed 20*3*10 = 600.
But I still have no idea about the following:
How do I get the number of receivers created from the factory?
How do I get the number of messages processed by each receiver?
Thanks in advance.
Receivers are created in your code, so you should know how many of them you have.
It doesn't really matter though. All you need to know is how many messages in total you can process within a second (per factory). You may add a custom performance counter for this, or run a test with only one factory and look into Azure Monitor statistics in the portal.
If you are calling ReceiveBatch explicitly, be sure to set PrefetchCount at a value higher than batch size. How much higher - depends on your timings a bit, but the goal is that there's always a prefetched batch available at any call to ReceiveBatch.
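For example, with the classic Microsoft.ServiceBus.Messaging client (connection string, paths, and numbers are placeholders):

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

var client = SubscriptionClient.CreateFromConnectionString(connectionString, "my-topic", "my-subscription");

// Prefetch well above the batch size so a prefetched batch is always available
// when ReceiveBatch is called (e.g. 20 * receivers * msgs-per-second, then tune).
client.PrefetchCount = 600;

var messages = client.ReceiveBatch(100, TimeSpan.FromSeconds(5));
foreach (BrokeredMessage message in messages)
{
    // ... process ...
    message.Complete();
}
```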
Finally, I should point out that this recommendation is approximate and you don't have to follow the formula exactly. Play with PrefetchCount a bit and determine which value gives you the best performance.
Setup
I have a model of a distributed system with a producer (P), a consumer (C) and 1, 2, 3, ... n workers (Wn). All of these components communicate via the Microsoft Azure Service Bus (B). Inside the bus there is a topic (T) and a queue (Q).
(P) pushes messages into (T) at varying rates. The (Wn)'s [their number is a consequence of the rate of (P)'s messages] fetch these messages from the topic, alter them according to some pre-defined function, and then forward them to (Q), from which (C) picks them up and handles them according to plan.
Purpose
The purpose of this model is to investigate the scalability of a system like this, with specific regards taken to the Azure Service Bus. The applications themselves are written in C# and they are all executed from within the same system.
Questions
I have two concerns in regard to the functionality of the Azure Service Bus:
Is there a way to tell either the (B) to be more loose in terms of balancing, or perhaps make the (W)'s more 'eager' to participate?
There seems to be a pre-destined order of message distribution, making the load balance uneven (among the (W)'s).
Say, for instance, I have 3 (W)'s, or (W3): if (P) were now to send 1,000 messages to (T), I would expect a somewhat even distribution, with roughly one third of all messages going to each (W). This is however not the case; it seems as if the rest of the (W)'s just sit there waiting for the busy (W) to handle message after message after message. Suddenly, perhaps after 15 to 20 messages, another (W) will receive a message, but the balance remains very uneven.
Consequently, I now have (W)'s just sitting around doing nothing (for varying periods of time).
Is there a way, either in (B)'s settings or (W)'s code, to specifically set the time of the PeekLock()?
I have experimented with Thread.Sleep(timeToSleep) in the (W)'s OnMessage() function. This would fit my needs, were it not for the concern aired in the first question.
My experimentation: whenever a message arrives at a (W), the work begins, and just before message.Complete() is sent to (B), I pull off a Thread.Sleep(2000) or something along those lines. Ideally, another (W) should pick up where this first (W) fell asleep, but they don't. The first (W) wakes up and grabs another message, and so the cycle continues, sometimes 15-20 times, until another (W) finally grabs a message.
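In code, the experiment looks roughly like this (a sketch; Transform() is a placeholder for the actual work done on the message):

```csharp
using System.Threading;
using Microsoft.ServiceBus.Messaging;

client.OnMessage(message =>
{
    // Do the real work on the message first.
    Transform(message);

    // Then hold this worker before completing, expecting another (W)
    // to pick up the next message in the meantime - which it doesn't.
    Thread.Sleep(2000);
    message.Complete();
});
```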
Images
If you'll excuse my poor effort at explaining through drawings, this is the current scenario (figure 1) and the ideal, wanted scenario (figure 2):
Figure 1: current scenario
Figure 2: optimal/wanted scenario
I hope for some clarification on this matter. Thank you in advance!
Messages are distributed across consumers in roughly the order in which the requests for messages reach Service Bus. There is no assurance of exactly even distribution at the message level, and the distribution will be affected by feature usage, including prefetch. In any actual workload situation, you'll find that distribution is fair, because busy workers will not ask for more messages.
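If you want distribution to follow worker availability more closely, the main knobs on the classic Microsoft.ServiceBus.Messaging client are prefetch and per-worker concurrency. A hedged sketch (values are illustrative):

```csharp
using Microsoft.ServiceBus.Messaging;

var client = SubscriptionClient.CreateFromConnectionString(connectionString, "my-topic", "worker-subscription");

// With prefetch disabled, a worker only asks Service Bus for a message when it is
// actually free, so a busy worker doesn't hold messages an idle worker could take.
client.PrefetchCount = 0;

var options = new OnMessageOptions
{
    MaxConcurrentCalls = 1,
    AutoComplete = false
};

client.OnMessage(message =>
{
    // ... process ...
    message.Complete();
}, options);
```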
I have a pretty simple queue which happens to have heaps of messages on it (by design). Heaps == .. say ... thousands.
Right now I'm playing around with using Azure Web Jobs with a Queue Trigger to process the messages. Works fine.
I'm worried about performance though. Let's assume my method that processes a message takes 1 second. With so many messages, this all adds up.
I know I can manually POP a number of messages at the same time, then process them in parallel .. but I'm not sure how we do this with WebJobs?
I'm assuming the solution is to scale out?? Which means I would create 25 instances of the WebJob? Or is there a better way where I can trigger on a message but pop 25 or so messages at once, then process them in parallel myself?
NOTE: Most of the delay is I/O (ie. a REST call to a 3rd party). not CPU.
I'm thinking -> create 25 tasks and await Task.WhenAll(tasks); to process all the data that I get back.
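i.e. something like this sketch, where queue is a CloudQueue and ProcessAsync is my own I/O-bound handler:

```csharp
// Pop a batch of messages, then process them in parallel (the work is I/O-bound).
var messages = await queue.GetMessagesAsync(25);   // CloudQueue allows up to 32 per get
var tasks = messages.Select(m => ProcessAsync(m));
await Task.WhenAll(tasks);
```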
So - what are my options, please?
NOTE #2: If the solution is scale out .. then I also need to make sure that my web job project only has one function in it, right? otherwise all the functions (read: triggers, etc) will also all be scaled out.
Azure WebJobs by default process 16 queue messages in parallel, and this number is configurable via Queues.BatchSize. The WebJobs framework internally runs up to 16 (or the configured BatchSize) invocations of your function in parallel.
Moreover, you can launch multiple instances of the Azure App Service/Website hosting the WebJob, i.e. scale out, up to a maximum of 20 instances. However, scaling out instances (unlike the parallel dequeue above) has a pricing impact, so please check on that.
Thus theoretically you could be processing 24 × 20 = 480 messages in parallel by configuration alone on a standard WebJob function, without any custom code (24 per instance: the default BatchSize of 16 plus the default NewBatchThreshold of 8, since a new batch is fetched while earlier messages are still in flight).
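The parallelism settings live on the JobHost's queue configuration. A minimal sketch with the WebJobs SDK 2.x JobHostConfiguration (values are illustrative):

```csharp
using Microsoft.Azure.WebJobs;

var config = new JobHostConfiguration();

// How many queue messages are fetched and processed in parallel per instance (default 16, max 32).
config.Queues.BatchSize = 32;

// Another batch is fetched when the in-flight count drops below this (default BatchSize / 2),
// so effective concurrency per instance is BatchSize + NewBatchThreshold.
config.Queues.NewBatchThreshold = 16;

var host = new JobHost(config);
host.RunAndBlock();
```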
We have a pretty standard TCP implementation of SocketAsyncEventArgs (no real difference to the numerous examples you can google).
We have a load testing console app (also using SocketAsyncEventArgs) that sends x many messages per second. We use thread spinning to introduce mostly accurate intervals within the 1000ms to send the message (as opposed to sending x messages as fast as possible and then waiting for the rest of the 1000ms to elapse).
The messages we send are approximately 2k in size, to which the server implementation responds (on the same socket) with a pre-allocated HTTP 200 OK response.
We would expect to be able to send hundreds if not thousands of messages per second using SocketAsyncEventArgs. We found that with a simple blocking TcpListener/TcpClient we were able to process ~150 msg/s. However, even at just 50 messages per second over 20 seconds, we lose 27 of the 1,000 messages on average.
This is a TCP implementation, so we of course expected to lose no messages; especially given such a low throughput.
I'm trying to avoid pasting the entire implementation (~250 lines), but code available on request if you believe it helps. My question is, what load should we expect from SAEA? Given that we preallocate separate pools for Accept/Receive/Send args which we have confirmed are never starved, why do we not receive an arg.Complete callback for each message?
NB: No socket errors are witnessed during execution
Responses to comments:
#usr: Like you, we were concerned that our implementation may have serious issues cooked in. To confirm this we took the downloadable zip from this popular Code Project example project. We adapted the load test solution to work with the new example and re-ran our tests. We experienced EXACTLY the same results using someone else's code (which is primarily why we decided to approach the SO community).
We sent 50 msg/sec for 20 seconds, both the code project example and our own code resulted in an average of 973/1000 receive operations. Please note, we took our measurements at the most rudimentary level to reduce risk of incorrect monitoring. That is, we used a static int with Interlocked.Increment on the onReceive method - onComplete is only called for asynchronous operations, onReceive is invoked both by onComplete and when !willRaiseEvent.
All operations performed on a single machine using the loopback address.
Having experienced issues with two completely different implementations, we then doubted our load test implementation. We confirmed via Wireshark that our load test project did indeed send the traffic as expected (fragmentation was present in the pcap log, but Wireshark indicated the packets were reassembled as expected). My networking understanding at low levels is weaker than I'd like, I admit, but given that the amount of fragmentation is nowhere near the number of missing messages, we are for now assuming the two are not related. As I understand it, fragmentation should be handled at a lower layer and completely abstracted at our level of API calls.
#Peter,
Fair point, in a normal networking scenario such level of timing accuracy would be utterly pointless. However, the waiting is very simple to implement and wireshark confirms the timing of our messages to be as accurate as the pcap log's precision allows. Given we are only testing on loopback (the same code has been deployed to Azure cloud services also which is the intended destination for the code once it is production level, but the same if not worse results were found on A0, A1, and A8 instances), we wanted to ensure some level of throttling. The code would easily push 1000 async args in a few ms if there was no throttling, and that is not a level of stress we are aiming for.
I would agree, given it is a TCP implementation, there must be a bug in our code. Are you aware of any bugs in the linked Code Project example? Because it exhibits the same issues as our code.
#usr, as predicted the buffers did contain multiple messages. We now need to work out how it is we're going to marry messages back together (TCP guarantees sequence of delivery, but in using multiple SAEA's we lose that guarantee through threading).
The best solution is to abandon custom TCP protocols. Use HTTP and protocol buffers for example. Or, web services. For all of this there are fast and easy to use asynchronous libraries available. Assuming this is not what you want:
Define a message framing format. For example, prepend BitConverter.GetBytes(messageLengthInBytes) to each message. That way you can deconstruct the stream.
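A minimal sketch of that framing (buffer handling kept deliberately simple and unoptimized):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sender side: prefix each message with its length in bytes.
static byte[] Frame(byte[] payload)
{
    byte[] framed = new byte[4 + payload.Length];
    BitConverter.GetBytes(payload.Length).CopyTo(framed, 0);
    Array.Copy(payload, 0, framed, 4, payload.Length);
    return framed;
}

// Receiver side: feed every received chunk in (e.g. e.Buffer, e.Offset, e.BytesTransferred)
// and enumerate the whole messages that come out, regardless of how TCP split or
// coalesced them on the wire. One deframer per connection.
class MessageDeframer
{
    private readonly List<byte> _pending = new List<byte>();

    public IEnumerable<byte[]> Feed(byte[] data, int offset, int count)
    {
        _pending.AddRange(data.Skip(offset).Take(count));

        while (_pending.Count >= 4)
        {
            int length = BitConverter.ToInt32(_pending.Take(4).ToArray(), 0);
            if (_pending.Count < 4 + length)
                yield break; // wait for the rest of this message

            byte[] message = _pending.Skip(4).Take(length).ToArray();
            _pending.RemoveRange(0, 4 + length);
            yield return message;
        }
    }
}
```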
I am working on two apps that use an MSMQ as a message bus mechanism so that A transfers messages to B. This clearly has to be robust so initially we chose MSMQ to store and transfer the messages.
When testing the app we noticed that in real-world conditions, where MSMQ is called on to handle approximately 50,000 messages a minute (which sounds quite low to me), we quickly reach the maximum storage size of the MSMQ storage directory (which defaults to about 1.2 GB, I think).
We can increase that but I was wondering whether there is a better approach to handle slow receivers and fast senders. Is there a better queue or a better approach to use in this case?
Actually it isn't so much a problem of slow receivers, since MSMQ will keep the (received) messages in the storage directory for something like 6 hours or until the service is restarted. So essentially, if we reach the 1 GB threshold within 5 minutes, then in a few hours we will reach terabytes of data!
Please read this blog to understand how MSMQ uses resources which I put together after years of supporting MSMQ at Microsoft.
It really does cover all the areas you need to know about.
If you have heard something about MSMQ that isn't in the blog then it is almost certainly wrong - such as the 1.2 GB storage limit for MSMQ. The maximum size of the msmq\storage directory is the hard disk capacity - it's an NTFS folder!
You should be able to have a queue with millions of messages in it (assuming you have enough kernel memory, as mentioned in the blog)
Cheers
John Breakwell
You should apply an SLA to your subscribers: they have to read their messages within X amount of time or they lose them. You can scale this SLA to match the volume of messages that arrive.
For subscribers that cannot meet their SLA then, simply put, they don't really care about receiving their messages that quickly (if they did, they would be available). For these subscribers you can offer a slower channel, such as an XML dump of the messages in the last hour (or whatever granularity is required). You probably wouldn't store each individual message here, but just an aggregate of changes (e.g., something that can be queried from a DB).
Use separate queues for each message type; this way you can apply different priorities depending on the importance of the message, and if one queue becomes full, messages of other types won't be blocked. It also makes it simpler to monitor whether each message is being processed within its SLA, by looking at the first message in the queue and seeing when it was added to determine how long it has been waiting (see NServiceBus).
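For example, a peek at the head of a queue tells you how long the oldest message has been waiting (a sketch using System.Messaging; the queue path and threshold are illustrative):

```csharp
using System;
using System.Messaging;

var queue = new MessageQueue(@".\private$\orders");
queue.MessageReadPropertyFilter.ArrivedTime = true;

try
{
    // Peek (don't remove) the first message and see how long it has been waiting.
    Message first = queue.Peek(TimeSpan.FromSeconds(1));
    TimeSpan waiting = DateTime.Now - first.ArrivedTime;

    if (waiting > TimeSpan.FromMinutes(5))
    {
        // Oldest message has breached this queue's SLA; alert or divert to the slow channel.
    }
}
catch (MessageQueueException)
{
    // Peek timed out - the queue is empty, nothing is waiting.
}
```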
From your metrics above of 1 GB in 5 minutes at 50,000 messages/minute, I calculate each message to be about 4 KB. This is quite large for a message, since messages should normally only carry top-level details about something happening, mostly IDs of what was changed and the intent of the change. Large data is better served by some other out-of-band channel for transferring blobs (e.g., file share, sftp, etc.).
Also, since a service should encapsulate its own data, you shouldn't need to share much data between services. So large data within a service using messages to say what happened isn't unusual, large data between separate services using messages indicates that some boundaries are probably leaking.