I'm creating an Azure worker role that will subscribe to some events published by a service that I don't have any control over. Both the publisher and subscriber will be hosted in Azure.
The role will be part of a cloud service that is already using NServiceBus to publish and subscribe to events that it owns. To be consistent, I'm looking to use the same libraries in the new role.
The publisher of the event is not using NServiceBus. They are using the WindowsAzure.ServiceBus package to create BrokeredMessage objects and JSON.Net to serialize the message payloads. They are also using a 'topic per event type' pattern.
There are 3 topics I need my role to subscribe to, ItemActivated, ItemDeactived and ItemExpired. Each topic has a single event type published to it, and the event definition is available to me through a shared nuget package. The events are ItemActivatedEventV1, ItemDeactivedEventV1 etc...
Using the event definitions I can write message handlers, but I'm not convinced NServiceBus is the correct choice in the subscriber. Since the publisher isn't using NServiceBus, none of the message headers will be present - will this impact the behaviour of the subscriber?
The 'topic per event' pattern is also different from the NServiceBus 'endpoint' approach, where multiple events are sent to a single endpoint. Is it possible to configure NServiceBus to listen to multiple topics, e.g. using MessageHandlerMappings in app.config?
Finally, the subscriber can be configured using JsonSerializer, but this isn't going to be guaranteed to be exactly the same at both ends of the communication. Is this going to impact behaviour?
Has anyone had any experience with this scenario? Any advice appreciated
The situation is as follows. There are three services: one is event sourced and publishes integration or notification events (outbox pattern) to the other two services (subscribers) using an event bus (like Azure Service Bus or ActiveMQ).
This design is inspired by .NET microservices - Architecture e-book - Subscribing to events.
I'm wondering what should happen if one of these events cannot be delivered due to an error, or if event handling simply wasn't implemented correctly.
Should I trust my message bus in case of an application error?
Is this a use case for dead letter queues?
On republishing events, should all messages be republished to all topics or would it be possible to only republish a subset?
Should the service republishing events be able to access publisher and subscriber databases to know the message offset?
Or should the subscribing microservices be able to read the outbox?
Should I trust my message bus in case of an application error?
Yes.
(Edit: After reading this answer, read @StuartLC's answer for more info)
The system you described is an eventually consistent one. It works under the assumption that if each component does its job, all components will eventually converge on a consistent state.
The Outbox's job is to ensure that any event persisted by the Event Source Microservice is durably and reliably delivered to the message bus (via the Event Publisher). Once that happens, the Event Source and the Event Publisher are done--they can assume that the event will eventually be delivered to all subscribers. It is then the message bus's job to ensure that that happens.
The message bus and its subscriptions can be configured for either "at least once" or "at most once" delivery. (Note that "exactly once" delivery is generally not guaranteeable, so an application should be resilient against either duplicate or missed messages, depending on the subscription type).
An "at least once" (called "Peek Lock" by Azure Service Bus) subscription will hold on to the message until the subscriber gives confirmation that it was handled. If the subscriber gives confirmation, the message bus's job is done. If the subscriber responds with an error code or doesn't respond in a timely manner, the message bus may retry delivery. If delivery fails multiple times, the message may be sent to a poison message or dead-letter queue. Either way, the message bus holds on to the message until it gets confirmation that it was received.
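To make the peek-lock flow concrete, here is a minimal in-memory sketch. It is a toy model under my own assumptions, not the Azure Service Bus API: `PeekLockSubscription`, `max_delivery_count` and the handler are hypothetical names, and a handler that returns normally stands in for the subscriber's confirmation.

```python
from collections import deque


class PeekLockSubscription:
    """Toy in-memory subscription with at-least-once (peek-lock) semantics."""

    def __init__(self, max_delivery_count=3):
        self.max_delivery_count = max_delivery_count
        self.pending = deque()   # entries are (message, delivery_count)
        self.dead_letter = []    # messages that exhausted their retries

    def send(self, message):
        self.pending.append((message, 0))

    def deliver_all(self, handler):
        """Deliver until every message is either confirmed or dead-lettered."""
        while self.pending:
            message, count = self.pending.popleft()
            count += 1
            try:
                handler(message)  # returning normally acts as the confirmation
            except Exception:
                if count >= self.max_delivery_count:
                    self.dead_letter.append(message)
                else:
                    # Not confirmed: keep the message and retry it later.
                    self.pending.append((message, count))


attempts = {}


def flaky_handler(message):
    attempts[message] = attempts.get(message, 0) + 1
    if message == "poison":
        raise ValueError("cannot process")       # fails on every attempt
    if message == "transient" and attempts[message] < 2:
        raise TimeoutError("try again")          # fails once, then succeeds


sub = PeekLockSubscription(max_delivery_count=3)
for m in ("ok", "transient", "poison"):
    sub.send(m)
sub.deliver_all(flaky_handler)
```

The transient failure is retried and eventually confirmed, while the message that fails on every attempt ends up in `dead_letter` instead of being lost.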
On republishing events, should all messages be republished to all topics or would it be possible to only republish a subset?
I can't speak for all messaging systems, but I would expect a message bus to only republish to the subset of subscriptions that failed. Regardless, all subscribers should be prepared to handle duplicate and out-of-order messages.
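As a rough illustration of what "prepared to handle duplicate and out-of-order messages" can look like, here is a small sketch. The event fields (`id`, `user_id`, `version`, `email`) and the `EmailProjection` class are assumptions made for the example; your events may carry different identifiers.

```python
class EmailProjection:
    """Subscriber state that tolerates duplicate and out-of-order events."""

    def __init__(self):
        self.seen_ids = set()  # ids of events already processed
        self.emails = {}       # user_id -> (version, email)

    def handle(self, event):
        # Duplicate delivery: the same event id arrived before.
        if event["id"] in self.seen_ids:
            return "duplicate"
        self.seen_ids.add(event["id"])

        # Out-of-order delivery: a newer version was already applied.
        current_version, _ = self.emails.get(event["user_id"], (0, None))
        if event["version"] <= current_version:
            return "stale"

        self.emails[event["user_id"]] = (event["version"], event["email"])
        return "applied"


projection = EmailProjection()
v2 = {"id": "e2", "user_id": 7, "version": 2, "email": "new@example.com"}
v1 = {"id": "e1", "user_id": 7, "version": 1, "email": "old@example.com"}

# v2 arrives first, then the older v1, then v2 is redelivered.
results = [projection.handle(e) for e in (v2, v1, v2)]
```

Whatever order or number of times the bus delivers these events, the projection converges on the latest email.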
Should the service republishing events be able to access publisher and subscriber databases to know the message offset?
I'm not sure I understand what you mean by "know the message offset", but as a general guideline, microservices should not share databases. A shared database schema is a contract. Once the contract is established, it is difficult to change unless you have total control over all of its consumers (both their code and deployments). It's generally better to share data through application APIs to allow more flexibility.
Or should the subscribing microservices be able to read the outbox?
The point of the message bus is to decouple the message subscribers from the message publisher. Making the subscribers explicitly aware of the publisher defeats that purpose, and will likely be difficult to maintain as the number of publishers and subscribers grows. Instead, rely on a dedicated monitoring service and/or the monitoring capabilities of the message bus to track delivery failures.
Just to add to @xander's excellent answer, I believe that you may be using an inappropriate technology for your event bus. You should find that Azure Event Hubs or Apache Kafka are better candidates for event publish / subscribe architectures. Benefits of a dedicated Event Bus technology over the older Service Bus approaches include:
There is only ever one copy of each event message (whereas Azure Service Bus or RabbitMQ make deep copies of each message for each subscriber)
Messages are not deleted after consumption by any one subscriber. Instead, messages are left on the topic for a defined period of time (which can be indefinite, in Kafka's case).
Each subscriber (consumer group) will be able to track its committed offset. This allows subscribers to reconnect and rewind if they have lost messages, independently of the publisher and of other subscribers (i.e. isolated).
New consumers can subscribe AFTER messages have been published, and will still be able to receive ALL messages available (i.e. rewind to the start of available events)
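The properties listed above can be sketched with a toy append-only log. This is an in-memory illustration only, not the Kafka or Event Hubs client API; `EventLog`, `poll` and `seek` are hypothetical names, and committing the offset automatically inside `poll` is a simplification.

```python
class EventLog:
    """Toy append-only log: one stored copy per event, offsets per consumer group."""

    def __init__(self):
        self.events = []
        self.committed = {}  # consumer group -> next offset to read

    def publish(self, event):
        self.events.append(event)
        return len(self.events) - 1  # offset of the stored event

    def poll(self, group, max_events=10):
        start = self.committed.get(group, 0)
        batch = self.events[start:start + max_events]
        self.committed[group] = start + len(batch)  # simplified auto-commit
        return batch

    def seek(self, group, offset):
        """Rewind (or skip ahead) for one group without affecting the others."""
        self.committed[group] = offset


log = EventLog()
for name in ("created", "renamed", "deleted"):
    log.publish(name)

first_read = log.poll("billing")     # reads all three events
log.seek("billing", 1)               # billing lost something: rewind
replayed = log.poll("billing")       # re-reads from offset 1
late_joiner = log.poll("analytics")  # subscribed after publishing
```

Reading never removes anything from `log.events`, which is why the late-joining group still sees every event and a rewind needs no help from the publisher.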
With this in mind:
Should I trust my message bus in case of an application error?
Yes, for the reasons xander provided. Once the publisher has confirmation that the event bus has accepted the event, the publisher's job is done, and it should never send this same event again.
Nitpicky, but since you are in a publish subscribe architecture (i.e. 0..N subscribers), you should refer to the bus as an event bus (not a message bus), irrespective of the technology used.
Is this a use case for dead letter queues?
Dead letter queues are more usually an artifact of point-to-point queues or service bus delivery architecture, i.e. where there is a command message intended (transactionally) for a single, or possibly finite number of recipients. In a pub-sub event bus topology, it would be unfair to the publisher to expect it to monitor the delivery of all subscribers.
Instead, the subscriber should take on responsibility for resilient delivery. In technologies like Azure Event Hubs and Apache Kafka, events are numbered sequentially (by offset) within each partition and each consumer group tracks its own position, so the subscriber can be alerted to a missed message through monitoring of message offsets.
On republishing events, should all messages be republished to all topics or would it be possible to only republish a subset?
No, an event publisher should never republish an event, as this will corrupt the chain of events for all observing subscribers. Remember that there may be N subscribers to each published event, some of which may be external to your organisation / outside of your control. Events should be regarded as 'facts' which have happened at a point in time. The event publisher shouldn't care whether there are zero or 100 subscribers to an event. It is up to each subscriber to decide on how the event message should be interpreted.
e.g. Different types of subscribers could do any of the following with an event:
Simply log the event for analytics purposes
Translate the event into a command (or Actor Model message) to be handled as a transaction specific to the subscriber
Pass the event into a Rules engine to reason over the wider stream of events, e.g. trigger counter-fraud actions if a specific customer is performing an unusually large number of transactions
etc.
So you can see that republishing events for the benefit of one flaky subscriber would corrupt the data flow for other subscribers.
Should the service republishing events be able to access publisher and subscriber databases to know the message offset?
As xander said, systems and microservices shouldn't share databases. However, systems can expose APIs (RESTful, gRPC etc.).
The Event Bus itself should track which subscriber has read up to which offset (i.e. per consumer group, per topic and per partition). Each subscriber will be able to monitor and change its offsets, e.g. in case an event was lost and needs to be re-processed. (Again, the producer should never republish an event once it has confirmation that the event has been received by the bus)
Or should the subscribing microservices be able to read the outbox?
There are at least two common approaches to event driven enterprise architectures:
'Minimal information' events, e.g. Customer Y has purchased Product Z. In this case, many of the subscribers will find the information contained in the event insufficient to complete downstream workflows, and will need to enrich the event data, typically by calling an API close to the publisher, in order to retrieve the rest of the data they require. This approach has security benefits (since the API can authenticate the request for more data), but can lead to high I/O load on the API.
'Deep graph' events, where each event message has all the information that any subscriber should ever hope to need (this is surprisingly difficult to future proof!). Although the event message sizes will be bloated, it does save a lot of triggered I/O as the subscribers shouldn't need to perform further enrichment from the producer.
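The trade-off between the two event styles can be shown with a small sketch. Everything here is hypothetical: the event fields, the `CUSTOMERS` data and `customer_api`, which stands in for an authenticated API close to the publisher.

```python
# Hypothetical publisher-side data, normally only reachable through an API.
CUSTOMERS = {"Y": {"name": "Ada", "country": "NL"}}


def customer_api(customer_id):
    """Stand-in for an authenticated API near the publisher (hypothetical)."""
    return CUSTOMERS[customer_id]


# 'Minimal information' event: identifiers only.
minimal_event = {"type": "ProductPurchased", "customer_id": "Y", "product_id": "Z"}

# 'Deep graph' event: the same fact plus the data subscribers are likely to need.
deep_event = {**minimal_event, "customer": dict(CUSTOMERS["Y"])}


def shipping_country(event, lookup=customer_api):
    """Return (country, number of extra API calls needed to find it)."""
    if "customer" in event:
        return event["customer"]["country"], 0
    return lookup(event["customer_id"])["country"], 1  # enrichment call


minimal_result = shipping_country(minimal_event)
deep_result = shipping_country(deep_event)
```

Both events lead to the same answer; the minimal event costs the subscriber one extra call per event, while the deep event costs message size and couples the payload to what subscribers need.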
Is it possible to use a topic exchange as a true event notification system?
I've created a topic exchange named Cherry. I've got one publisher on routing key cherry.user.created and many consumers with the same routing key, but when I publish an event only one of the consumers consumes it. I thought that a topic exchange could be used for "real event broadcasting", where every consumer gets notified when a given event happens, but right now only one consumer consumes the event and the other consumers do not know about it.
To clarify my comment about queues: in RabbitMQ, if multiple consumers use the same queue, messages delivered to that queue are always dispatched in a round-robin manner, no matter what. So when you subscribe to a topic exchange, the best way is to declare a new queue for each consumer (with any name, or better, a random one generated by RabbitMQ itself) and bind those queues to the exchange using the target routing key (cherry.user.created).
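The difference between a shared queue and a queue per consumer can be modelled in a few lines. This is a toy in-memory model of the routing behaviour, not a RabbitMQ client: `BrokerQueue` and `TopicExchange` are hypothetical classes, consumers are plain lists acting as inboxes, and wildcard routing keys are left out.

```python
class BrokerQueue:
    """Toy queue: each message goes to exactly one consumer, round-robin."""

    def __init__(self):
        self.consumers = []  # each consumer is a list acting as its inbox
        self.delivered = 0

    def deliver(self, message):
        inbox = self.consumers[self.delivered % len(self.consumers)]
        inbox.append(message)
        self.delivered += 1


class TopicExchange:
    """Toy exchange: copies a message to every queue bound to its routing key."""

    def __init__(self):
        self.bindings = []  # (routing_key, queue) pairs

    def bind(self, queue, routing_key):
        self.bindings.append((routing_key, queue))

    def publish(self, routing_key, message):
        for key, queue in self.bindings:
            if key == routing_key:  # exact match only; wildcards omitted
                queue.deliver(message)


KEY = "cherry.user.created"

# Case 1: two consumers share one queue, so they split the messages.
shared_exchange = TopicExchange()
shared_queue = BrokerQueue()
a, b = [], []
shared_queue.consumers += [a, b]
shared_exchange.bind(shared_queue, KEY)
for msg in ("u1", "u2"):
    shared_exchange.publish(KEY, msg)

# Case 2: one queue per consumer, so every consumer sees every message.
fanout_exchange = TopicExchange()
c, d = [], []
for inbox in (c, d):
    queue = BrokerQueue()
    queue.consumers.append(inbox)
    fanout_exchange.bind(queue, KEY)
for msg in ("u1", "u2"):
    fanout_exchange.publish(KEY, msg)
```

With a real client the per-consumer queue is usually a server-named, exclusive queue; in the pika Python client, for example, that is `channel.queue_declare(queue='', exclusive=True)` followed by a `queue_bind` with the routing key.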
I have started to work with micro-services and I need to create an event publishing mechanism.
I plan to use Amazon SQS.
The idea is quite simple. I store events in the database in the same transaction as aggregates.
If a user changes their email, a UserChangedEmail event is stored in the database.
I also have an event handler, such as UserChangedEmailHandler, which will (in this case) be responsible for publishing this event to an SQS queue, so other services can know that the user changed their email.
My question is, what is the usual practice to achieve this? Should I have some kind of background timed process which will scan the events table and publish events to SQS?
Can this process run within the WebApi application (preferable), or should it be a separate process?
One of the ideas was to use Hangfire, but it does not support cron jobs under a minute.
Any suggestions?
EDIT:
As suggested in one of the answers, I've looked into NServiceBus. One of the examples on the NServiceBus page shows the core of my concern.
In their example, they create a log entry saying that an order has been placed. What if the log or database entry is successfully committed, but the publish breaks and the event never gets published?
Here's the code for the event handler:
using NServiceBus;
using NServiceBus.Logging;

public class PlaceOrderHandler :
    IHandleMessages<PlaceOrder>
{
    static ILog log = LogManager.GetLogger<PlaceOrderHandler>();
    IBus bus;

    public PlaceOrderHandler(IBus bus)
    {
        this.bus = bus;
    }

    public void Handle(PlaceOrder message)
    {
        log.Info($"Order for Product:{message.Product} placed with id: {message.Id}");
        log.Info($"Publishing: OrderPlaced for Order Id: {message.Id}");
        var orderPlaced = new OrderPlaced
        {
            OrderId = message.Id
        };
        bus.Publish(orderPlaced); // <-- my concern
    }
}
Off the Shelf Suggestions
Rather than rolling your own, I recommend looking into off the shelf products, as there is a lot of complexity here that will not be apparent at the outset, e.g.
Managing event subscriber list - an SQS queue is more appropriately paired with an event consumer, rather than with an event producer as when a message is consumed it is no longer available on the queue - so if you want to support multiple subscribers for a given event (which is a massive benefit of event driven architectures), how do you know which SQS queues you push the event message onto when it is first raised?
Retry semantics, error forwarding queues - handling temporary errors due to ephemeral infrastructure issues vs permanent errors due to business logic semantic issues
Audit trails of which messages were raised when and sent where
Security of messages sent via SQS (does your business case require them to be encrypted?). SQS is an application service offered by Amazon and, at the time this answer was written, it did not provide storage-level encryption.
Size of messages - SQS has a message size limit so you may eventually need to handle out-of-band transmission of large messages
And that's just off the top of my head...
A few off the shelf systems that would assist:
NServiceBus provides a framework for managing command and event messaging, and it has a plugin framework permitting flexible transport types - NServiceBus.SQS offers SQS as a transport.
Offers comprehensive and flexible retry, audit and error handling
Opinionated use of commands vs events (command messages say "Do this" and are sent to a single service for processing, event messages say "Something happened" and are sent to an arbitrary number of flexible subscribers)
Outbox pattern provides transactionally consistent messaging even with non-transactionally consistent transports, such as SQS
Currently the SQS plugin uses default NServiceBus subscriber persistence, which requires an SQL Server for storing the event subscriber list (see below for an option that leverages SNS)
Built in support for sagas, offering a framework to ensure multi transaction eventual consistency with rollback via compensating actions
Timeouts supporting scheduled message handling
Commercial offering, so not free, but many plugins/extensions are open source
MassTransit
Doesn't support SQS off the shelf, but does support Azure Service Bus and RabbitMq, so could be an alternative for you if that is an option
Similar offering to NServiceBus, but not 100% the same - NServiceBus vs MassTransit offers a comprehensive comparison
Fully open source/free
JustSaying
A light-weight open source messaging framework designed specifically for SQS/SNS-based messaging
SNS topic per event, SQS queue per microservice, use native SNS to SQS queue subscription to achieve fanout
Open source / free
There may be others, and I have the most personal experience with NServiceBus, but I strongly recommend looking into the off the shelf solutions - they will free you up to start designing your system in terms of business events, rather than worrying about the mechanics of event transmission.
Even if you do want to build your own as a learning exercise, reviewing how the above work may give you some tips on what's needed for reliable event driven messaging.
Transactional Consistency and the Outbox Pattern
The question has been edited to ask about what happens if parts of the operation succeed, but the publish operation fails. I've seen this referred to as the transactional consistency of the messaging, and it generally means that within a transaction, all business side-effects are committed, or none. Business side effects may mean:
Database record updated
Another database record deleted
Message published to a message queue
Email sent
You generally don't want an email sent or a message published, if the database operation failed, and likewise, you don't want the database operation committed if the message publish failed.
So how to ensure consistency of messaging?
NServiceBus handles this in one of two ways:
Use a transactionally consistent message transport, such as MSMQ.
MSMQ is able to make use of Microsoft's DTC (Distributed Transaction Coordinator), and DTC can enlist the publishing of messages in a distributed transaction with SQL Server updates - this means that if your business transaction fails, your publish operation will be rolled back and vice versa.
The Outbox Pattern
With the outbox pattern, messages are not dispatched immediately - they are added to an Outbox table in a database, ideally the same database as your business data, as part of the same transaction
AFTER the transaction is committed, NServiceBus attempts to dispatch each message, and only removes it from the outbox on successful dispatch
In the event of a failure of the system after dispatch but before delete, the message will be transmitted a second time. To compensate for this, when Outbox is enabled, NServiceBus will also do de-duplication of inbound messages, by maintaining a record of all inbound messages and discarding duplicates.
De-duplication is especially useful with Amazon SQS, as it is itself eventually consistent, and the same messages may be received twice.
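The outbox steps above can be sketched with SQLite standing in for the business database. This is an illustration of the general pattern under my own assumptions, not how NServiceBus implements it: the table names, `change_email`, `dispatch_outbox` and the `send` callable (which stands in for pushing to SQS) are all hypothetical.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")
db.execute("INSERT INTO users VALUES (1, 'old@example.com')")
db.commit()


def change_email(user_id, email):
    """Business update and outgoing event are committed in one transaction."""
    with db:  # commits on success, rolls back if an exception is raised
        db.execute("UPDATE users SET email = ? WHERE id = ?", (email, user_id))
        event = {"type": "UserChangedEmail", "user_id": user_id, "email": email}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def dispatch_outbox(send):
    """Send each pending row, deleting it only after a successful send."""
    rows = db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        send(json.loads(payload))  # may raise; the row then stays for a retry
        with db:
            db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))


def broken_send(event):
    raise ConnectionError("queue unavailable")


sent = []
change_email(1, "new@example.com")

try:
    dispatch_outbox(broken_send)  # transport is down: nothing is lost
except ConnectionError:
    pass
pending_after_failure = db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

dispatch_outbox(sent.append)      # transport is back: the event goes out
pending_after_retry = db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

If the process crashed after `send` but before the `DELETE`, the same event would be sent again on the next run, which is why the receiving side needs de-duplication.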
This is not far from the original concept in your question, but there are differences:
You were envisioning a background timed process to scan the events table (aka Outbox table) and publish events to SQS
NServiceBus executes handlers within a pipeline - with Outbox, the dispatch of messages to the transport (aka pushing messages into an SQS queue) is simply one of the last steps in the pipeline. So - whenever a message is handled, any outbound messages generated during the handling will be dispatched immediately after the business transaction is committed - no need for a timed scan of the events table.
Note: Outbox is only successful when there is an ambient NServiceBus Handler transaction - i.e. when you are handling a message within the NServiceBus pipeline. This will NOT be the case in some contexts, e.g. a WebAPI Request pipeline. For this reason, NServiceBus recommends using your API request to send a single Command message only, and then combining business data operations with further messaging within a transactionally consistent command handler in a backend endpoint service. Although point 3 in their doc is more relevant to the MSMQ than SQS transport.
Handler Semantics
One more comment about your proposal - by convention, UserChangedEmailHandler would more commonly be associated with the service that does something in response to the email being changed, rather than simply participating in the propagation of the information that the email has changed. When you have 50 events being published by your system, do you want 50 different handlers just to push those messages onto different queues?
The systems above use a generic framework to propagate messages via the transport, so you can reserve UserChangedEmailHandler for the subscribing system and include in it the business logic that should happen whenever a user changes their email.
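A sketch of that split, with a single generic publish path and handlers living only on the subscribing side. The `Publisher` and `Subscriber` classes and the event types are hypothetical, and the transport is a plain function call rather than a real queue.

```python
from dataclasses import asdict, dataclass


@dataclass
class UserChangedEmail:
    user_id: int
    email: str


@dataclass
class UserDeleted:
    user_id: int


class Publisher:
    """One generic publish path for every event type."""

    def __init__(self, transport):
        self.transport = transport  # callable(topic, body)

    def publish(self, event):
        # The topic is derived from the event type, so no per-event code is needed.
        self.transport(type(event).__name__, asdict(event))


class Subscriber:
    """Business handlers are registered per event type on the receiving side."""

    def __init__(self):
        self.handlers = {}

    def on(self, event_type, handler):
        self.handlers.setdefault(event_type.__name__, []).append(handler)

    def receive(self, topic, body):
        for handler in self.handlers.get(topic, []):
            handler(body)


subscriber = Subscriber()
publisher = Publisher(subscriber.receive)

notified = []
# This plays the role of UserChangedEmailHandler in the subscribing system.
subscriber.on(UserChangedEmail, lambda body: notified.append(body["email"]))

publisher.publish(UserChangedEmail(user_id=1, email="new@example.com"))
publisher.publish(UserDeleted(user_id=1))  # nobody subscribed: silently ignored
```

Adding a fifty-first event type means adding a dataclass and, where a service cares about it, a handler; the publishing code does not change.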
In any case I'd go with stateful services. If you want to go a tad hands off, have a look at Azure Service Fabric.
In my case, with my own set of microservices, I handled a scenario like this by doing the basic database operation first (changing the email). I had an event entity and pushed an event into that collection (MongoDB in this case). A stateful service polled the database and processed the events in batches.
Now in your case, if your web app process is persistent you can opt to enqueue the message right away and keep a field in the event that states whether it was actually processed later by any service or not. I used MongoDB for the database and Azure Service Bus as the message broker. I think Amazon SQS would be similar.
Now, if your web app is a vanilla ASP.NET Web API or MVC process, you should only record the event in the database and leave it at that; that way you don't have to create a message broker listener every time you get a request. One service can poll the database and use the message broker to let the other services know.
If you want a fully event-driven paradigm, you might want to look into Event Hubs.
I strongly suggest keeping a tab on whether any resource has been processed or not from the Message Bus just to make sure it's reliable.
Hope it helps. :)
Domain Driven Design Passing Events to separate Bounded Contexts
A user action in MVC should generate an Event which is passed to a remote (same LAN) Event handler.
What I've tested:
MVC: fire and forget service call (asynchronous) ->
(IIS hosted) WCF which gathers data and populates a message ->
Sent via EasyNetQ/RabbitMQ ServiceBus ->
The event is consumed by a Subscriber (using a DI container initialized from a WCF service endpoint) which handles the event & its data.
I did some testing to see how it works if the service is called fairly quickly by looping in the MVC side
for (int i = 0; i < 200; i++)
{
    ...
    client.MyServiceMethod(someId, startDate);
    ...
}
The MessageQueue part is quick, based on the timestamps it is sent to the queue and received by the subscriber within the same second. Looping through the WCF service calls is very slow. It takes many seconds to loop through them. I tried switching from wsHttpBinding to netTcpBinding, and playing with the serviceThrottling in WCF.
WCF isn't compulsory, but it seems like a separate event handling project (on the publisher end) would be beneficial and could be physically located elsewhere from the MVC app (load reduction etc.). Is WCF plausible for a situation like this, or should I try using Windows services or some other self-hosted e.g. console app etc, or potentially using a thread in MVC to generate the event data, or are there better scenarios? What are the best practices in this type of Event handling system? Basically it seems like it would be beneficial to have something generating the Event data since it has to be handled somewhere while not slowing down the UI that the end user is using.
Instead of trying to roll your own infrastructure like this, I think you would do well to employ a tool like NServiceBus (not free) or MassTransit (free). (I would consider this best practice.)
I can't speak for MassTransit, but my experience with NServiceBus has been very good. You only need to specify which messages go to which queue. You can use several different queueing technologies, but I would recommend starting with the default MSMQ implementation. No WCF configuration nightmares necessary. ;)
All of your message handlers will also be automatically wrapped in a distributed transaction so that if a DB interaction fails, the entire message will be rolled back and you'll be able to try the message again in the future.
If I understood correctly, your event creation process is "heavy" and you want to avoid running it in the MVC process. I guess you are sending some information to the WCF service so that it can prepare the event.
You could consider a two-consumer scenario that avoids the WCF step:
Your MVC application creates and publishes a "light" event with all the data required to create the "heavy" event (basically the input data you would pass to WCF)
An EventCreator subscriber consumes this message and prepares the heavy event
Your already existing consumer will then consume the heavy event
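The three steps above could look roughly like this. It is a single-process sketch in which `queue.Queue` stands in for the RabbitMQ queues; `mvc_action`, `event_creator` and `existing_consumer` are hypothetical names, and the "heavy" work is reduced to a string.

```python
from queue import Queue

light_events = Queue()  # stands in for the queue the web tier publishes to
heavy_events = Queue()  # stands in for the queue the existing consumer reads


def mvc_action(some_id, start_date):
    """Web tier: publish only the inputs and return immediately."""
    light_events.put({"id": some_id, "start_date": start_date})


def event_creator():
    """First consumer: does the expensive data gathering off the web tier."""
    while not light_events.empty():
        light = light_events.get()
        heavy = {**light, "details": f"data gathered for {light['id']}"}
        heavy_events.put(heavy)


def existing_consumer():
    """Second consumer: handles the fully populated event as before."""
    handled = []
    while not heavy_events.empty():
        handled.append(heavy_events.get())
    return handled


for i in range(3):
    mvc_action(i, "2024-01-01")  # cheap: no data gathering in the request
event_creator()
handled = existing_consumer()
```

The request path only pays for putting a small message on a queue; the slow part moves to a consumer that can be hosted and scaled separately from the MVC app.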
EasyNetQ already provides simple functions to publish and consume the message.
Most of the tutorials you find online suggest using TopShelf for hosting your consumers in a console application (debug) or windows service (production). EasyNetQ has an example here: EasyNetQ with TopShelf
If you want to "hide" the EasyNetQ dependency in your MVC project, you could wrap the EasyNetQ IBus in a custom Bus and use an IoC container to inject a specific implementation of your bus. The example provided above uses Castle.Windsor as the IoC container.
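The wrapping idea is independent of the container or messaging library. A minimal sketch, where `Bus`, `InMemoryBus` and `OrderController` are hypothetical names; in the real project the production implementation would delegate to the EasyNetQ bus, and the in-memory one would be used in tests.

```python
from typing import Protocol


class Bus(Protocol):
    """The only messaging surface the web project depends on."""

    def publish(self, message: dict) -> None: ...


class InMemoryBus:
    """Test double; a production implementation would wrap the real bus."""

    def __init__(self):
        self.published = []

    def publish(self, message: dict) -> None:
        self.published.append(message)


class OrderController:
    """Depends on the Bus abstraction, not on a concrete messaging library."""

    def __init__(self, bus: Bus):
        self.bus = bus  # injected, e.g. by an IoC container

    def place_order(self, order_id: int) -> None:
        self.bus.publish({"type": "OrderRequested", "order_id": order_id})


bus = InMemoryBus()
OrderController(bus).place_order(42)
```

Swapping the messaging library later then only touches the one class that implements `Bus`.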
Suppose I want to scale out (add more boxes to) some WCF service. This looks pretty easy: set up a load balancer that calls the WCF services on multiple boxes using, for example, a round-robin algorithm.
However, how do I deal with the situation where a WCF service has a callback contract? When a client connects to a particular box, it receives only the events raised by that box's WCF service instance. I want the client to receive events raised by any WCF service instance in the group (cluster).
What is the best way to make WCF service know about events raised by other WCF service instances?
Some ideas: multicast, broadcast, WCF NetPeerTcpBinding, a single server that subscribes to all WCF services in the cluster (acting as an event aggregator).
UPDATE: I have managed to create a test system using NetPeerTcpBinding as a mechanism to share events across servers. I haven't run a benchmark yet, but I feel that WCF P2P is too heavy for this task, so I'm going to implement a UDP broadcast based event sharing system.
I would implement this by setting up an MSMQ queue that each server can subscribe to; when an event occurs that the other servers need to know about, the service can publish it.
I use a library called NServiceBus to make this entire process simple. NServiceBus is a full-featured library that uses MSMQ (among other transports) to create pub/sub messaging buses, which would exactly solve your problem. It is easy to use and has a fluent interface for configuration, subscription, and publishing.
I will come back and edit this post later with an example, but the NServiceBus website has plenty of documentation to get you started until then.
Have you considered messaging? Sounds ideal.