I have started to work with micro-services and I need to create an event publishing mechanism.
I plan to use Amazon SQS.
The idea is quite simple. I store events in the database in the same transaction as aggregates.
If a user changes their email, a UserChangedEmail event is stored in the database.
I also have an event handler, such as UserChangedEmailHandler, which (in this case) is responsible for publishing this event to an SQS queue, so that other services know the user changed their email.
My question is: what is the recommended practice for achieving this? Should I have some kind of timed background process that scans the events table and publishes events to SQS?
Can this process live within the WebApi application (preferable), or should it be a separate process?
One of the ideas was to use Hangfire, but it does not support cron jobs under a minute.
Any suggestions?
EDIT:
As suggested in one of the answers, I've looked into NServiceBus. One of the examples on the NServiceBus page shows the core of my concern.
In their example, they log that an order has been placed. What if the log or database entry is successfully committed, but the publish fails and the event never gets published?
Here's the code for the event handler:
public class PlaceOrderHandler :
    IHandleMessages<PlaceOrder>
{
    static ILog log = LogManager.GetLogger<PlaceOrderHandler>();
    IBus bus;
    public PlaceOrderHandler(IBus bus)
    {
        this.bus = bus;
    }
    public void Handle(PlaceOrder message)
    {
        log.Info($"Order for Product:{message.Product} placed with id: {message.Id}");
        log.Info($"Publishing: OrderPlaced for Order Id: {message.Id}");
        var orderPlaced = new OrderPlaced
        {
            OrderId = message.Id
        };
        bus.Publish(orderPlaced); // <-- my concern
    }
}
Off the Shelf Suggestions
Rather than rolling your own, I recommend looking into off-the-shelf products, as there is a lot of complexity here that will not be apparent at the outset, e.g.
Managing event subscriber list - an SQS queue is more appropriately paired with an event consumer, rather than with an event producer as when a message is consumed it is no longer available on the queue - so if you want to support multiple subscribers for a given event (which is a massive benefit of event driven architectures), how do you know which SQS queues you push the event message onto when it is first raised?
Retry semantics, error forwarding queues - handling temporary errors due to ephemeral infrastructure issues vs permanent errors due to business logic semantic issues
Audit trails of which messages were raised when and sent where
Security of messages sent via SQS (does your business case require them to be encrypted?). SQS is an application service offered by Amazon which, when this was written, did not provide storage-level encryption; server-side encryption has since been added
Size of messages - SQS has a message size limit so you may eventually need to handle out-of-band transmission of large messages
And that's just off the top of my head...
A few off the shelf systems that would assist:
NServiceBus provides a framework for managing command and event messaging, and it has a plugin framework permitting flexible transport types - NServiceBus.SQS offers SQS as a transport.
Offers comprehensive and flexible retry, audit and error handling
Opinionated use of commands vs events (command messages say "Do this" and are sent to a single service for processing, event messages say "Something happened" and are sent to an arbitrary number of flexible subscribers)
Outbox pattern provides transactionally consistent messaging even with non-transactionally consistent transports, such as SQS
Currently the SQS plugin uses default NServiceBus subscriber persistence, which requires an SQL Server for storing the event subscriber list (see below for an option that leverages SNS)
Built in support for sagas, offering a framework to ensure multi transaction eventual consistency with rollback via compensating actions
Timeouts supporting scheduled message handling
Commercial offering, so not free, but many plugins/extensions are open source
Mass Transit
Doesn't support SQS off the shelf (at the time of writing), but does support Azure Service Bus and RabbitMQ, so it could be an alternative if one of those is an option for you
Similar offering to NServiceBus, but not 100% the same - NServiceBus vs MassTransit offers a comprehensive comparison
Fully open source/free
Just Saying
A lightweight open source messaging framework designed specifically for SQS/SNS-based messaging
SNS topic per event, SQS queue per microservice; uses native SNS-to-SQS queue subscriptions to achieve fan-out
Open source/free
There may be others, and I have the most personal experience with NServiceBus, but I strongly recommend looking into the off-the-shelf solutions - they will free you up to start designing your system in terms of business events, rather than worrying about the mechanics of event transmission.
Even if you do want to build your own as a learning exercise, reviewing how the above work may give you some tips on what's needed for reliable event driven messaging.
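To make the subscriber-list problem concrete, here is a minimal in-memory sketch of the "topic per event, queue per service" fan-out mentioned above. It is plain Python, not the AWS SDK or JustSaying; `Topic`, `subscribe` and `publish` are hypothetical names chosen for illustration, and the deques merely stand in for SQS queues.

```python
from collections import deque


class Topic:
    """Stand-in for an SNS topic: copies each event to every subscribed queue."""

    def __init__(self, name):
        self.name = name
        self.queues = []  # one queue per subscribing service

    def subscribe(self, queue):
        self.queues.append(queue)

    def publish(self, event):
        # The publisher never needs to know who the subscribers are.
        for queue in self.queues:
            queue.append(dict(event))  # each subscriber gets its own copy


# One queue per microservice (stand-ins for SQS queues).
billing_queue = deque()
mailing_queue = deque()

user_changed_email = Topic("UserChangedEmail")
user_changed_email.subscribe(billing_queue)
user_changed_email.subscribe(mailing_queue)

user_changed_email.publish({"user_id": 42, "email": "new@example.com"})

# Consuming from one queue does not affect the other subscriber.
received = billing_queue.popleft()
```

The point of the design is that adding a third subscriber is one more `subscribe` call; the publishing code does not change.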
Transactional Consistency and the Outbox Pattern
The question has been edited to ask what happens if parts of the operation succeed but the publish operation fails. I've seen this referred to as the transactional consistency of the messaging, and it generally means that within a transaction, all business side effects are committed, or none. Business side effects may mean:
Database record updated
Another database record deleted
Message published to a message queue
Email sent
You generally don't want an email sent or a message published, if the database operation failed, and likewise, you don't want the database operation committed if the message publish failed.
So how to ensure consistency of messaging?
NServiceBus handles this in one of two ways:
Use a transactionally consistent message transport, such as MSMQ.
MSMQ is able to make use of Microsoft's DTC (Distributed Transaction Coordinator), and DTC can enroll the publishing of messages in a distributed transaction with SQL Server updates - this means that if your business transaction fails, your publish operation will be rolled back, and vice versa
The Outbox Pattern
With the outbox pattern, messages are not dispatched immediately - they are added to an Outbox table in a database, ideally the same database as your business data, as part of the same transaction
AFTER the transaction is committed, it attempts to dispatch each message, and only removes it from the outbox on successful dispatch
In the event of a failure of the system after dispatch but before delete, the message will be transmitted a second time. To compensate for this, when Outbox is enabled, NServiceBus will also do de-duplication of inbound messages, by maintaining a record of all inbound messages and discarding duplicates.
De-duplication is especially useful with Amazon SQS, as it is itself eventually consistent, and the same messages may be received twice.
This is not far from the original concept in your question, but there are differences:
You were envisioning a timed background process to scan the events table (aka the Outbox table) and publish events to SQS
NServiceBus executes handlers within a pipeline - with Outbox, the dispatch of messages to the transport (aka pushing messages into an SQS queue) is simply one of the last steps in the pipeline. So - whenever a message is handled, any outbound messages generated during the handling will be dispatched immediately after the business transaction is committed - no need for a timed scan of the events table.
Note: Outbox is only successful when there is an ambient NServiceBus Handler transaction - i.e. when you are handling a message within the NServiceBus pipeline. This will NOT be the case in some contexts, e.g. a WebAPI Request pipeline. For this reason, NServiceBus recommends using your API request to send a single Command message only, and then combining business data operations with further messaging within a transactionally consistent command handler in a backend endpoint service. Although point 3 in their doc is more relevant to the MSMQ than SQS transport.
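As a rough illustration of the outbox idea described in this section (store the message in the same transaction as the business data, dispatch after commit, delete only on success), here is a minimal sketch using Python's built-in sqlite3. This is not how NServiceBus implements its Outbox; the table layout and the `change_email` / `dispatch_outbox` names are assumptions made up for the example, and `send` stands in for whatever pushes a message to SQS.

```python
import json
import sqlite3


def change_email(conn, user_id, new_email):
    """Update the aggregate and record the event in ONE transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE users SET email = ? WHERE id = ?", (new_email, user_id))
        conn.execute(
            "INSERT INTO outbox (type, payload) VALUES (?, ?)",
            ("UserChangedEmail", json.dumps({"user_id": user_id, "email": new_email})),
        )


def dispatch_outbox(conn, send):
    """Send each pending message; delete it only after a successful send."""
    rows = conn.execute("SELECT id, type, payload FROM outbox ORDER BY id").fetchall()
    for row_id, msg_type, payload in rows:
        try:
            send(msg_type, json.loads(payload))
        except Exception:
            break  # leave this and later rows for the next attempt
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT, payload TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'old@example.com')")
conn.commit()

sent = []
change_email(conn, 1, "new@example.com")
dispatch_outbox(conn, lambda msg_type, payload: sent.append((msg_type, payload)))
```

If the process dies after `send` succeeds but before the `DELETE` commits, the message is sent again on the next attempt, which is why the receiving side still needs de-duplication.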
Handler Semantics
One more comment about your proposal - by convention, UserChangedEmailHandler would more commonly be associated with the service that does something in response to the email being changed, rather than simply participating in the propagation of the information that the email has changed. When you have 50 events being published by your system, do you want 50 different handlers just to push those messages onto different queues?
The systems above use a generic framework to propagate messages via the transport, so you can reserve UserChangedEmailHandler for the subscribing system and include in it the business logic that should happen whenever a user changes their email.
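To illustrate what "generic" means here, a small sketch in plain Python: one publish function wraps any event type in an envelope and derives the topic from the event's class name, so no per-event publishing handler is needed. The event classes, `to_envelope` and `publish` are hypothetical, and the `published` list stands in for the real transport.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class UserChangedEmail:
    user_id: int
    email: str


@dataclass
class OrderPlaced:
    order_id: int


def to_envelope(event):
    """Build a transport message for ANY event type."""
    return {"topic": type(event).__name__, "body": json.dumps(asdict(event))}


published = []  # stand-in for the transport


def publish(event):
    published.append(to_envelope(event))


publish(UserChangedEmail(user_id=1, email="new@example.com"))
publish(OrderPlaced(order_id=7))
```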
In any case I'd go with stateful services. If you want to go a tad hands off, have a look at Azure Service Fabric.
In my case, I had my own set of microservices. In a scenario like this, I did the basic operation on the database first (changing the email). I had an event entity and added an event to that collection (in this case, in MongoDB). A stateful service polled the database and processed the events in batches.
Now in your case, if your web app process is persistent, you can opt to enqueue the message right away and keep a field on the event that states whether it was later processed by any service. I used MongoDB as the database and Azure Service Bus as the message broker. I think Amazon SQS would be similar.
If your web app is a vanilla ASP.NET Web API or MVC process, you should only record the event in the database and return; that way you don't have to create a message broker listener every time you get a request. One service can poll the database and use the message broker to let the other services know.
If you want a fully event-driven paradigm, you might want to look into Event Hubs.
I strongly suggest keeping a tab on whether any resource has been processed or not from the Message Bus just to make sure it's reliable.
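The polling approach with a "processed" field could look roughly like the sketch below. It uses a plain list of dicts as a stand-in for the events collection, and `poll_once` and `batch_size` are names I made up; a real service would query the database and call the broker's client instead.

```python
def poll_once(events, publish, batch_size=10):
    """Publish up to batch_size unprocessed events and flag each one afterwards."""
    batch = [e for e in events if not e["processed"]][:batch_size]
    for event in batch:
        publish(event["type"], event["data"])
        event["processed"] = True  # set only after publish returned without error
    return len(batch)


# Stand-in for the events collection written alongside the business data.
events = [
    {"type": "UserChangedEmail", "data": {"user_id": n}, "processed": False}
    for n in range(5)
]

published = []


def record(event_type, data):
    published.append((event_type, data))


first = poll_once(events, record, batch_size=3)
second = poll_once(events, record, batch_size=3)
third = poll_once(events, record, batch_size=3)
```

In a long-running service `poll_once` would sit in a loop with a short sleep. Note that a crash between publishing and setting the flag publishes the event twice, so consumers should tolerate duplicates.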
Hope it helps. :)
Related
Within code I need to atomically:
//write to a database table
//publish EventA to service bus
//write to another database table
//publish EventB to service bus
If either of the db writes fails, I can easily roll everything back within a transaction scope. However if that happens, it's essential the events are never published to service bus.
Ideally I need the service bus library to 'wait until the transaction scope successfully completes' before publishing the message onto the bus.
I'm working with legacy .net framework code - I could easily write something which holds events to raise in memory, and only raise these once the scope completes. The problem even then is that if EventB fails to publish, EventA has already been published.
What's the easiest way to include service bus events within a transaction scope?
I don't think there's an easy way to address this. Azure Service Bus will not share a transaction with any other service. Just as database transactions cannot include a web service call. The way to go about it would always require some additional complexity compared to a simple DTC transaction that was possible on-premises (to an extent).
But it can be done with patterns such as Outbox (unit of work) and Inbox (idempotency), which are described very well in this post:
Outbox Pattern - This pattern ensures that a message was sent (e.g. to a queue) successfully at least once. With this pattern, instead of directly publishing a message to the queue, we store it in the temporary storage (e.g. database table). We’re wrapping the entity save and message storing with the Unit of Work (transaction). By that, we’re making sure that if the application data was stored, the message wouldn’t be lost. It will be published later by a background process. This process will check if there are any not sent events in the table. When the worker finds such messages, it tries to send them. After it gets confirmation of publishing (e.g. ACK from the queue) it marks the event as sent.
Inbox Pattern - This is a pattern similar to Outbox Pattern. It’s used to handle incoming messages (e.g. from a queue). Accordingly, we have a table in which we’re storing incoming events. Contrary to outbox pattern, we first save the event in the database, then we’re returning ACK to queue. If save succeeded, but we didn’t return ACK to queue, then delivery will be retried. That’s why we have at-least-once delivery again. After that, an outbox-like process runs. It calls message handlers that perform business logic.
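A minimal sketch of the Inbox side, assuming each message carries a unique id (the `message_id` column, the table layout and both function names are my own, not from the quoted post). The primary key is what turns a broker redelivery into a harmless no-op.

```python
import sqlite3


def receive(conn, message_id, body):
    """Store the incoming message, then ACK. Returns False for a redelivery."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO inbox (message_id, body) VALUES (?, ?)", (message_id, body)
            )
    except sqlite3.IntegrityError:
        return False  # duplicate: already stored, safe to ACK again
    return True


def process_inbox(conn, handler):
    """Outbox-like step: run the business handler for each unhandled message."""
    rows = conn.execute(
        "SELECT message_id, body FROM inbox WHERE handled = 0 ORDER BY rowid"
    ).fetchall()
    for message_id, body in rows:
        handler(body)
        with conn:
            conn.execute("UPDATE inbox SET handled = 1 WHERE message_id = ?", (message_id,))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inbox (message_id TEXT PRIMARY KEY, body TEXT, handled INTEGER DEFAULT 0)"
)

handled = []
first = receive(conn, "msg-1", "UserChangedEmail")
redelivered = receive(conn, "msg-1", "UserChangedEmail")  # broker retried delivery
process_inbox(conn, handled.append)
process_inbox(conn, handled.append)  # nothing left to do
```

In this sketch the handler and the `handled` flag are not updated atomically; if the handler writes to the same database, doing both in one transaction closes that gap.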
To answer your specific question.
What's the easiest way to include service bus events within a transaction scope?
The simplest would be to use a library that does that already, has been tested thoroughly, and can integrate into your system. MassTransit, NServiceBus, Jasper, etc. Alternatively, build your own. I'd not advise unless it's a pet project or the core of your system.
The situation is as follows. There are three services, one service is event sourced and publishes integration or notification events (outbox pattern) to the other two services (subscribers) using an event bus (like Azure Service bus or ActiveMQ).
This design is inspired by .NET microservices - Architecture e-book - Subscribing to events.
I'm wondering what should happen if one of these events cannot be delivered due to an error, or if event handling simply wasn't implemented correctly.
Should I trust my message bus in case of an application error?
Is this a use case for dead letter queues?
On republishing events, should all messages be republished to all topics or would it be possible to only republish a subset?
Should the service republishing events be able to access publisher and subscriber databases to know the message offset?
Or should the subscribing microservices be able to read the outbox?
Should I trust my message bus in case of an application error?
Yes.
(Edit: After reading this answer, read @StuartLC's answer for more info)
The system you described is an eventually consistent one. It works under the assumption that if each component does its job, all components will eventually converge on a consistent state.
The Outbox's job is to ensure that any event persisted by the Event Source Microservice is durably and reliably delivered to the message bus (via the Event Publisher). Once that happens, the Event Source and the Event Publisher are done--they can assume that the event will eventually be delivered to all subscribers. It is then the message bus's job to ensure that that happens.
The message bus and its subscriptions can be configured for either "at least once" or "at most once" delivery. (Note that "exactly once" delivery is generally not guaranteeable, so an application should be resilient against either duplicate or missed messages, depending on the subscription type).
An "at least once" (called "Peek Lock" by Azure Service Bus) subscription will hold on to the message until the subscriber gives confirmation that it was handled. If the subscriber gives confirmation, the message bus's job is done. If the subscriber responds with an error code or doesn't respond in a timely manner, the message bus may retry delivery. If delivery fails multiple times, the message may be sent to a poison message or dead-letter queue. Either way, the message bus holds on to the message until it gets confirmation that it was received.
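The "hold on to the message until confirmed, then dead-letter after repeated failures" behaviour can be sketched in a few lines. This is an in-memory toy, not the Azure Service Bus client; `deliver` and `max_deliveries` are hypothetical names, and unlike a real broker it retries the head of the queue immediately instead of after a lock timeout.

```python
from collections import deque


def deliver(queue, dead_letters, handler, max_deliveries=3):
    """Deliver until the queue is empty; a message is removed only when handled."""
    while queue:
        message = queue[0]  # "peek": the message stays on the queue
        message["deliveries"] += 1
        try:
            handler(message["body"])
            queue.popleft()  # confirmed by the subscriber
        except Exception:
            if message["deliveries"] >= max_deliveries:
                dead_letters.append(queue.popleft())  # give up, keep for inspection
            # otherwise leave it at the head so it is retried


queue = deque({"body": b, "deliveries": 0} for b in ["ok-1", "poison", "ok-2"])
dead_letters = []
handled = []


def handler(body):
    if body == "poison":
        raise ValueError("cannot process")
    handled.append(body)


deliver(queue, dead_letters, handler)
```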
On republishing events, should all messages be republished to all topics or would it be possible to only republish a subset?
I can't speak for all messaging systems, but I would expect a message bus to only republish to the subset of subscriptions that failed. Regardless, all subscribers should be prepared to handle duplicate and out-of-order messages.
Should the service republishing events be able to access publisher and subscriber databases to know the message offset?
I'm not sure I understand what you mean by "know the message offset", but as a general guideline, microservices should not share databases. A shared database schema is a contract. Once the contract is established, it is difficult to change unless you have total control over all of its consumers (both their code and deployments). It's generally better to share data through application APIs to allow more flexibility.
Or should the subscribing microservices be able to read the outbox?
The point of the message bus is to decouple the message subscribers from the message publisher. Making the subscribers explicitly aware of the publisher defeats that purpose, and will likely be difficult to maintain as the number of publishers and subscribers grows. Instead, rely on a dedicated monitoring service and/or the monitoring capabilities of the message bus to track delivery failures.
Just to add to @xander's excellent answer, I believe that you may be using an inappropriate technology for your event bus. You should find that Azure Event Hubs or Apache Kafka are better candidates for event publish / subscribe architectures. Benefits of a dedicated Event Bus technology over the older Service Bus approaches include:
There is only ever one copy of each event message (whereas Azure Service Bus or RabbitMQ make deep copies of each message for each subscriber)
Messages are not deleted after consumption by any one subscriber. Instead, messages are left on the topic for a defined period of time (which can be indefinite, in Kafka's case).
Each subscriber (consumer group) will be able to track its committed offset. This allows a subscriber to re-connect and rewind if it has lost messages, independently of the publisher and of other subscribers (i.e. isolated).
New consumers can subscribe AFTER messages have been published, and will still be able to receive ALL messages available (i.e. rewind to the start of available events)
With this in mind:
Should I trust my message bus in case of an application error?
Yes, for the reasons xander provided. Once the publisher has a confirmation that the event bus has accepted the event, the publisher's job is now done and should never send this same event again.
Nitpicky, but since you are in a publish subscribe architecture (i.e. 0..N subscribers), you should refer to the bus as an event bus (not a message bus), irrespective of the technology used.
Is this a use case for dead letter queues?
Dead letter queues are more usually an artifact of point-to-point queues or service bus delivery architecture, i.e. where there is a command message intended (transactionally) for a single, or possibly finite number of recipients. In a pub-sub event bus topology, it would be unfair to the publisher to expect it to monitor the delivery of all subscribers.
Instead, the subscriber should take on responsibility for resilient delivery. In technologies like Azure Event Hubs and Apache Kafka, events are uniquely numbered per consumer group, so the subscriber can be alerted to a missed message through monitoring of message offsets.
On republishing events, should all messages be republished to all topics or would it be possible to only republish a subset?
No, an event publisher should never republish an event, as this will corrupt the chain of events to all observer subscribers. Remember, that there may be N subscribers to each published event, some of which may be external to your organisation / outside of your control. Events should be regarded as 'facts' which have happened at a point in time. The event publisher shouldn't care whether there are zero or 100 subscribers to an event. It is up to each subscriber to decide on how the event message should be interpreted.
e.g. Different types of subscribers could do any of the following with an event:
Simply log the event for analytics purposes
Translate the event into a command (or Actor Model message) and be handled as a transaction specific to the subscriber
Pass the event into a Rules engine to reason over the wider stream of events, e.g. trigger counter-fraud actions if a specific customer is performing an unusually large number of transactions
etc.
So you can see that republishing events for the benefit of one flakey subscriber would corrupt the data flow for other subscribers.
Should the service republishing events be able to access publisher and subscriber databases to know the message offset?
As xander said, systems and microservices shouldn't share databases. However, systems can expose APIs (RESTful, gRPC, etc.).
The Event Bus itself should track which subscriber has read up to which offset (i.e. per consumer group, per topic and per partition). Each subscriber will be able to monitor and change its offsets, e.g. in case an event was lost and needs to be re-processed. (Again, the producer should never republish an event once it has confirmation that the event has been received by the bus)
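A toy model of per-consumer-group offsets may help here. It is plain Python and deliberately ignores partitions, retention and durability; `EventLog`, `poll` and `seek` are hypothetical names and not the Kafka or Event Hubs client API.

```python
class EventLog:
    """Append-only log; reading never removes events."""

    def __init__(self):
        self.events = []
        self.offsets = {}  # consumer group -> next offset to read

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # offset of the stored event

    def poll(self, group, max_events=100):
        start = self.offsets.get(group, 0)
        batch = self.events[start:start + max_events]
        self.offsets[group] = start + len(batch)  # commit the new offset
        return batch

    def seek(self, group, offset):
        """Rewind (or skip ahead) one group without touching the others."""
        self.offsets[group] = offset


log = EventLog()
for name in ["A", "B", "C"]:
    log.append(name)

analytics = log.poll("analytics")
billing_first = log.poll("billing", max_events=2)
log.seek("analytics", 1)  # analytics lost events and rewinds
analytics_replay = log.poll("analytics")
billing_rest = log.poll("billing")
```

The publisher appended each event exactly once; the replay for `analytics` happened entirely on the subscriber side and `billing` was unaffected.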
Or should the subscribing microservices be able to read the outbox?
There are at least two common approaches to event driven enterprise architectures:
'Minimal information' events, e.g. Customer Y has purchased Product Z. In this case, many of the subscribers will find the information contained in the event insufficient to complete downstream workflows, and will need to enrich the event data, typically by calling an API close to the publisher, in order to retrieve the rest of the data they require. This approach has security benefits (since the API can authenticate the request for more data), but can lead to high I/O load on the API.
'Deep graph' events, where each event message has all the information that any subscriber should ever hope to need (this is surprisingly difficult to future proof!). Although the event message sizes will be bloated, it does save a lot of triggered I/O as the subscribers shouldn't need to perform further enrichment from the producer.
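The trade-off between the two shapes can be shown with a small sketch. All class names, fields and the lookup tables below are invented for illustration; `lookup` stands in for an API call close to the publisher and simply counts how often it is used.

```python
from dataclasses import dataclass


@dataclass
class ProductPurchasedMinimal:
    customer_id: str
    product_id: str


@dataclass
class ProductPurchasedDeep:
    customer_id: str
    customer_email: str
    product_id: str
    product_name: str
    price_cents: int


# Stand-ins for data behind the publisher's API; every lookup is extra I/O.
CATALOG = {"Z": {"product_name": "Widget", "price_cents": 1999}}
CUSTOMERS = {"Y": {"customer_email": "y@example.com"}}
api_calls = {"count": 0}


def lookup(table, key):
    api_calls["count"] += 1
    return table[key]


def enrich(event):
    """What a subscriber of the minimal event must do before it can act."""
    return ProductPurchasedDeep(
        customer_id=event.customer_id,
        product_id=event.product_id,
        **lookup(CUSTOMERS, event.customer_id),
        **lookup(CATALOG, event.product_id),
    )


enriched = enrich(ProductPurchasedMinimal(customer_id="Y", product_id="Z"))
```

With the deep-graph shape the publisher would send `ProductPurchasedDeep` directly and the two lookups per subscriber disappear, at the cost of a larger message.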
I have read the documentation a few times and it is still not clear to me what the message pipeline looks like when an error occurs, depending on the transaction level.
The diagram above presents a pipeline with three handlers: the first sends a command to the second, and the third subscribes to an event published by the second. Suppose handler 2 has processed its business logic and an error occurs when it starts publishing the event. What will happen, depending on the transport transaction level? My assumptions are listed below.
Transaction scope level
The bus rolls back the transaction. The whole process starts again from handler 1, following the recoverability plan (immediate retries and delayed retries). If the recoverability plan ends in failure, a rollback happens and the message is moved to the error queue. The message can be retried from, e.g., ServicePulse, which restarts the pipeline from handler 1 with the steps described earlier.
Transport transaction - Sends atomic with Receive
Processing starts from handler 2, following the recoverability plan. If the recoverability plan ends in failure, the message is moved to the error queue. The message can be retried from, e.g., ServicePulse, which restarts the pipeline from handler 2 with the steps described earlier.
It very much depends on your transport and having DTC as described in the article.
If you are using DTC, your assumptions are correct in 1 and 2, so it will work with the MSMQ or SQL Server transports using DTC.
(By the way, if you feel you can improve the article to make it clearer you can submit a pull request)
HTH
I'm not really sure what your scenario is. Based on Sean's answer and your additional question, I'll also try to answer it.
A message is usually sent to an endpoint. Each endpoint has a single incoming queue. The message is dispatched to one or more handlers. Usually this is only one.
With distributed transactions (usually MSDTC on Windows), it depends on which resources you are using whether they roll back on an error. MSMQ and SQL Server support MSDTC, so that should theoretically work. On an error everything will be rolled back: the received messages, the SQL transactions and the outgoing messages. You will have a clean state.
SMTP doesn't support transactions, so if you send an email and the transaction rolls back, the email will be sent anyway. So if you retry the message, the email will be sent again.
SendsAtomicWithReceive means that only the transport participates in the transaction. This means that receiving and sending messages will roll back on an error. But anything done in SQL Server (or any other resource) will not be rolled back.
This is set-up within every endpoint and applies for every incoming message. Since sending of messages via a queue is completely asynchronous, it doesn't matter if you send messages between different endpoints or send every follow-up message to the same endpoint.
Inside NServiceBus there's a pipeline which processes messages. It verifies which transaction to use, which handler(s) to execute, etc. If you're talking about this, there's no way a handler can 'subscribe' to another handler.
If you're talking about message flow, where one handler sends or publishes a new message, then all what I wrote above applies.
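To show the difference between the two modes, here is a toy model in plain Python. It is not NServiceBus code: `process`, the `distributed` flag and the staging lists are assumptions of mine, and it assumes the handler's database write was already committed by its own transaction before the error occurred.

```python
def process(message, handler, database, outgoing, distributed):
    """Run one handler attempt; return True if the incoming message is consumed.

    distributed=True  -> database writes roll back together with the messages
    distributed=False -> only receive/send are atomic; database writes stay
    """
    staged_rows, staged_messages = [], []
    try:
        handler(message, staged_rows, staged_messages)
    except Exception:
        if not distributed:
            database.extend(staged_rows)  # committed separately, not rolled back
        return False  # message returns to the queue; staged sends are dropped
    database.extend(staged_rows)
    outgoing.extend(staged_messages)
    return True


def failing_handler(message, rows, messages):
    rows.append(f"order {message} saved")
    messages.append(f"OrderPlaced {message}")
    raise RuntimeError("publish failed")


db_dtc, out_dtc = [], []
consumed_dtc = process(1, failing_handler, db_dtc, out_dtc, distributed=True)

db_atomic, out_atomic = [], []
consumed_atomic = process(1, failing_handler, db_atomic, out_atomic, distributed=False)
```

In the second case a retry would run the handler again against a database that already contains the row, which is why handlers need to be idempotent when the transport transaction does not cover the database.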
I have a WCF service (the fact that it's WCF shouldn't matter) and I'm not looking for message queuing, but instead for an asynchronous work queue in which to place tasks, once a request / message is received. Requirements:
Must support persistent store that enables recovery of tasks in the case of Server / service process failure.
Supports re-running of failed jobs, up to a given limit (i.e. try re-running a job up to 5 times)
Able to record the failed job call along with its parameters, in an easily queried fashion. For example, I would query the store for failed jobs and receive a list of "job name, parameters".
Unfortunately cannot be a cloud-based / hosted solution.
Queues that I'm probably not looking for:
MSMQ (RabbitMQ, AMQP). Low level, and is focused on message transport.
Quartz.NET. Has some of the above but its error-recording facilities are lacking. Geared more toward cron-like scheduling than async work and error reporting.
The default task scheduler of the .NET TPL. It has no persistence if the process owning it stops abruptly, and it doesn't support re-running of tasks very well.
I think I'd be looking for something more along the lines of Celery, Resque, or even qless. I know Resque.NET exists (https://www.nuget.org/packages/Resque/), but not sure if there's something more mainstream, or if that could suffice.
What about Amazon SQS? You don't have to worry about infrastructure as you would with RabbitMQ/MSMQ. SQS is dirt cheap, too. Last time I checked, it was $0.01 per 10,000 messages. Why re-invent the wheel? Let Amazon (or other cloud providers with similar services, like Microsoft and Rackspace) do all the worrying.
I use Amazon SQS in production for all message-based services. Some of these messages act like cron jobs; an external process queues the message at a specific time. Some of them are acted upon immediately.
I need to implement a queuing mechanism for WCF service requests. The service will be called by clients in a one-way manner. These request messages should be stored in a SQL Server database and a Windows Service queues the messages. The time at which the requests are processed will be configurable. If an error occurs while processing a message, it needs to be retried up to 100 times, and if it still fails it needs to be terminated.
Also, there should be a mechanism to monitor the number of transactions made per day and the number of failures.
QUESTIONS
If I were using MSMQ, clients could have forwarded the message to the queue without knowing the service endpoint. But I am using SQL Server to store the request messages. How can the clients put the requests into SQL Server?
Is the solution feasible? Do we have any article/book that explains how to implement the above?
What are the steps to prevent service and client reaching faulted state in this scenario?
What is the best method to store incoming message to database?
What is the best method to implement a retry mechanism? Does anything already exist so that I don't have to reinvent the wheel?
Is there any book/article that explains this implementation?
NOTES
Content of the message will be complex XML. For example Travel expense items of an employee or a list of employees.
READING
Logging WCF Request to Database
Guaranteed processing of data in WCF service
MSMQ vs. SQL Server Service Broker
Is it possible to persist and then forward WCF messages to destination services?
WCF 4 Routing Service - protocol bridging issue
https://softwareengineering.stackexchange.com/questions/134605/designing-a-scalable-and-robust-retry-mechanism
Integrating SQL Service Broker and NServiceBus
Can a subscriber also publish/send message in NServiceBus?
I'm a DBA, so that flavors my response, but here's what I'd do:
If you're using SQL Server 2005+, use Service Broker to store the messages in the database rather than storing them in a table. You get a queueing mechanism with this, so you can get rid of MSMQ. You'll also have a table, but it's just going to store the conversation handle (essentially, a pointer to the message) along with how many times processing of this message has been attempted. Lastly, you'll want some sort of a "dead letter box" where messages that reach your retry threshold go.
In your message processing code, do the following:
Begin a transaction
Receive a message off of the queue
If the retry count is greater than the threshold, move it to the dead letter box and commit
Increment the counter on the table for this message
Process the message
If the processing succeeded, commit the transaction
If the processing failed, put a new message on the queue with the same contents and then commit the transaction
Notice that there aren't any planned rollbacks. Rollbacks in Service Broker can be bad; if you roll back 5 times without a successful receive, the queue will become disabled for both enqueuing and dequeuing. But you still want to have transactions for the case when your message processor dies in the middle of processing (i.e. the server crashes).
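The steps above can be sketched as follows. This is an in-memory Python model, not T-SQL or Service Broker itself: the deque stands in for the queue, the `attempts` dict for the counter table, and it assumes a hypothetical `message_id` that survives re-enqueueing. The surrounding begin/commit is omitted because nothing here is transactional.

```python
from collections import deque


def process_next(queue, attempts, dead_letters, handler, threshold=3):
    """One pass over the steps above for a single message."""
    message_id, body = queue.popleft()  # receive a message off the queue
    if attempts.get(message_id, 0) > threshold:
        dead_letters.append((message_id, body))  # move to the dead letter box
        return "dead-lettered"
    attempts[message_id] = attempts.get(message_id, 0) + 1  # increment the counter
    try:
        handler(body)
    except Exception:
        queue.append((message_id, body))  # new message with the same contents
        return "requeued"
    return "done"


queue = deque([("m1", "good"), ("m2", "bad")])
attempts, dead_letters, handled = {}, [], []


def handler(body):
    if body == "bad":
        raise ValueError("processing failed")
    handled.append(body)


results = []
while queue:
    results.append(process_next(queue, attempts, dead_letters, handler, threshold=2))
```

Every branch ends in what would be a commit, never a rollback, which matches the point about avoiding planned rollbacks.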
1. If I were using MSMQ, clients could have forwarded the message to queue without knowing the service endpoint.
Yes - but they would need to know the MSMQ endpoint in order to send their message to the queue.....
But I am using SQL Server to store the request messages. How can the clients put the requests into SQL Server?
The clients won't put their requests into SQL Server - that's what the service on the server will do. The client just call a service method, and the code in there will store the request into the SQL Server table.
2. Is the solution feasible? Do we have any article/book that explains how to implement the above?
Sure, I don't see any big issue. The only point unclear to me right now is: how will the clients know their results?? Do they need to go get results from another service or something??
3. What are the steps to prevent service and client reaching faulted state in this scenario?
As always - just make sure your service code catches all exceptions and either handles them internally, or returns interoperable SOAP faults instead of .NET exceptions.
It sounds like what you want to do is similar to this:
In this case you can use netMsmqBinding between your service and your service consumers.
The only thing you won't get out of the box is the retrying. However if you make the queue transactional then this functionality can be implemented in your service code.
If there is a failure in your dequeue operation the message will not be removed from the queue. It will therefore be available for further dequeue attempts.
However, you would need to implement retry attempt threshold code which fails a message after a certain number of attempts.
I would suggest a different approach to the ones suggested here. If you are able to, I would consider introducing a messaging framework such as NServiceBus. It satisfies many of the requirements that you have right out of the box. Let me try to address this in the context of your requirements.
The service will be called by clients in a one-way manner.
All communication between endpoints in NServiceBus is one way. The underlying transport NServiceBus uses is MSMQ, so much like your WCF approach, your client is communicating with queues, rather than specific service endpoints.
These request messages should be stored in a SQL Server database and a Windows Service queues the messages.
If you wanted to store your request messages in a database then you can configure NServiceBus to forward all messages sent to your request processing endpoint to another "audit" queue, which you can use to persist to the database. This has the added benefit of separating your application logic from your auditing implementation.
The time at which the requests are processed will be configurable.
NServiceBus allows you to defer when a message is sent. Normally a message is sent via the Send method of a Bus instance - Bus.Send(msg). You can use the Defer method to send the message some time in the future, e.g. Bus.Defer(DateTime.Now.AddDays(1), msg); There's nothing more you really have to do; NServiceBus will handle the message once the specified time has been reached.
If an error occurs while processing a message, it needs to be retried up to 100 times, and if it still fails it needs to be terminated.
By default, NServiceBus will enlist your message in a transaction as soon as your message leaves the queue. This ensures that in the event of failure that the message is rolled back to the originating queue. In such an event, NServiceBus will automatically try to reprocess the message a configurable number of times. The default being 5. You can of course set this to whatever you want, although I am not sure why you would want to set this to 100. At any rate, NServiceBus uses this setting to stop an endless loop of automatic retries. Once the limit has been reached the message is sent to an error queue where it sits until you fix whatever issues caused the exception or until you decide to push the message back to the queue for processing. Either way, you are assured that the message is never lost.
Also, there should be a mechanism to monitor the number of transactions made per day and the number of failures.
The beauty of using MSMQ as the transport is that performance monitoring can be achieved at an infrastructure level. How your applications perform can be measured by how long messages sit in the queue. NServiceBus comes with performance monitors that track the length of time a message is in the queue, and you can also add the performance monitors built into Windows to track other activity. To monitor errors, all you need to do is check the number of messages in the error queue.
One of the main features of NServiceBus is reliability. WCF will only do so much for you, and then you are on your own. That means a lot of code and complexity, and frankly it is hugely error prone. The things I have described here are all standard features of NServiceBus, and I have barely scratched the surface of all the other things that you can do with it. I recommend you check it out.