Concurrency Issues with Domain Events - C#

I have a number of domain events that can be dispatched in our enterprise system, for example when someone creates or removes an address.
Should I be passing the entire entity as part of the event, or should I just pass the ID? The messages are sent via a service bus and consumed in parallel.
If I just send the ID, then the entity may not be available on the consumer side if a delete happened in the meantime. I can always use an active flag and set it to false, but what if the entity was updated in the meantime and something important changed?
How would I go about handling these cases?

I believe this is a common dilemma on a service bus, and there is no single perfect solution. I'm assuming the scope here is JUST the Events raised when important domain objects change state (i.e. not Transactional Commands, nor Read / Data Services).
The decision between sending just Event metadata and sending a full Reference Message (e.g. a new Customer Aggregate Root) probably has wider implications than just the concurrency / versioning issues arising from latency. Some pros and cons of each approach:
The minimal Event metadata:
Has a much smaller payload (especially useful if you audit all messages on the bus)
Fits nicely into standard envelopes
Is reasonably secure if delivered to an unauthorised bus endpoint (all the system gets is the knowledge that Customer XYZ has changed, not the actual details).
Whereas a full "Aggregate Root" reference Message update:
Can be overkill if most subscribers aren't interested in the full payload.
Potential security concerns - not all subscribers on the bus may be entitled to the full payload
But is great for replenishing CQRS readstore caches, as endpoints don't need to go back to the source of truth to fetch data once they know their data is out of date - the data has already been provided.
So I guess the final decision will go with what you primarily intend to do with your EDA Events (keeping CQRS caches updated vs triggering BPM workflows vs monitoring CEP rules etc). You might decide to go with a hybrid, e.g. broadcast the Event metadata widely, but then route full Messages only to trusted endpoints (the Event metadata can likely be projected from the full payload, so the originating / Source of Truth system can just send one message payload to the bus after each state change).
To answer your data consistency question, I believe you will need to accept that the data will only be Eventually Consistent, and that latencies will cause temporary inconsistencies across the enterprise. I believe the best pattern here is to add a hash or timestamp to each Message obtained from the originating Source of Truth, which needs to be added to any Commands which have used this version of the data as an assumption.
Then, when the command handling system processes the command, it can check this hash against the current 'true' version (based on the actual line-of-business system database, NOT against a read-store cache), and fail the command if the hash / timestamps do not match up - i.e. the optimistic concurrency pattern.
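A minimal sketch of that check in C#, with hypothetical command, entity and repository names (the only point is the version comparison against the source of truth):

```csharp
using System;

public class Customer
{
    public Guid Id { get; set; }
    public long Version { get; set; }          // bumped by the line-of-business system on every change
    public string Address { get; set; }

    public void ChangeAddress(string address)
    {
        Address = address;
        Version++;
    }
}

public interface ICustomerRepository
{
    Customer GetById(Guid id);                 // reads the source of truth, not a read-store cache
    void Save(Customer customer);              // should also guard the version at the DB level
}

public class ChangeCustomerAddressCommand
{
    public Guid CustomerId { get; set; }
    public string NewAddress { get; set; }
    public long ExpectedVersion { get; set; }  // the version/hash the sender based its decision on
}

public class ChangeCustomerAddressHandler
{
    private readonly ICustomerRepository _repository;

    public ChangeCustomerAddressHandler(ICustomerRepository repository)
    {
        _repository = repository;
    }

    public void Handle(ChangeCustomerAddressCommand command)
    {
        var customer = _repository.GetById(command.CustomerId);

        // Optimistic concurrency: reject the command if the data it was based on is stale.
        if (customer == null || customer.Version != command.ExpectedVersion)
            throw new InvalidOperationException(
                $"Customer {command.CustomerId} changed since the command was issued - rejecting.");

        customer.ChangeAddress(command.NewAddress);
        _repository.Save(customer);
    }
}
```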

Related

How to ensure the order of messages in CQRS pattern

I experimented a bit with Greg Young's sample application and stumbled upon the problem that, in a multi-threaded environment, the order of messages on a bus might not be guaranteed, or the processing of one event might not be completed before the next arrives.
Because of this, the ItemCreated message might arrive after the ItemChangedSomething message, or at least the first message might not be processed completely yet. This leads to problems on the "read side", because I want to update data that is not (yet) available.
How to go around this? (Assuming CQRS fits for the Domain Design case.)
Do I have to create a Saga or is there some other way of doing this?
You should choose a messaging infrastructure that guarantees delivery of events in-order on a per-consumer basis, even if multiple threads are delivering in parallel to different consumers. I.e., if you feed the events in order on the sending side, consumers will receive them in-order.
Then there are two basic approaches to handle this situation:
Infrastructure: in a small CQRS application without distributed data storage, you can record a global and increasing unique id for each event. Then make sure events are delivered by the messaging architecture in order of their id. That will completely eliminate out-of-order event delivery. Similarly you can record the time stamp of events and deliver them in order of their time stamp. While this might cause race conditions for some cases, for most applications and use cases time stamp based ordering is sufficient (in particular, if ItemCreated and ItemChanged are based on human actions).
State machines: For larger (typically distributed) setups, you can use an explicit or implicit automata/state machine model to cope with out-of-order arrival of messages. With a proper messaging infrastructure, you'll never receive ItemCreated and ItemChanged out of order if they originate from the same stream, but it might happen that events from two different sources (streams/aggregate roots) are consumed by some projection or saga in arbitrary order. Since these events are independent, there usually is a way (think state machine) to keep the projections in a valid state for either order.
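As a rough illustration of the second approach (not taken from Greg Young's sample - all type names here are invented), a projection can be written so that it reaches a valid state whichever of the two events is processed first, by upserting a partially filled row:

```csharp
using System;
using System.Collections.Generic;

public class ItemCreated          { public Guid ItemId; public string Name; }
public class ItemChangedSomething { public Guid ItemId; public string Something; }

// Read-side row; fields stay null until the event that knows them arrives.
public class ItemRow { public Guid ItemId; public string Name; public string Something; }

public class ItemProjection
{
    private readonly Dictionary<Guid, ItemRow> _rows = new Dictionary<Guid, ItemRow>();

    public void When(ItemCreated e)          => GetOrAdd(e.ItemId).Name = e.Name;

    // Works even if ItemCreated has not been processed yet - the row is created as a stub.
    public void When(ItemChangedSomething e) => GetOrAdd(e.ItemId).Something = e.Something;

    private ItemRow GetOrAdd(Guid id)
    {
        if (!_rows.TryGetValue(id, out var row))
            _rows[id] = row = new ItemRow { ItemId = id };
        return row;
    }
}
```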

In Service Oriented Architecture (SOA), should each service own its own data?

Under Service Oriented Architecture (SOA), I am interested in the question of whether a service should own its own data or not.
One of the constraints is that if anything fails at any point, we need to be able to roll the state of the entire system back to a prior state so we can retry or resume an operation.
If each service owns its own data, does this imply that the system deals with change better from the programmer's point of view?
However, if each service owns its own data, are there any mechanisms to roll the entire system back to a prior state so a failed operation can be resumed or retried?
It sounds like the granularity of what you call services might be wrong. A single service can have multiple endpoints (using same or different protocols) and if a message received on one endpoint requires rolling back state that was received on another it is still an internal transaction within the boundary of the service.
Consider the simplistic example of order and customer services. The order service may have contracts with messages relating to the whole order or to an order line, and cancelling the order will undo state that was affected by both. Usually an address change in the customer service would not be rolled back with that.
Sometimes service actions are tied together in a longer business process. To continue the example above, let's also add an invoicing service, so when we cancel an order we also want to cancel the invoice. However, it is important to note that business rules within the realm of the invoicing service can behave differently and not simply "roll back" - e.g. cancelling an order late may incur cancellation fees. This sort of long-running interaction is what I call a saga (you can see a draft of that pattern here).
Also note that distributed transactions between services are usually not a good idea for several reasons (like holding locks for an external party you don't necessarily trust); you can read more about that here.
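A very rough sketch of how the invoicing side of such a saga might apply its own rules instead of blindly rolling back (all names and the fee rule are invented for illustration):

```csharp
using System;

public class OrderCancelled
{
    public Guid OrderId;
    public DateTime OrderPlacedUtc;
    public DateTime CancelledUtc;
}

public class InvoicingPolicy
{
    private static readonly TimeSpan FreeCancellationWindow = TimeSpan.FromDays(7);

    // Reacting to the order service's event; cancellation here is not a simple "undo".
    public void Handle(OrderCancelled message)
    {
        if (message.CancelledUtc - message.OrderPlacedUtc > FreeCancellationWindow)
            IssueCancellationFeeInvoice(message.OrderId);   // business rule, not a rollback

        VoidOutstandingInvoice(message.OrderId);
    }

    private void IssueCancellationFeeInvoice(Guid orderId) { /* create a fee invoice */ }
    private void VoidOutstandingInvoice(Guid orderId) { /* mark the original invoice as void */ }
}
```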
The problem you raised here is (partially) solved by the two-phase commit protocol (see wikipedia article)
To avoid implementing this complex algorithm, you can dedicate one of the services of the architecture to data management. If you need data synchronization between different databases, try to do it at the lowest layer (i.e. the system or the DBMS).
An SOA system defines multiple services within one system. This can make the services more autonomous, so that every service can be hosted on a different machine.
But that does not mean you cannot provide a unified persistence layer for all (domain) models pointing to one storage => simple business transactions even when the whole system is spread across multiple computers, or a transaction for one system.
An autonomous domain model is useful, among other things, during refactoring, to avoid the situation where a change in one model forces a change in another service => global changes across the whole application.
In short: No. Services don't "own" data.
Data are truths about the world, and implicitly durable and shared. Logical services (API) don't always map to real-world data in a 1-1 way. Physical services (code) are implementations that are very refactorable, which opposes the durable nature of data.
When you partition the data, you lose descriptive power and analytic insight. But where it really kills you is integrity. Data cannot be kept coherent across silos as you scale. For complex data, you need those foreign keys.
Put another way: a platform only has one "logical" DB (per environment), because there is only one universe. There are many valid reasons to break up a DB, such as hardware limits, performance, coordination, replication, and compliance. But treat them as necessary evils, used only when needed.
But I think you may be asking a different question: "should a long-running, data-based transaction be managed by a single authoritative service?" And typically, that answer is: Yes. That transaction service can implement the multiple steps to sequence the flow as it sees fit, such as 2-phase commit. All your other services should use that transaction service to execute the transaction.
BUT! That transaction service must interact with the DB as a shared resource using only atomic semantics. That includes all the transaction states (intent, then action, then result) so that recovery and rollbacks are possible. The database must be empowered to maintain integrity in the event of faults. I cannot stress this enough: everything must always decompose into atomic DB operations if you want fault tolerance.
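As a loose illustration of that last point (the names and states are hypothetical, not a prescription), the transaction service can record each step as its own atomic write, so a recovery job can always see how far a flow got and decide to retry or compensate:

```csharp
using System;

// Each state transition is a single atomic DB write; a recovery job that finds a record
// stuck in Intent knows exactly which step never finished.
public enum FlowState { Intent, Completed, Compensated }

public class FlowRecord
{
    public Guid FlowId { get; set; }
    public FlowState State { get; set; }
    public string Payload { get; set; }    // enough data to redo or undo the step
}

public interface IFlowStore
{
    void Save(FlowRecord record);          // assumed to be one atomic DB operation
}

public class TransactionCoordinator
{
    private readonly IFlowStore _store;

    public TransactionCoordinator(IFlowStore store) { _store = store; }

    public void Execute(Guid flowId, string payload, Action businessStep)
    {
        _store.Save(new FlowRecord { FlowId = flowId, State = FlowState.Intent, Payload = payload });
        businessStep();
        _store.Save(new FlowRecord { FlowId = flowId, State = FlowState.Completed, Payload = payload });
    }
}
```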

joliver / EventStore eventual consistency

I'm trying to figure out how my event storage and my read model are related in terms of actual nuts and bolts implementations.
My limited understanding of the event store leads me to believe:
Event is committed to event store
Dispatcher runs
If I'm using a queue, I send the message to a queue (let's say MassTransit)
My read model is subscribed to the queue, so my read database gets the message (mysql)
My read model is updated with the new change to my data
This would mean that if anything happened to mass transit, my read database will be out of sync and I have to figure out how to sync it back.
Some stuff I've read/watched that's been published by Greg Young suggests using the event store itself as a queue, and maintaining eventual consistency by keeping an auto-increment number on the event store side. I'm wondering if that is implemented in joliver's project?
so my read database gets the message (mysql)
I'd re-state that as "my event processor(s) for a given event get the message and (in my case) will typically manipulate state in a mysql database" (Or do you mean something else?).
This would mean that if anything happened to mass transit, my read database will be out of sync and I have to figure out how to sync it back.
Yes, your queue becomes part of the state of your app and it needs to be backed up and resilient. Note that the Dispatcher does not mark the Commit dispatched until it has successfully put it onto the Queue, and the queuing system won't remove the message until you've confirmed completion of the processing to do the necessary updates to sync the state in your Read Model.
Remember that you can consider multiple web service calls to all be part of the necessary work to process an event.
The other thing to bear in mind is that you'll want to have your event processors be idempotent (i.e. be able to handle At Least Once delivery).
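For example, a minimal sketch of an idempotent processor against a MySQL read model (the table names and the AddressCreated event are assumptions): the dedup check and the read-model update share one transaction, so an At Least Once redelivery becomes a no-op.

```csharp
using System;
using MySql.Data.MySqlClient;

public class AddressCreated { public Guid AddressId; public string Street; }

public class AddressProjection
{
    public void Handle(AddressCreated evt, Guid messageId, MySqlConnection connection)
    {
        using (var tx = connection.BeginTransaction())
        {
            // INSERT IGNORE affects 0 rows if this message id was already recorded.
            using (var dedup = new MySqlCommand(
                "INSERT IGNORE INTO processed_messages (message_id) VALUES (@id)", connection, tx))
            {
                dedup.Parameters.AddWithValue("@id", messageId.ToString());
                if (dedup.ExecuteNonQuery() == 0)
                {
                    tx.Commit();            // duplicate delivery - nothing more to do
                    return;
                }
            }

            using (var update = new MySqlCommand(
                "INSERT INTO addresses (id, street) VALUES (@id, @street)", connection, tx))
            {
                update.Parameters.AddWithValue("@id", evt.AddressId.ToString());
                update.Parameters.AddWithValue("@street", evt.Street);
                update.ExecuteNonQuery();
            }

            tx.Commit();
        }
    }
}
```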
Further down the line, you'll have fun considering what you're going to do if an event cannot complete processing - are you going to Dead Letter the message? Who is going to monitor that?
BTW, depending on your hosting arrangements, Azure Service Bus (or the on-premises Windows Service Bus) might be worth considering.
Some stuff I've read/watched that's been published by Greg Young suggests using the event store itself as a queue, and maintaining eventual consistency by keeping an auto-increment number on the event store side. I'm wondering if that is implemented in joliver's project?
No, JOES provides you with a Dispatcher hook and you decide what's right for you after that. This is good and bad. There are systems that don't have a Dispatcher tied to a stateful Read Model at all - they simply query the Event Store for events and build an in-memory Read Model to short circuit all this.
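For reference, the dispatcher hook looks roughly like this (the interface and property names are from memory of JOES and may differ slightly in your version; the bus abstraction is made up):

```csharp
using System;
using EventStore;              // Jonathan Oliver's EventStore (JOES)
using EventStore.Dispatcher;

public interface ISimpleBus { void Publish(object message); }   // stand-in for MassTransit etc.

public class BusDispatcher : IDispatchCommits
{
    private readonly ISimpleBus _bus;

    public BusDispatcher(ISimpleBus bus) { _bus = bus; }

    // Called by the store after a commit is persisted; the commit is only marked
    // as dispatched if this returns without throwing.
    public void Dispatch(Commit commit)
    {
        foreach (var eventMessage in commit.Events)
            _bus.Publish(eventMessage.Body);
    }

    public void Dispose() { }
}
```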
Not sure what you mean by auto increment numbers.
Beware that the Projection stuff in GES is not fully 1.0 yet (but it goes without saying it's extremely deserving of your strong consideration - it intrinsically deals with the bulk of the concerns you're touching on with these questions).

Consuming SQL Server data events for messaging purposes

At our organization we have a SQL Server 2005 database and a fair number of database clients: web sites (PHP, Zope, ASP.NET), rich clients (legacy FoxPro). Now we need to pass certain events from the core database to other systems (MongoDB, LDAP and others). The messaging paradigm seems pretty capable of solving this kind of problem, so we decided to use a RabbitMQ broker as middleware.
The problem of consuming events from the database at first seemed to have only two possible solutions:
Poll the database for outgoing messages and pass them to a message broker.
Use triggers on certain tables to pass messages to a broker on the same machine.
I disliked the first idea due to the latency issues which arise when periodic execution of SQL is involved.
But event-based trigger approach has a problem which seems unsolvable to me at the moment. Consider this scenario:
A row is inserted into a table.
Trigger fires and sends a message (using a CLR Stored Procedure written in C#)
Everything is OK unless the transaction which writes the data is rolled back. In this case the data will be consistent, but the message has already been sent and cannot be rolled back, because the trigger fires at the moment of writing to the database log, not at the time of transaction commit (which is correct behaviour for an RDBMS).
I realize now that I'm asking too much of triggers and they are not suitable for tasks other than working with data.
So my questions are:
Has anyone managed to extract data events using triggers?
What other methods of consuming data events can you advise?
Is Query Notification (built on top of Service Broker) suitable in my situation?
Thanks in advance!
Let's first cut the obvious misfit out of the equation: Query Notification is not the right technology for this, because it is designed to address cache invalidation of relatively stable data. With QN you'll only know that a table has changed, but you won't be able to know what has changed.
Kudos to you for figuring out why triggers invoking SQLCLR won't work: the consistency is broken on rollback.
So what does work? Consider this: BizTalk Server. In other words, there is an entire business built around this problem space, and solutions are far from trivial (otherwise nobody would buy such products).
You can get quite far though following a few principles:
decoupling. Event-based triggers are OK, but do not send the message from the trigger. Aside from the consistency issue on rollback, you also have the latency issue of having every DML operation wait for an external API call (the RabbitMQ send) and the availability issue of the external API call failing (if RabbitMQ is unavailable, your DB is unavailable). The solution is to have the trigger use ordinary tables as queues: the trigger enqueues a message in the local DB queue (i.e. inserts into this table) and an external process services this queue by dequeueing the messages (i.e. deleting from the table) and forwarding them to RabbitMQ (a rough sketch of such a forwarding process follows this list). This decouples the transaction from the RabbitMQ operation (the external process is able to see the message only if the original xact commits), but the cost is some obvious added latency (there is an extra hop involved, the local table acting as a queue).
idempotency. Since RabbitMQ cannot enroll in distributed transactions with the database, you cannot guarantee atomicity of the DB operation (the dequeue from the local table acting as a queue) and the RabbitMQ operation (the send). Either one can succeed while the other fails, and there is simply no way around it without explicit distributed transaction enrollment support. This implies that the application will send duplicate messages every once in a while (usually when things are already going bad for some reason). And a quick heads up: going down the path of explicit 'acknowledge' messages and send sequence numbers is a losing battle, as you'll quickly discover that you're reinventing TCP on top of messaging; that road is paved with bodies.
tolerance. For the same reasons as the item above, every now and then a message you believe was sent will never make it. Again, what damage this causes is entirely business specific. The issue is not how to prevent this situation (it is almost impossible...) but how to detect this situation, and what to do about it. No silver bullet, I'm afraid.
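A rough sketch of that external forwarding process (referenced in the decoupling item above); the table, columns and connection details are invented, and the RabbitMQ calls use the standard .NET client:

```csharp
using System;
using System.Data.SqlClient;
using System.Text;
using System.Threading;
using RabbitMQ.Client;

class OutgoingMessageForwarder
{
    static void Main()
    {
        var factory = new ConnectionFactory { HostName = "localhost" };
        using (var rabbit = factory.CreateConnection())
        using (var channel = rabbit.CreateModel())
        using (var sql = new SqlConnection("Server=.;Database=CoreDb;Integrated Security=true"))
        {
            sql.Open();

            while (true)
            {
                using (var tx = sql.BeginTransaction())
                {
                    // Claim the oldest pending message; the trigger inserted it in the same
                    // transaction as the original DML, so only committed rows are visible here.
                    using (var dequeue = new SqlCommand(
                        @"DELETE FROM dbo.OutgoingMessages
                          OUTPUT DELETED.Payload
                          WHERE Id = (SELECT MIN(Id) FROM dbo.OutgoingMessages)", sql, tx))
                    {
                        var payload = dequeue.ExecuteScalar() as string;
                        if (payload == null)
                        {
                            tx.Commit();
                            Thread.Sleep(100);      // queue empty - back off briefly
                            continue;
                        }

                        // If this publish throws, the DELETE rolls back and the message is retried
                        // later, which is why consumers still need to be idempotent (next item).
                        var props = channel.CreateBasicProperties();
                        channel.BasicPublish("core-events", string.Empty, props,
                                             Encoding.UTF8.GetBytes(payload));
                    }

                    tx.Commit();
                }
            }
        }
    }
}
```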
You do mention in passing Service Broker (the fact that it powers Query Notification is the least interesting aspect of it...). As a messaging platform built into SQL Server which offers Exactly Once In Order delivery guarantees and is fully transacted, it would solve all the above pain points (you can SEND from triggers with impunity, you can use Activation to solve the latency issue, you'll never see a duplicate or a missing message, there are clear error semantics) and some other pain points I did not mention before (consistency of backup/restore, as the data and the messages are on the same unit of storage - the database; consistency of HA/DR failover, as SSB supports both database mirroring and clustering; etc.).
The drawback, though, is that SSB is only capable of talking to another SSB service; in other words, it can only be used to exchange messages between two (or more) SQL Server instances. Any other use requires the parties to use a SQL Server to exchange messages. But if your endpoints are all SQL Server, then consider that there are some large-scale deployments using Service Broker. Note that endpoints like PHP or ASP.NET can be considered SQL Server endpoints, as they are just programming layers on top of the DB API; a different kind of endpoint would be, say, the need to send messages from handheld devices (phones) directly to the database (and even those, 99% of the time, go through a web service, which means they can ultimately reach a SQL Server).
Another consideration is that SSB is geared toward throughput and reliable delivery, not toward low latency. It is definitely not the technology to use to get back the response in an HTTP web request, for instance. It IS the technology to use to submit for processing something triggered by a web request.
Remus's answer lays out some sound principles for generating and handling events. You can initiate the pushing of events from a trigger to achieve low latency.
You can achieve everything necessary from a trigger. We will still decouple this into two components: a trigger that generates the events and a local reader that reads the events.
The first component is the trigger.
Make a CLR trigger that prepares what needs to be done when the transaction commits.
Create a System.Transactions.IEnlistmentNotification that always agrees to be prepared, and whose void Commit(System.Transactions.Enlistment) method executes the prepared action.
In the trigger, call System.Transactions.Transaction.Current.EnlistVolatile(enlistmentNotification, System.Transactions.EnlistmentOptions.None)
You'll want your action to be short and sweet, like appending the data to a lockless queue in memory or updating some other state in memory. Don't try to communicate with other machines or processes. Don't write to a disk (if you wanted to write to a disk, just make an ordinary trigger that inserts into a queue table). You'll need to be careful to make sure your assembly is loaded only once so that any shared static state will be unique; this is easiest to do if your static state is in a top level assembly that isn't referenced by other assemblies, so no other assemblies will try to load it.
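A bare-bones sketch of steps 2 and 3 above (the System.Transactions types are real; the static in-memory queue and the string payload are placeholders for whatever state you want to update):

```csharp
using System.Collections.Concurrent;
using System.Transactions;

public static class PendingEvents
{
    // Lock-free, in-memory, non-durable queue that the second (reader) component drains.
    public static readonly ConcurrentQueue<string> Queue = new ConcurrentQueue<string>();
}

public class EnqueueOnCommit : IEnlistmentNotification
{
    private readonly string _payload;

    public EnqueueOnCommit(string payload) { _payload = payload; }

    public void Prepare(PreparingEnlistment preparingEnlistment)
    {
        preparingEnlistment.Prepared();              // always agree to be prepared
    }

    public void Commit(Enlistment enlistment)
    {
        PendingEvents.Queue.Enqueue(_payload);       // short and sweet: no disk, no network
        enlistment.Done();
    }

    public void Rollback(Enlistment enlistment)
    {
        enlistment.Done();                           // transaction rolled back: drop the event
    }

    public void InDoubt(Enlistment enlistment)
    {
        enlistment.Done();
    }
}

// Inside the CLR trigger body:
//   Transaction.Current.EnlistVolatile(new EnqueueOnCommit(payload), EnlistmentOptions.None);
```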
You will also need to either
initialize your state in such a way that it will be correct even if the system was restarted without sending all the previously queued messages (since a short, in memory queue will not be durable). This means you might be resending messages, so they will need to be idempotent. or
rely on the tolerance of another component to pick up on missed messages
The second component reads the state that is updated by the trigger. Make a separate CLR component that reads from your queue or state, and does whatever you need done (like sending an idempotent message to a messaging system, recording that it was sent, whatever). If this component can fail (hint: it can), you will need some form of tolerance, which may belong in another system. You can achieve low latency by having the trigger signal the second component when new state is available.
One architectural possibility is to have the trigger put the event in memory on commit for another low-latency component to pick up, and have that second component send a low-latency, low-reliability copy of an idempotent message. You can pair that with a more reliable or durable messaging system, such as SSB, that will reliably and durably, but with greater latency, send the same idempotent message later.

Async (Queued) Messaging and associated Session Data

I'm trying to implement session context data for a queued messaging system.
The session handling goes like this:
A front-end application authenticates itself and receives a session id.
After that, the session id is included in the message headers, so a message handler is provided with a context for e.g. security checks and audit logging. The client may pick up a session if it crashed and continue with its work.
So now we want to associate key/value pairs with the session id. But this creates many concurrency problems if the session data changes, as the session data used by the message handler should be the data as it was at the time the message was sent.
I see two possible solutions:
Put the associated session data in every message header
Store the session data versioned to the database and use a version id in the message header.
The first makes messages bigger; the second makes the session DB bigger and creates a lot of infrastructural code. In both cases I have to save the most current values to the DB, so a client may continue its work if it crashed or lost the connection.
Are there any other solutions? I tend to use the first solution, but want to get some feedback first.
How do others deal with this (e.g. JMS/NServiceBus/Masstransit)?
Update based on Answer:
I've chosen to take the route of convincing my team members to use the session data only in the front end, and to put it into the messages when it is required by the message handler.
You didn't really go into detail about why you want to associate key/value pairs with the session concept.
Coming from NServiceBus and Udi Dahan's advice on SOA and service boundaries, this type of session concept tends to rub me the wrong way. My feeling is that message handlers should be, for the most part, fairly deterministic with respect to time. That is, a handler should run just as well right now, or sit in a queue for a while and execute the exact same way at some point in the future.
So, my advice would be that for security purposes, go ahead and use message headers if necessary. In NServiceBus you can introduce message handlers from an IT/Ops Service that are configured to execute first in the handler chain, verifying security and stuff like that independent of the actual business logic. In this case, the header information just affects whether the message gets processed or rejected.
When you get to session type information, I would want to carefully analyze those requirements and put the relevant pieces in the message schema itself.
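To make that concrete with a hypothetical example (the message and field names are invented): rather than the handler reading mutable session state, the values it needs are captured into the message at send time, and the session id only travels as a header for security/auditing.

```csharp
using System;
using System.Collections.Generic;

// Session-derived values are snapshotted into the message when it is sent, so a later
// change to the session cannot alter how this message is handled.
public class PlaceOrderMessage
{
    public Guid OrderId { get; set; }
    public string PreferredCurrency { get; set; }
    public string DeliveryAddress { get; set; }
}

// The session id rides in a header and is only used to accept/reject or audit the message,
// never as business data (the exact header API depends on your bus, so it is abstracted here).
public interface ISessionAwareSender
{
    void Send(object message, IDictionary<string, string> headers);
}

public class OrderFrontEnd
{
    private readonly ISessionAwareSender _sender;

    public OrderFrontEnd(ISessionAwareSender sender) { _sender = sender; }

    public void PlaceOrder(Guid orderId, string currency, string address, Guid sessionId)
    {
        var message = new PlaceOrderMessage
        {
            OrderId = orderId,
            PreferredCurrency = currency,
            DeliveryAddress = address
        };

        _sender.Send(message, new Dictionary<string, string> { ["SessionId"] = sessionId.ToString() });
    }
}
```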
Again, it would be helpful to know the motivation behind the session data in the first place. If you edit your question, perhaps we could identify a way you could reorganize those requirements.
