Under Service Oriented Architecture (SOA), I am interested in the question of whether a service should own its own data or not.
One of the constraints is that if anything fails at any point, we need to be able to roll the state of the entire system back to a prior state so we can retry or resume an operation.
If each service owns its own data, does this imply that the system deals with change better from the programmer's point of view?
However, if each service owns its own data, are there any mechanisms to roll the entire system back to a prior state so a failed operation can be resumed or retried?
It sounds like the granularity of what you call services might be wrong. A single service can have multiple endpoints (using the same or different protocols), and if a message received on one endpoint requires rolling back state that was received on another, it is still an internal transaction within the boundary of the service.
Consider the simplistic example of order and customer services. The order service may have contracts with messages relating to the whole order or to an order line, and cancelling the order will undo state that was affected by both. An address change in the customer service, however, would usually not be rolled back with that.
Sometimes service actions are tied together in a longer business process. To continue the example above, let's also add an invoicing service, so that when we cancel an order we also want to cancel the invoice. However, it is important to note that business rules within the realm of the invoicing service can behave differently and not simply "roll back": for instance, cancelling an order late may incur cancellation fees. This sort of long-running interaction is what I call a saga (you can see a draft of that pattern here)
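To make the compensation idea concrete, here is a minimal sketch of how an invoicing service might react to an order cancellation with its own business rules rather than a literal rollback; all type and method names here are invented for illustration:

```csharp
using System;

public interface ICancelOrderStep
{
    void OnOrderCancelled(int orderId, DateTime orderPlacedAt);
}

public class InvoicingCancellation : ICancelOrderStep
{
    public void OnOrderCancelled(int orderId, DateTime orderPlacedAt)
    {
        // Invoicing applies its own rules instead of blindly undoing state:
        // a late cancellation doesn't erase the invoice, it adds a fee.
        bool isLate = (DateTime.UtcNow - orderPlacedAt).TotalHours > 24;
        if (isLate)
            AddCancellationFee(orderId);
        else
            VoidInvoice(orderId);
    }

    private void AddCancellationFee(int orderId) { /* add an invoice line for the fee */ }
    private void VoidInvoice(int orderId) { /* mark the invoice as cancelled */ }
}
```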
Also note that distributed transactions between services are usually not a good idea for several reasons (like holding locks for an external party you don't necessarily trust); you can read more about that here
The problem you raised here is (partially) solved by the two-phase commit protocol (see the Wikipedia article).
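As a rough illustration of the protocol's control flow (not a production implementation: real coordinators also need durable logging and timeout/recovery handling), with an invented participant interface:

```csharp
using System.Collections.Generic;

public interface IParticipant
{
    bool Prepare();    // phase 1: vote yes/no, holding locks if yes
    void Commit();     // phase 2a: make the change permanent
    void Rollback();   // phase 2b: undo the prepared work
}

public static class TwoPhaseCommitCoordinator
{
    public static bool Run(IReadOnlyList<IParticipant> participants)
    {
        // Phase 1: collect votes from every participant.
        var prepared = new List<IParticipant>();
        foreach (var p in participants)
        {
            if (!p.Prepare())
            {
                // Any "no" vote aborts the whole transaction.
                foreach (var q in prepared) q.Rollback();
                return false;
            }
            prepared.Add(p);
        }

        // Phase 2: everyone voted yes, so commit everywhere.
        foreach (var p in prepared) p.Commit();
        return true;
    }
}
```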
To avoid implementing this complex algorithm, you can dedicate one of the services of the architecture to data management. If you need data synchronization between different databases, try to do it at the lowest possible layer (i.e., the system or the DBMS).
An SOA system defines multiple services within one system. This can make the services more autonomous, so that every service can be hosted on a different machine.
But it does not mean that you cannot provide a unified persistence layer for all (domain) models that points to one storage => a simple business transaction when the whole system is spread across several computers, or a transaction for one system.
An autonomous domain model is useful, among other things, during refactoring: it avoids the situation where a change in one model causes a change in another service => global changes across the whole application.
In short: No. Services don't "own" data.
Data are truths about the world, and implicitly durable and shared. Logical services (API) don't always map to real-world data in a 1-1 way. Physical services (code) are implementations that are very refactorable, which opposes the durable nature of data.
When you partition the data, you lose descriptive power and analytic insight. But where it really kills you is integrity. Data cannot be kept coherent across silos as you scale. For complex data, you need those foreign keys.
Put another way: a platform only has one "logical" DB (per environment), because there is only one universe. There are many valid reasons to break up a DB, such as HW limits, performance, coordination, replication, and compliance. But treat them as necessary evils, used only when needed.
But I think you may be asking a different question: "should a long-running, data-based transaction be managed by a single authoritative service?" And typically, that answer is: Yes. That transaction service can implement the multiple steps to sequence the flow as it sees fit, such as 2-phase commit. All your other services should use that transaction service to execute the transaction.
BUT! That transaction service must interact with the DB as a shared resource using only atomic semantics. That includes all the transaction states (intent, then action, then result) so that recovery and rollbacks are possible. The database must be empowered to maintain integrity in the event of faults. I cannot stress this enough: everything, always, must decompose into atomic DB operations if you want fault tolerance.
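As a sketch of what recording those states as individual atomic operations might look like (the TxLog table, its schema, and the recovery convention are all invented for illustration):

```csharp
using System;
using Microsoft.Data.SqlClient;

public class TransferService
{
    private readonly string _connectionString;
    public TransferService(string connectionString) { _connectionString = connectionString; }

    public void Transfer(Guid txId)
    {
        using (var conn = new SqlConnection(_connectionString))
        {
            conn.Open();

            // 1. Intent: durably record what we are about to do (one atomic insert).
            Exec(conn, "INSERT INTO TxLog (TxId, State) VALUES (@id, 'intent')", txId);

            // 2. Action: the actual work, inside a local DB transaction.
            using (var tx = conn.BeginTransaction())
            {
                // ... the debit/credit statements run under 'tx' here ...
                tx.Commit();
            }

            // 3. Result: mark the intent as completed (one atomic update).
            Exec(conn, "UPDATE TxLog SET State = 'done' WHERE TxId = @id", txId);

            // A recovery job can later scan TxLog for rows stuck in 'intent'
            // and decide whether to retry or compensate.
        }
    }

    private static void Exec(SqlConnection conn, string sql, Guid txId)
    {
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@id", txId);
            cmd.ExecuteNonQuery();
        }
    }
}
```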
Related
We have created a dotnet core web api project which is using SQL Server database. Now, we are planning to deploy this project to Microsoft Azure.
While deploying this application, we are also considering enabling the autoscaling option (horizontal scaling).
Before we do it, we have a few questions that we want to clarify.
Should we need to add some additional code in our application which allows autoscaling to work properly?
Properly in the sense that there can be more than one instance of the application running because of horizontal scaling. We are using a database, and if more than one instance is running, will it cause race conditions (i.e., two instances accessing the same data at the same time)? Should we add transactions (or use locking) in our code to avoid these kinds of scenarios?
I want to know whether there are any best practices to follow while implementing this kind of application.
Thank you and waiting for your answers!
Consider the following points when designing an autoscaling strategy:
The system must be designed to be horizontally scalable. Avoid making assumptions about instance affinity; do not design solutions that require that the code is always running in a specific instance of a process. When scaling a cloud service or web site horizontally, do not assume that a series of requests from the same source will always be routed to the same instance. For the same reason, design services to be stateless to avoid requiring a series of requests from an application to always be routed to the same instance of a service.

When designing a service that reads messages from a queue and processes them, do not make any assumptions about which instance of the service handles a specific message, because autoscaling could start additional instances of a service as the queue length grows. The Competing Consumers pattern describes how to handle this scenario.

If the solution implements a long-running task, design this task to support both scaling out and scaling in. Without due care, such a task could prevent an instance of a process from being shut down cleanly when the system scales in, or it could lose data if the process is forcibly terminated. Ideally, refactor a long-running task and break up the processing that it performs into smaller, discrete chunks. The Pipes and Filters pattern provides an example of how you can achieve this. Alternatively, you can implement a checkpoint mechanism that records state information about the task at regular intervals, and save this state in durable storage that can be accessed by any instance of the process running the task. In this way, if the process is shut down, the work that it was performing can be resumed from the last checkpoint by using another instance.
For more information, see the doc: https://github.com/Huachao/azure-content/blob/master/articles/best-practices-auto-scaling.md
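To illustrate the checkpoint mechanism described in that guidance, here is a minimal sketch; the ICheckpointStore interface and its durable backing are assumptions for illustration, not part of any Azure SDK:

```csharp
using System.Threading;
using System.Threading.Tasks;

public interface ICheckpointStore
{
    Task<int> LoadAsync(string taskId);          // last completed chunk, or 0
    Task SaveAsync(string taskId, int chunk);    // durable write (DB, blob, ...)
}

public class LongRunningTask
{
    private readonly ICheckpointStore _store;
    public LongRunningTask(ICheckpointStore store) { _store = store; }

    public async Task RunAsync(string taskId, int totalChunks, CancellationToken ct)
    {
        // Resume from wherever the previous instance left off.
        int next = await _store.LoadAsync(taskId);

        for (int chunk = next; chunk < totalChunks; chunk++)
        {
            ct.ThrowIfCancellationRequested();    // honor scale-in shutdown
            await ProcessChunkAsync(chunk);       // one small, idempotent unit of work
            await _store.SaveAsync(taskId, chunk + 1);
        }
    }

    private Task ProcessChunkAsync(int chunk) => Task.CompletedTask; // placeholder
}
```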
Regarding this:
Properly in the sense that there can be more than one instance of the application running because of horizontal scaling. We are using a database, and if more than one instance is running, will it cause race conditions (i.e., two instances accessing the same data at the same time)? Should we add transactions (or use locking) in our code to avoid these kinds of scenarios?
Please keep in mind that, even if the app is running on a single machine, requests will still be handled concurrently. This means that even on a single machine 2 requests can cause the same entry in the database to be updated. So the above questions about race conditions apply to single instance web apps as well.
Try to avoid locking: the whole point of (horizontal) scaling is to gain performance benefits. By using locks you effectively remove these benefits, as only one process at a time can use the locked resource.
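One common lock-free alternative is optimistic concurrency. Sketched below with EF Core and a SQL Server rowversion column; the Order entity, the AppDbContext (assumed to expose an Orders DbSet), and the retry count are illustrative assumptions:

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class Order
{
    public int Id { get; set; }
    public int Quantity { get; set; }

    [Timestamp]  // maps to a SQL Server rowversion column used for conflict detection
    public byte[] RowVersion { get; set; }
}

public static class OrderUpdater
{
    // Update a row without taking locks: retry on conflict instead.
    public static async Task IncrementQuantityAsync(AppDbContext db, int orderId)
    {
        for (int attempt = 0; attempt < 3; attempt++)
        {
            var order = await db.Orders.SingleAsync(o => o.Id == orderId);
            order.Quantity++;
            try
            {
                await db.SaveChangesAsync();  // throws if another instance updated the row first
                return;
            }
            catch (DbUpdateConcurrencyException)
            {
                db.ChangeTracker.Clear();     // drop stale state, reload and retry
            }
        }
        throw new InvalidOperationException("Too much contention; giving up.");
    }
}
```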
Other points of consideration are:
If you are using an in-memory cache you might want to swap it out for a distributed cache.
The guidance at the MS docs
I need to implement a pipeline with Service Fabric's Reliable Services, and I need some guidelines about which of these approaches is preferable from the viewpoint of reliability, simplicity, and good design:
I have been investigating this topic a lot as well (to be applied to my work for NServiceBus and MessageHandler) and would like to provide my thoughts on the matter. However I haven't determined what the best model is yet.
If you disregard the practical implementation with ServiceFabric, I would rank the proposed approaches in the following order when it comes to reliability:
C) The store-and-forward model is probably the best of the three models when it comes to inter-service communication: all services can work independently from each other and are in no way subject to networking outages (at the downside of added latency).
A) Input queue per service: each service is free from impact by network outages for its own work. However, when it wishes to send messages to another service it may be impacted by network outages and needs retry logic built in to accommodate this.
B) Output queue per service: probably the weakest of the three models, as each service is directly dependent on a resource of the others, which results in too much dependency on network availability between the nodes.
If you look at it from a simplicity point of view, I would rank them the following way:
A) Input queue per service: as the message source needs to actively route messages to a given destination queue, it is fairly simple to implement business processes or workflows (which I assume your pipeline is going to represent) using a routing pattern, either static routing or dynamic routing, e.g. using a routing slip pattern (see the sketch after this list).
C) Store and forward: again, routing is an explicit part of your implementation, so both static and dynamic routing patterns are possible. However, the practical implementation is harder, as you need to build and manage a message pump that transfers messages from the transfer queue (output) to the destination queue, along with the associated need to flow context from the message source into the message pump. (Shameless plug: NServiceBus is a framework that can take away this complexity for you and make this scenario as simple as A.)
B) Output queue per service: each service needs to be set up to explicitly read from another service's queue. This approach only allows static routing, as the routing rules are embedded in where you read from (which severely limits you from a functional perspective).
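For reference, the routing slip idea from A) boils down to carrying the remaining itinerary with the message, so each service just forwards to the next entry without knowing the whole workflow. A toy sketch, with all types invented:

```csharp
using System.Collections.Generic;

public class Message
{
    public string Body { get; set; }
    // Ordered list of destination queues still to visit.
    public Queue<string> RoutingSlip { get; } = new Queue<string>();
}

public interface IQueueClient
{
    void Send(string destinationQueue, Message message);
}

public static class RoutingSlipForwarder
{
    // Called by each service after it has done its own processing step.
    public static void ForwardToNextStep(Message message, IQueueClient queues)
    {
        if (message.RoutingSlip.Count == 0)
            return;                          // itinerary complete, pipeline done

        string next = message.RoutingSlip.Dequeue();
        queues.Send(next, message);          // the slip travels along with the body
    }
}
```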
If we take ServiceFabric's implementation details into account, I assume you want to make use of the IReliableQueue implementation? This implementation has some shortcomings, though, that make me wonder whether these patterns can actually be implemented properly on ServiceFabric's native storage infrastructure.
The storage infrastructure is only available in stateful services, so stateless services (like REST APIs or other protocol-termination gateways) cannot be part of the pipeline (and usually you want one of these as an entry point).
Only one thread can access a reliable queue at a time, so it is impossible to write to and read from the same queue at the same time. This severely limits the throughput of the queue.
Accessing a reliable queue requires a local transaction, but these transactions are limited to a single partition. So it is also impossible to scale out your stateful services to create a competing-consumers pattern.
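For context, the reliable-queue usage under discussion looks roughly like this inside a stateful service (a sketch with error handling omitted; note the ITransaction, which is scoped to the partition, which is exactly the limitation above):

```csharp
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data;
using Microsoft.ServiceFabric.Data.Collections;

public class PipelineStage // would derive from StatefulService in a real project
{
    private readonly IReliableStateManager _stateManager;
    public PipelineStage(IReliableStateManager stateManager) { _stateManager = stateManager; }

    public async Task EnqueueAsync(string item)
    {
        var queue = await _stateManager.GetOrAddAsync<IReliableQueue<string>>("input");
        using (ITransaction tx = _stateManager.CreateTransaction())
        {
            await queue.EnqueueAsync(tx, item);
            await tx.CommitAsync();   // replicated within this partition only
        }
    }

    public async Task<string> TryDequeueAsync()
    {
        var queue = await _stateManager.GetOrAddAsync<IReliableQueue<string>>("input");
        using (ITransaction tx = _stateManager.CreateTransaction())
        {
            ConditionalValue<string> result = await queue.TryDequeueAsync(tx);
            await tx.CommitAsync();
            return result.HasValue ? result.Value : null;
        }
    }
}
```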
Given these shortcomings, I'm still inclined to use another type of queueing infrastructure for SF services instead of SF's persistence model, for example Azure Service Bus or Azure Storage Queues (which NServiceBus supports as well).
In short, I'd support both A and C, with a slight preference for C, but I'm not convinced about using reliable queues as an implementation until these shortcomings have been resolved.
My ASP.NET MVC 4 project is using EF5 code-first, and some of the domain objects contain non-persisted counter properties which are updated according to incoming requests. These requests come very frequently, and a scenario in which multiple request sessions modify these counters is quite probable.
My question is, is there a best practice, not necessarily related to ASP.NET or to EF, to handle this scenario? I think (but I'm not sure) that for the sake of this discussion, we can treat the domain objects as simple POCOs (which they are).
EDIT: As requested, following is the actual scenario:
The system is a subscriber and content management system. Peer servers are issuing requests which my system either authorizes or denies. Authorized requests result in opening sessions in peer servers. When a session is closed in the peer server, it issues a request notifying that the session has been closed.
My system needs to provide statistics - for example, the number of currently open sessions for each content item (one of the domain entities) - and provide real-time figures as well as per-minute, hourly, daily, weekly etc. figures.
These figures can't be extracted by means of querying the database due to performance issues, so I've decided to implement the basic counters in-memory, persist them every minute to the database and take the hourly, daily etc. figures from there.
The issue above results from the fact that each peer server request updates these "counters".
I hope it's clearer now.
Sounds like your scenario still requires a solid persistence strategy.
Your counter objects can be persisted to the HttpRuntime.Cache.
Dan Watson has an exceptional writeup here:
http://www.dotnetguy.co.uk/post/2010/03/29/c-httpruntime-simple-cache/
Be sure to use CacheItemPriority.NotRemovable to ensure that it maintains state during memory reclamation. The cache is maintained within the scope of the app domain. You can retrieve and update counters (it's thread-safe!) in the cache and query its status from, presumably, a stats page or some other endpoint. However, if the data needs to be persisted beyond the scope of the runtime, then the strategy you're already using is sufficient.
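A rough sketch of that approach (the key scheme and the Counter class are invented; Cache.Add is used so two concurrent requests can't silently replace each other's counter):

```csharp
using System.Threading;
using System.Web;
using System.Web.Caching;

public static class SessionCounters
{
    public static void Increment(string contentItemId)
    {
        string key = "counter:" + contentItemId;
        var counter = (Counter)HttpRuntime.Cache.Get(key);
        if (counter == null)
        {
            var fresh = new Counter();
            // Cache.Add returns the existing entry if another request beat us to it.
            counter = (Counter)HttpRuntime.Cache.Add(
                key, fresh, null,
                Cache.NoAbsoluteExpiration, Cache.NoSlidingExpiration,
                CacheItemPriority.NotRemovable,  // survive memory reclamation
                null) ?? fresh;
        }
        counter.Increment();
    }

    public static long Read(string contentItemId)
    {
        var counter = (Counter)HttpRuntime.Cache.Get("counter:" + contentItemId);
        return counter == null ? 0 : counter.Value;
    }

    private class Counter
    {
        private long _value;
        public void Increment() { Interlocked.Increment(ref _value); }
        public long Value { get { return Interlocked.Read(ref _value); } }
    }
}
```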
Actually, I think you don't need to worry too much about performance before you have enough information from tests and profiling tools.
But if you're working with EF, you're dealing with a DbContext, which is an implementation of the Unit of Work pattern described by Martin Fowler in his book. The main idea of that pattern is to reduce the number of requests to the database and to operate on the data in memory as much as possible until you commit all your changes. So my short advice is to use your EF entities in the standard way, but not to commit changes every time the data updates: commit at intervals instead, for example after every 100 changes, storing data between requests in Session, application state, the cache, or somewhere else. The only things you should take care of are using the proper DbContext object each time, and not forgetting to dispose of it when you no longer need it.
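A sketch of that interval-commit idea (StatsDbContext, the ContentStats entity, and the threshold of 100 are all assumptions for illustration):

```csharp
using System.Collections.Concurrent;
using System.Threading;

public class BufferedCounterWriter
{
    private readonly ConcurrentDictionary<int, int> _pending = new ConcurrentDictionary<int, int>();
    private int _changeCount;

    // Called on every request; cheap, in-memory only.
    public void RecordOpenSession(int contentItemId)
    {
        _pending.AddOrUpdate(contentItemId, 1, (id, n) => n + 1);
        if (Interlocked.Increment(ref _changeCount) >= 100)
            Flush();
    }

    // One database round trip for the whole accumulated batch.
    private void Flush()
    {
        Interlocked.Exchange(ref _changeCount, 0);
        using (var db = new StatsDbContext())   // assumed EF context
        {
            foreach (int itemId in _pending.Keys)
            {
                int count;
                if (!_pending.TryRemove(itemId, out count)) continue;
                var row = db.ContentStats.Find(itemId);
                if (row != null) row.OpenSessions += count;
            }
            db.SaveChanges();
        }
    }
}
```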
At our organization we have a SQL Server 2005 database and a fair number of database clients: web sites (PHP, Zope, ASP.NET) and rich clients (legacy FoxPro). Now we need to pass certain events from the core database to other systems (MongoDB, LDAP and others). The messaging paradigm seems quite capable of solving this kind of problem, so we decided to use the RabbitMQ broker as middleware.
The problem of consuming events from the database at first seemed to have only two possible solutions:
Poll the database for outgoing messages and pass them to a message broker.
Use triggers on certain tables to pass messages to a broker on the same machine.
I disliked the first idea due to the latency issues which arise when periodic execution of SQL is involved.
But the event-based trigger approach has a problem which seems unsolvable to me at the moment. Consider this scenario:
A row is inserted into a table.
Trigger fires and sends a message (using a CLR Stored Procedure written in C#)
Everything is OK unless the transaction which writes the data is rolled back. In that case the data will be consistent, but the message has already been sent and cannot be rolled back, because the trigger fires at the moment of writing to the database log, not at the time of transaction commit (which is correct behaviour for an RDBMS).
I realize now that I'm asking too much of triggers and they are not suitable for tasks other than working with data.
So my questions are:
Has anyone managed to extract data events using triggers?
What other methods of consuming data events can you advise?
Is Query Notification (built on top of Service Broker) suitable in my situation?
Thanks in advance!
Let's first cut the obvious misfit out of the equation: Query Notification is not the right technology for this, because it is designed to address cache invalidation of relatively stable data. With QN you'll only know that a table has changed, but you won't be able to know what has changed.
Kudos to you for figuring out why triggers invoking SQLCLR won't work: the consistency is broken on rollback.
So what does work? Consider this: BizTalk Server. In other words, there is an entire business built around this problem space, and solutions are far from trivial (otherwise nobody would buy such products).
You can get quite far, though, by following a few principles:
Decoupling. Event-based triggers are OK, but do not send the message from the trigger. Aside from the consistency issue on rollback, you also have the latency issue of having every DML operation wait for an external API call (the RabbitMQ send) and the availability issue of the external API call failing (if RabbitMQ is unavailable, your DB is unavailable). The solution is to have the trigger use ordinary tables as queues: the trigger will enqueue a message in the local DB queue (i.e., insert into this table) and an external process will service this queue by dequeueing the messages (i.e., deleting from the table) and forwarding them to RabbitMQ (see the sketch after this list). This decouples the transaction from the RabbitMQ operation (the external process is able to see the message only if the original xact commits), but the cost is some obvious added latency (there is an extra hop involved, the local table acting as a queue).
Idempotency. Since RabbitMQ cannot enroll in distributed transactions with the database, you cannot guarantee atomicity of the DB operation (the dequeue from the local table acting as a queue) and the RabbitMQ operation (the send). Either one can succeed while the other fails, and there is simply no way around it without explicit distributed transaction enrollment support. Which implies that the application will send duplicate messages every once in a while (usually when things already go bad for some reason). And a quick heads-up: going down the path of explicit 'acknowledge' messages and send sequence numbers is a losing battle, as you'll quickly discover that you're reinventing TCP on top of messaging; that road is paved with bodies.
Tolerance. For the same reasons as the item above, every now and then a message you believe was sent will never make it. Again, what damage this causes is entirely business specific. The issue is not how to prevent this situation (it is almost impossible) but how to detect it, and what to do about it. No silver bullet, I'm afraid.
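To make the decoupling point concrete, here is a rough sketch of the external pump process; the dbo.EventQueue table and the exchange name are invented, and the trigger itself simply INSERTs into that table. As the idempotency and tolerance points note, the DELETE and the publish cannot be made atomic, so a crash in between can still lose or duplicate a message:

```csharp
using System.Data.SqlClient;
using System.Text;
using RabbitMQ.Client;

public class QueueForwarder
{
    // Run periodically or in a loop: drain the local queue table into RabbitMQ.
    public void PumpOnce(string connectionString, IModel channel)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // DELETE ... OUTPUT only ever sees committed rows, so a rolled-back
            // business transaction never produces a message.
            using (var cmd = new SqlCommand(
                @"DELETE TOP (100) FROM dbo.EventQueue OUTPUT deleted.Payload", conn))
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    byte[] body = Encoding.UTF8.GetBytes(reader.GetString(0));
                    channel.BasicPublish("db-events", "", null, body);
                }
            }
        }
    }
}
```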
You do mention Service Broker in passing (the fact that it powers Query Notification is the least interesting aspect of it...). As a messaging platform built into SQL Server that offers Exactly Once In Order delivery guarantees and is fully transacted, it would solve all the above pain points (you can SEND from triggers with impunity, you can use Activation to solve the latency issue, you'll never see a duplicate or a missing message, and there are clear error semantics), plus some other pain points I did not mention before (consistency of backup/restore, since the data and the messages are on the same unit of storage, the database; consistency of HA/DR failover, since SSB supports both database mirroring and clustering; etc.). The drawback, though, is that SSB is only capable of talking to another SSB service; in other words, it can only be used to exchange messages between two (or more) SQL Server instances. Any other use requires the parties to use a SQL Server to exchange messages. But if your endpoints are all SQL Server, then consider that there are some large-scale deployments using Service Broker. Note that endpoints like PHP or ASP.NET can be considered SQL Server endpoints, since they are just programming layers on top of the DB API; a genuinely different endpoint would be, say, the need to send messages from handheld devices (phones) directly to the database (and even those go through a web service 99% of the time, which means they can ultimately reach a SQL Server). Another consideration is that SSB is geared toward throughput and reliable delivery, not toward low latency. It is definitely not the technology to use to get a response back within an HTTP web request, for instance; it is the technology to use to submit something triggered by a web request for processing.
Remus's answer lays out some sound principles for generating and handling events. You can initiate the pushing of events from a trigger to achieve low latency.
You can achieve everything necessary from a trigger. We will still decouple this into two components: a trigger that generates the events and a local reader that reads the events.
The first component is the trigger.
Make a CLR trigger that prepares what needs to be done when the transaction commits.
Create a System.Transactions.IEnlistmentNotification that always agrees to be prepared, and whose void Commit(System.Transactions.Enlistment) method executes the prepared action.
In the trigger, call System.Transactions.Transaction.Current.EnlistVolatile(enlistmentNotification, System.Transactions.EnlistmentOptions.None)
You'll want your action to be short and sweet, like appending the data to a lockless queue in memory or updating some other state in memory. Don't try to communicate with other machines or processes. Don't write to a disk (if you wanted to write to a disk, just make an ordinary trigger that inserts into a queue table). You'll need to be careful to make sure your assembly is loaded only once so that any shared static state will be unique; this is easiest to do if your static state is in a top level assembly that isn't referenced by other assemblies, so no other assemblies will try to load it.
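A hedged sketch of those three steps; the payload and the static queue are stand-ins for whatever in-memory state your reader component consumes:

```csharp
using System.Collections.Concurrent;
using System.Transactions;

public class CommitTimeAction : IEnlistmentNotification
{
    // Shared, in-memory, non-durable queue drained by the second component.
    public static readonly ConcurrentQueue<string> Events = new ConcurrentQueue<string>();

    private readonly string _payload;
    private CommitTimeAction(string payload) { _payload = payload; }

    // Called from inside the CLR trigger body.
    public static void EnlistCurrent(string payload)
    {
        Transaction.Current.EnlistVolatile(
            new CommitTimeAction(payload), EnlistmentOptions.None);
    }

    public void Prepare(PreparingEnlistment e) { e.Prepared(); }  // always agree

    public void Commit(Enlistment e)
    {
        Events.Enqueue(_payload);   // short and sweet: no I/O, no other machines
        e.Done();
    }

    public void Rollback(Enlistment e) { e.Done(); }  // rolled back: do nothing
    public void InDoubt(Enlistment e) { e.Done(); }
}
```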
You will also need to either:
initialize your state in such a way that it will be correct even if the system was restarted without sending all the previously queued messages (since a short, in-memory queue will not be durable); this means you might be resending messages, so they will need to be idempotent; or
rely on the tolerance of another component to pick up on missed messages.
The second component reads the state that is updated by the trigger. Make a separate CLR component that reads from your queue or state and does whatever you need done (like sending an idempotent message to a messaging system, recording that it was sent, whatever). If this component can fail (hint: it can), you will need some form of tolerance, which may belong in another system. You can achieve low latency by having the trigger signal the second component when new state is available.
One architectural possibility is to have the trigger put the event in memory on commit for another low-latency component to pick up, and have that second component send a low-latency, low-reliability copy of an idempotent message. You can pair that with a more reliable or durable messaging system, such as SSB, that will reliably and durably, but with greater latency, send the same idempotent message later.
I have a multi-layered application architecture that has 4 parts:
A networking server/client layer
An intermediate data layer to handle interactions between processes
A monitoring layer
A client layer made up of n number of instances
Client/Server layer:
The client/server layer handles asynchronous network communications with another computer implemented using a custom Layer 2 protocol. Due to design constraints built into the communications, it needs to remain independent and able to poll/push data to the data layer asynchronously.
Intermediate Layer:
The intermediate layer is currently implemented using a database. One table holds all of the possible labels that can be called on (about 120,000). A second table holds an intermediate cache of the first table containing only the values in use, this requires constant updates and gets flushed when a new collection of items is requested. The third table is where collection updates are sent and only contains data when a request is pending.
The Monitor Layer:
The monitor layer is a multi-threaded monolithic application. It spawns n number of client instances based on how many monitors are attached. It manages global state between all client instances because one or more of them may share similar/identical state. It creates a unique listing of values needed, manages sending update requests when the clients need a different set of labels, and manages recurring updates.
Obviously, this isn't ideal. If one instance goes down it can take the rest down with it. What I'd like to do is remove the intermediate layer, replace it with the monitor layer, and make everything spawn as subprocesses of the monitor process so they can be respawned at will if something goes awry (e.g. the comms heartbeat stops, a client crashes, etc.).
The database just seems too heavy and not specialized enough to handle the IPC (Inter Process Communications). The program was written under extreme time constraints so utilizing a database was the 'easy solution' with the expectation that it would change in the future. I'm a big fan of the robustness of Google Chrome's multi-process architecture but I know little about how they tie all the processes together (pipes, tcp, ?).
So:
Could I expect a significant performance improvement from using IPC over a database for the intermediate layer?
What form of IPC would be ideal on a Windows system?
Is there a cross platform (read Linux) alternative solution available that could be used in its place if development were moved to Mono?
Where can I find resources/examples to help get a start?
Note: I understand that the architecture of this system seems unnecessarily complex but it exists as a front-end for a much larger system. This application is also mission critical so stability trumps efficiency.
Update:
I forgot to mention in the initial question. The database data/index is loaded directly from a ramdisk on boot. The database itself has been indexed for optimal performance. Tables or values that require frequent writes are not indexed but the rest of the data is.
I'm looking for an alternative to measure against because optimization of the db has been taken to its limit and I think there's still a lot of room for improvement.
I will upload some diagrams of the architecture as soon as I get some time to draw them up.
Yes. The database most likely involves the hard drive, and the hard drive is the slowest part of any computer, so switching away from using the hard drive will probably have performance benefits.
I would go with ZeroMQ / zmq. It's a message-oriented framework that supports several communication patterns, for instance PUB/SUB or REQ/REP, etc. More examples here.
zmq is cross-platform and it's amazingly fast.
Some C# examples on github
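For a flavour of what that looks like in C#, here is a tiny REQ/REP example using NetMQ, the native C# port of ZeroMQ (assuming that's the binding you'd pick on .NET; the address and message contents are arbitrary):

```csharp
using NetMQ;
using NetMQ.Sockets;

class ReqRepDemo
{
    static void Main()
    {
        // "@" binds, ">" connects, per NetMQ address conventions.
        using (var server = new ResponseSocket("@tcp://127.0.0.1:5556"))
        using (var client = new RequestSocket(">tcp://127.0.0.1:5556"))
        {
            client.SendFrame("label-request:42");

            string request = server.ReceiveFrameString();
            server.SendFrame("label-value:" + request);   // echo-style reply

            string reply = client.ReceiveFrameString();
            System.Console.WriteLine(reply);
        }
    }
}
```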
I would consider looking into an Actor Model based solution, such as Akka.NET.