I'm developing an event booking application and am having difficulty figuring out how to manage the booking process. I know about db transactions and a little bit about locking but I have a lot of business rules to validate before a booking can be committed and I'm worried about performance bottlenecks.
Here's a summary of what's going on:
An Event has a maximum number of slots
A user can book one spot in the Event
Each user has an account with money in it, and each event costs a certain amount
Given the above parameters, the following business rules are what I need to validate for a booking to take place:
The user hasn't already booked a spot for this Event
The user has enough funds to book the Event
The Event has at least one spot available
The Event does not conflict with other events the user has booked (not so much of an issue as I can check this when displaying the page and hide this Event from the user)
My main worry is that if I pull all the information from the db up front (i.e. Event, User, Account, and existing Bookings), then by the time I run all the validation and come to commit the new booking, the state of the system may have changed (e.g. someone else has booked the last spot, money has left the account, etc.).
If I were to lock the code/database tables around this booking process, then I could potentially hold the lock for quite a while, blocking other operations in the system and causing performance issues at peak times.
Can anyone suggest an approach whereby I can manage, or at least limit, these concerns?
I'm building an ASP.NET app in C# and using SQL Server 2005.
I think a good example to look at is how Ticketmaster reserves seats for tickets that MAY get purchased. They tell you that you have so many minutes until the seats are put back into inventory. It pushes the purchaser to make a decision or someone else will have a chance at the seats. This is really your biggest hurdle. As for checking the business rules, you'll have to do that. There is no magic around what needs to be done there. If you need the data to validate a rule then that's what you need to do. With some proper mapping and outlining you can find the efficiencies. I hope that answered your question.
Good luck!
One solution:
Pre-emptively book the spot (with a status of "hold").
Validate.
If the booking fails validation against the business rules, delete it; otherwise, change its status to "booked".
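A minimal sketch of this hold-then-confirm flow in C#/ADO.NET, in case it helps. All table and column names (Events, Bookings, SpotsTaken, and so on) are illustrative assumptions rather than the poster's schema, and ValidateBusinessRules is a placeholder for your own checks:

using System;
using System.Data.SqlClient;

static void PlaceBooking(string connectionString, int eventId, int userId)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // 1. Pre-emptively take the spot with a "hold"; the WHERE clause only
        //    succeeds while a spot is free, so two users can't both get the last one.
        var hold = new SqlCommand(
            @"DECLARE @took int;
              UPDATE Events SET SpotsTaken = SpotsTaken + 1
              WHERE EventId = @eventId AND SpotsTaken < MaxSpots;
              SET @took = @@ROWCOUNT;
              IF @took = 1
                  INSERT INTO Bookings (EventId, UserId, Status)
                  VALUES (@eventId, @userId, 'hold');
              SELECT @took;", conn);
        hold.Parameters.AddWithValue("@eventId", eventId);
        hold.Parameters.AddWithValue("@userId", userId);
        if ((int)hold.ExecuteScalar() == 0)
            return; // no spots left - tell the user

        // 2. Validate the remaining business rules (funds, duplicates, clashes).
        bool valid = ValidateBusinessRules(conn, eventId, userId);

        // 3. Confirm the hold, or delete it and hand the spot back.
        var finish = new SqlCommand(valid
            ? @"UPDATE Bookings SET Status = 'booked'
                WHERE EventId = @eventId AND UserId = @userId AND Status = 'hold'"
            : @"DELETE FROM Bookings
                WHERE EventId = @eventId AND UserId = @userId AND Status = 'hold';
                UPDATE Events SET SpotsTaken = SpotsTaken - 1 WHERE EventId = @eventId",
            conn);
        finish.Parameters.AddWithValue("@eventId", eventId);
        finish.Parameters.AddWithValue("@userId", userId);
        finish.ExecuteNonQuery();
    }
}

static bool ValidateBusinessRules(SqlConnection conn, int eventId, int userId)
{
    // Placeholder: check funds, existing bookings, schedule conflicts here.
    return true;
}

No database lock outlives a single statement here; the spot itself is protected by the "hold" row, which you would also expire on a timer, Ticketmaster-style, so an abandoned session gives the spot back.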
If you go back to the 80s and read the literature published on the topic of transaction processing, you'll find that one of the most discussed examples was airline reservation systems. And for good reason, as it was one of the OLTP topics that exposed all the issues around transaction processing: correctness, throughput, contention, deadlocks. What you describe is a very similar problem, but instead of airline seats you have event slots. So yes, you will have all those issues.
There is no magic pixie dust. This is a hard problem. But there are some guidelines:
Forgetful Fred cannot lock a slot forever. Forgetful Fred is the user that opens the reservation screen, picks a seat, then goes to lunch without finishing the transaction. If this is allowed, then the system will slowly 'leak' slots that are never used.
Database locks are too expensive to be held while waiting for user input.
Throughput can only be achieved with granular locks.
The business logic should not attempt concurrent updates on correlated items.
Everything displayed to the user should be treated as 'tentative'.
The user interface should be prepared to handle update conflicts.
The update logic should always follow the same hierarchy (e.g. if the agreed update order is Account->User->Event->Booking, then a rogue transaction trying to update Booking->Event->User will cause deadlocks).
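To make that last point concrete, here is a minimal sketch of a booking transaction that touches items in the agreed Account -> Event -> Booking order; the table and column names are assumptions for illustration. As long as every writer follows the same order, two transactions can block each other briefly but can never deadlock:

using System.Data.SqlClient;

static void BookWithFixedOrder(string connectionString, int accountId,
                               int eventId, int userId, decimal price)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        {
            // 1. Account first: debit the funds (fails if the balance is short).
            var debit = new SqlCommand(
                @"UPDATE Accounts SET Balance = Balance - @price
                  WHERE AccountId = @accountId AND Balance >= @price", conn, tx);
            debit.Parameters.AddWithValue("@price", price);
            debit.Parameters.AddWithValue("@accountId", accountId);
            if (debit.ExecuteNonQuery() == 0) { tx.Rollback(); return; }

            // 2. Event second: claim a spot (fails if the event is full).
            var claim = new SqlCommand(
                @"UPDATE Events SET SpotsTaken = SpotsTaken + 1
                  WHERE EventId = @eventId AND SpotsTaken < MaxSpots", conn, tx);
            claim.Parameters.AddWithValue("@eventId", eventId);
            if (claim.ExecuteNonQuery() == 0) { tx.Rollback(); return; }

            // 3. Booking last: record who got the spot.
            var book = new SqlCommand(
                "INSERT INTO Bookings (EventId, UserId) VALUES (@eventId, @userId)",
                conn, tx);
            book.Parameters.AddWithValue("@eventId", eventId);
            book.Parameters.AddWithValue("@userId", userId);
            book.ExecuteNonQuery();

            tx.Commit();
        }
    }
}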
And as a matter of fact there is an approach that limits these concerns: workflow processing backed by transactional queues, with exclusive lock-out of correlated items. Not your everyday ASP task for sure, so I'd recommend you stick with what you know.
The question is pretty straightforward, and it has been bothering me for quite some time now. Perhaps some of you have an interesting view on this matter.
When creating an object, let's say a memo:
This memo has a string (for the text obviously)
And a user ID for who edited said text.
And this memo object is going to be saved to the database.
Is it common to add the current time to the object (e.g. add a DateTime field/property) and save that object to the db?
Or use the current time of the database, e.g. have the DB fill in the equivalent of DateTime.Now when the row is inserted?
Considering that your application may run on multiple machines, and that users may be in different time zones or have the wrong time on their devices, the first option you proposed could affect data integrity, and you will end up with an unsynchronised database.
I feel it would not make much difference, though it really depends on your requirements and project architecture. If you are interested in showing the time to the user and you use some queue or background worker to insert the row into the DB (that is, there could be some difference between the time the object is created and the time it is inserted into SQL), then it perhaps makes sense to capture the time when the object is created. Otherwise, if it is only for record keeping, the DB timestamp should be fine.
Although your example is a simple one, it may represent a typical scenario in an enterprise environment. So instead of one user and a memo, there may be lots of users, lots of tasks to be executed by those users, and supervisors who are monitoring how users are performing. With that in mind, you should try to log everything that you can, since supervisors will ask for different productivity reports. And the basic elements of such reports are the "start time" and "end time" of an activity.
That said, it is of lesser importance which time you use, as long as the activities of all users can be compared. Do bear in mind that some activities may be executed in different time zones. This leads us to the fact that the time used for events (such as start time and end time) should come from a common source: either some middle layer or the database.
Implicitly, you need to log user activities. So you will have a method with enough parameters to capture a user activity. Execution of such a method should happen in the middle or database layer, so the time registered will be consistent and comparable.
With this approach you also have the possibility to extend your definition of captured events (not only start and end time but also other relevant moments).
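For what it's worth, both answers lean toward the same mechanism: let the database stamp the time, so every row is stamped from a single clock regardless of the client's time zone or clock drift. A minimal sketch, assuming a hypothetical Memos table whose CreatedAtUtc column has a DEFAULT GETUTCDATE() constraint:

// Assumed schema (illustrative):
// CREATE TABLE Memos (
//     MemoId       int IDENTITY PRIMARY KEY,
//     MemoText     nvarchar(max) NOT NULL,
//     EditorUserId int NOT NULL,
//     CreatedAtUtc datetime NOT NULL DEFAULT GETUTCDATE());

using System.Data.SqlClient;

static void SaveMemo(string connectionString, string memoText, int editorUserId)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "INSERT INTO Memos (MemoText, EditorUserId) VALUES (@text, @userId)", conn))
    {
        // Deliberately no time parameter: the DEFAULT GETUTCDATE() constraint
        // stamps the row with the database server's clock - one clock for all clients.
        cmd.Parameters.AddWithValue("@text", memoText);
        cmd.Parameters.AddWithValue("@userId", editorUserId);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}

If you need to show the user their local time, convert from UTC in the presentation layer.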
I'm new to this, so please forgive any mistakes. I'm building an ecommerce platform to sell certain goods. I want to ensure that if multiple users buy the same product at around the same time, a user is not allowed to go to payment if enough stock is not available. The problem is that I'm not decreasing the stock in the database until payment is successfully made. While one user makes a payment, to the others the stock might still appear full, and they could in turn place orders for stock that doesn't exist. I just need some direction on implementing some kind of locking mechanism for product stock under processing - like how, while booking movie tickets, the seat is blocked until the transaction completes, else it is released. I want to implement the same for stock. Any ideas?
I'm using ASP.NET C#.
The solution I found was to reduce the stock quantity before going to the payment gateway, and keep a timer which puts the stock back after the timer elapses, but only if payment was not successful.
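A minimal sketch of that approach, assuming hypothetical Products and StockHolds tables (the hold row records what to put back and when, and a scheduled job plays the part of the timer):

using System;
using System.Data.SqlClient;

// Called before redirecting to the payment gateway.
static bool TryReserveStock(string connectionString, int productId, int qty, Guid orderId)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        var cmd = new SqlCommand(
            @"DECLARE @ok int;
              -- Atomic decrement: the WHERE clause stops two buyers from
              -- both reserving the same last units.
              UPDATE Products SET Stock = Stock - @qty
              WHERE ProductId = @id AND Stock >= @qty;
              SET @ok = @@ROWCOUNT;
              IF @ok = 1
                  INSERT INTO StockHolds (OrderId, ProductId, Quantity, ExpiresAtUtc)
                  VALUES (@orderId, @id, @qty, DATEADD(minute, 10, GETUTCDATE()));
              SELECT @ok;", conn);
        cmd.Parameters.AddWithValue("@id", productId);
        cmd.Parameters.AddWithValue("@qty", qty);
        cmd.Parameters.AddWithValue("@orderId", orderId);
        return (int)cmd.ExecuteScalar() == 1;
    }
}

// The "timer": run this from a scheduled job every minute or so.
// On successful payment you would instead delete that order's hold row.
static void ReleaseExpiredHolds(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        var cmd = new SqlCommand(
            @"BEGIN TRAN;
              DECLARE @expired TABLE (ProductId int, Quantity int);
              DELETE FROM StockHolds
              OUTPUT deleted.ProductId, deleted.Quantity INTO @expired
              WHERE ExpiresAtUtc < GETUTCDATE();
              UPDATE p SET p.Stock = p.Stock + e.Qty
              FROM Products p
              JOIN (SELECT ProductId, SUM(Quantity) AS Qty
                    FROM @expired GROUP BY ProductId) e
                ON e.ProductId = p.ProductId;
              COMMIT;", conn);
        cmd.ExecuteNonQuery();
    }
}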
Well, the reason you can't come up with a solution is that you don't have an implementation problem, but rather a business problem.
Once business policies are defined, implementation will be easy.
For example, Amazon allows ordering of products that are running out by saying "we'll deliver in 2-4 weeks". Meaning, you can still order; you'll just get the product later and be emailed once it's on the way.
In your case, however, if you have finite resources (seats / movie tickets), you can start putting people on a waiting list. That is, warn them that you are running out of seats, but allow them to complete the process normally. Then, as you are processing payments, you can send more detailed emails to those who are left without a seat. Kinda like airlines do. Once you sort out those who really bought tickets, refund those who are left without seats.
If your problem is deeper (like you are thinking about blocking individual seats), definitely think more about business policies. Do you really need to go down to that level? Can't you do what airlines do (sell tickets for a certain class first, and only later allow users to pick seats)? Again, the core of your problem is business rather than implementation. Once the use cases are properly defined, you'll probably just need standard database locks.
The most important thing in processes like this is transparency. Be honest with your users. Keep them informed through the process and don't try to trick them - like showing more seats than there really are just to incentivise them to complete the purchase.
In my case, I used a message queue (Hangfire) to handle orders and configured it so that it only processes one job at a time, in sequence, for the same product. FIFO style.
Notice that you're adding new overhead to the whole order processing pipeline, which delays things. You may need to re-architect resources to minimize that time.
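If your queue can't guarantee one-at-a-time processing per product, you can get the same serialization inside the job with a SQL Server application lock keyed on the product id. Note this is a different technique from the Hangfire configuration described above: sp_getapplock is a real SQL Server procedure, but the surrounding job shape and the lock naming are assumptions in this sketch:

using System;
using System.Data;
using System.Data.SqlClient;

static void ProcessOrder(string connectionString, int productId, Guid orderId)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        {
            // Serialize on this product only: jobs for other products run in
            // parallel, while jobs for the same product queue up here.
            var lockCmd = new SqlCommand("sp_getapplock", conn, tx)
            {
                CommandType = CommandType.StoredProcedure
            };
            lockCmd.Parameters.AddWithValue("@Resource", "product-" + productId);
            lockCmd.Parameters.AddWithValue("@LockMode", "Exclusive");
            lockCmd.Parameters.AddWithValue("@LockTimeout", 30000); // milliseconds
            var result = lockCmd.Parameters.Add("@ReturnValue", SqlDbType.Int);
            result.Direction = ParameterDirection.ReturnValue;
            lockCmd.ExecuteNonQuery();
            if ((int)result.Value < 0) // negative codes mean the lock wasn't granted
                throw new TimeoutException("Could not lock product " + productId);

            // ... check stock, decrement it, and record the order here ...

            tx.Commit(); // committing releases the transaction-scoped app lock
        }
    }
}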
I have been asked to add a feature to some software I have written which needs to send reminder emails.
The reminders can have a recurrence; for example, it might recur every 3 days or every 2 weeks. The reminders also have a start and end date. I have created an SQL table with these fields:
event_id
event_name
event_Description
event_startDate
event_endDate
RecurrenceType (e.g. Daily, Weekly, Monthly, Yearly)
Interval
event_ActiveFlag
I now need to write a stored procedure that will run on a schedule and send the reminders. I have no problem with sending via a C# console application; the trouble I am having is that I cannot figure out how to get the recurrences for the current day.
No matter what, you cannot get around needing to perform a computation to determine the list of recurrence dates for a given event. By "computation" I mean something like the following pseudocode:
foreach (var evt in activeEvents)   // "event" is a reserved word in C#
{
    var recurrenceDate = evt.StartDate;
    while (recurrenceDate <= currentDate && recurrenceDate <= evt.EndDate)
    {
        evt.RecurrenceDates.Add(recurrenceDate);
        // AddDays returns a new value rather than mutating, so reassign it
        recurrenceDate = recurrenceDate.AddDays(evt.IntervalLengthInDays);
    }
}
I highly recommend using Douglas Day's excellent .NET iCal library (be sure to read the license terms to ensure you are permitted to use this in your app, though they are quite permissive) to calculate recurrence dates for you, because despite appearing straightforward, it's actually very difficult to correctly handle all the corner cases.
This will give you a list of dates that the event occurs on, up to the current date. There are two questions: when and where should you perform this calculation?
As for where, I recommend in the C# app, since you've already stated you have such a component involved. It is absolutely possible to do this in a performant, set-based way in your SQL database; however, the obvious, intuitive way to code this is using loops. C# is great at loops; SQL is not. And doing it the obvious way will make it much easier to debug and maintain when you (or someone else) looks at it six months down the line.
Regarding when, there are two possibilities:
Option 1: On demand: do it as part of the job that sends your daily reminder blast.
Fetch the data for all active events
Compute recurrence lists for them
Examine the last recurrence date (this will be the one closest to the current date) to determine whether that event should be sent out as a reminder (a sketch follows at the end of this answer)
Send eligible reminders
Option 2: Once: do it when the event is created and entered in the system.
When an event is created, calculate all recurrence dates through its end date
Store these dates in a table in the DB (event_id FK, event_date)
Your reminder job fetches a list of eligible events (both active and having the appropriate date) from the precomputed table
Send eligible reminders
Which option is better? My money is on #1. Why?
With option 1, if I realize I accidentally entered "daily" instead of "weekly" for my recurrence period, changing the event is much easier, because I don't have to recompute and re-store all the recurrence data. This is the reason to calculate recurrence on demand, and it should trump all but the most dire of performance-related concerns (most of which could likely be fixed by throwing more hardware at the problem, or by better load balancing).
Option 2 is degenerate in the case of events without a defined end date. This is manageable by, say, calculating X years into the future, but what happens X years later?
A daily reminder job probably runs overnight, and so doesn't need to execute super fast. This means cutting down on its execution time isn't a priority. And the recurrence calculation time is probably going to be negligible anyway; I expect the bottleneck to be the actual email sending. However, if you have a lot of active, old events to consider (by "a lot", I mean billions) and/or your app server is grinding to a halt under load, this may become important.
Option 1 saves DB work. I don't mean storage space (although it does use less); that shouldn't matter unless you have a lot (many trillions) of events. I mean that it's less "chatter" back and forth between your app server and the DB, so there's less chance for a dropped connection, concurrency collision, etc. Please note this is incredibly trivial and really doesn't matter either way, unless your production environment has major problems.
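Here is a minimal sketch of option 1's job, building on the pseudocode above; the Event shape and the SendReminder body are assumptions standing in for your actual schema and email-sending code:

using System;
using System.Collections.Generic;
using System.Linq;

class Event
{
    public string Name;
    public DateTime StartDate;
    public DateTime EndDate;
    public int IntervalLengthInDays;   // RecurrenceType + Interval flattened to days
    public List<DateTime> RecurrenceDates = new List<DateTime>();
}

static void SendDailyReminders(IEnumerable<Event> activeEvents, DateTime today)
{
    foreach (var evt in activeEvents)
    {
        // Option 1: compute the recurrence list on demand, up to today.
        var date = evt.StartDate;
        while (date <= today && date <= evt.EndDate)
        {
            evt.RecurrenceDates.Add(date);
            date = date.AddDays(evt.IntervalLengthInDays);
        }

        // The last computed recurrence is the one closest to today;
        // only send if it actually falls on today.
        if (evt.RecurrenceDates.Count > 0 && evt.RecurrenceDates.Last().Date == today.Date)
            SendReminder(evt);
    }
}

static void SendReminder(Event evt)
{
    // Placeholder for the console app's existing email-sending code.
    Console.WriteLine("Reminder: " + evt.Name);
}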
I have a running-order problem between two handlers, one deleting and one reordering pictures, and would like some advice on the best solution.
On the UI some pictures are deleted: the user clicks on the delete button, and the whole flow, from the delete command up to an event handler which actually deletes the physical files, is started.
Then the user immediately sorts the remaining pictures. A new flow, from the reorder command up to the reordering event handler for the file system, fires again.
Already there is a concurrency problem. The reordering cannot be correctly applied without the deletion having completed. At the moment this problem is handled with some sort of lock: a temp file is created and then deleted at the end of the deletion flow, and while that file exists the other thread (reordering or deletion, depending on the user's actions) waits.
This is not an ideal solution and would like to change it.
The potential solution must also be pretty fast (of course the current one is not), as the UI is updated through a JSON call at the end of ordering.
In a later implementation we are thinking of using a queue of events, but for the moment we are pretty stuck.
Any idea would be appreciated!
Thank you, mosu'!
Edit:
Other eventual consistency problems that we had were solved by using a JavaScript data manager on the client side. Basically being optimistic and tricking the user! :)
I'm starting to believe this is the way to go here as well. But then how would I know when the data has changed in the file system?
Max's suggestions are very welcome, and normally they would apply.
It is sometimes hard to explain all the details of an implementation, but there is a detail that should be mentioned:
The way we store the pictures means that when they are reordered, all picture paths (and thus all links) change.
A colleague had the very good idea of simply removing this part. That means that even if the order changes, the path of the picture will remain the same. On the UI side there will be a mapping between a picture's index in the display order and its path, which means there is no need to change the file system anymore, except when deleting.
As we want to be as permissive as possible with our users, this is the best solution for us.
I think, in general, it is also a good first question whenever there appears to be a concurrency issue: can the concurrency be removed?
Here is one thought on this.
What exactly are you reordering? Pictures? Based on, say, date?
Why is there a command for this? Is the result of this command going to be seen by everyone, or just by this particular user?
I can only guess, but it looks like you've got a presentation question here. There is no need to store pictures in some order on the write side; it's just a list of names and links to the file storage. What you should do is store just a little field somewhere in the user settings or collection settings: Date ascending or Name descending. So your Reorder command should change only this little field. Then, when you are loading the gallery, this field should be read first, and based on it you load one view or another. Since storage is cheap nowadays, you can keep differently sorted collections on the read side for every sort parameter you need.
To sum up: the Delete command changes the collection on the write side, but the Reorder command is just a user or collection setting. Hence, there is no concurrency here.
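A sketch of what the Reorder handler reduces to under this view; the command shape, the SortOrder values, and the settings-store interface are all illustrative assumptions:

using System;

enum SortOrder { DateAscending, DateDescending, NameAscending, NameDescending }

interface IUserSettingsStore
{
    void SaveSortOrder(Guid userId, Guid collectionId, SortOrder order);
}

class ReorderCommandHandler
{
    private readonly IUserSettingsStore _settings;

    public ReorderCommandHandler(IUserSettingsStore settings)
    {
        _settings = settings;
    }

    public void Handle(Guid userId, Guid collectionId, SortOrder order)
    {
        // No file moves and no write-side collection change: just persist the
        // field the gallery view reads before loading pictures, so reordering
        // can never race the deletion flow.
        _settings.SaveSortOrder(userId, collectionId, order);
    }
}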
Update
Based on your comments and clarifications.
Of course, you can, and probably should, restrict the user to one action at a time, if the time for deletion and reordering is reasonably short. It's always a question of what type of user experience you are asked to achieve. Take the usual example of an ordering system. After an order is placed, the user can almost immediately see it in the UI, with a status of something like InProcess. Most likely you won't let the user change the order in any way, which means you are not going to show any user controls like a Cancel button (of course this is just an example). Hence, you can use this approach here.
If 2 users can modify the same physical collection, you have no choice: you are working with shared data, and there has to be some kind of synchronization. For instance, if you are using sagas, there can be a couple of sagas, a collection-reordering saga and a deletion saga, and they can cooperate. If the deletion process starts first, the collection aggregate is marked as 'deletion in progress'; when the reordering saga starts right after and attempts to begin the reordering process, it sees that the deletion saga is in process, waits for the DeletedEvent, and continues afterwards. The same applies if the reordering operation starts first: the deletion saga should wait for the corresponding event and continue after it arrives.
Update
OK, so we agree not to touch the file system itself, but rather the aggregate which represents the picture collection. The most important concurrency issues can then be solved with an optimistic concurrency approach: in the data storage, a unique constraint based on aggregate id and aggregate version is usually used.
Here is the common sequence of steps a command handler follows:
Validate the command on its own merits.
Load the aggregate.
Validate the command on the current state of the aggregate.
Create a new event, apply the event to the aggregate in memory.
Attempt to persist the aggregate. If there's a concurrency conflict during this step, either give up, or retry things from step 2.
Here is the link which helped me a lot some time ago: http://www.cqrs.nu/
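A minimal sketch of step 5, assuming an event-store table with a unique constraint on (AggregateId, Version); the table name is illustrative. SQL Server reports a duplicate key on a unique constraint as error 2627, which is your concurrency conflict:

using System;
using System.Data.SqlClient;

static bool TryPersist(string connectionString, Guid aggregateId,
                       int expectedVersion, string eventData)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        @"INSERT INTO AggregateEvents (AggregateId, Version, EventData)
          VALUES (@id, @version, @data)", conn))
    {
        cmd.Parameters.AddWithValue("@id", aggregateId);
        cmd.Parameters.AddWithValue("@version", expectedVersion + 1);
        cmd.Parameters.AddWithValue("@data", eventData);
        conn.Open();
        try
        {
            cmd.ExecuteNonQuery();
            return true; // we won the race for version N+1
        }
        catch (SqlException ex) when (ex.Number == 2627) // unique constraint violated
        {
            // Another command persisted version N+1 first:
            // give up, or reload the aggregate and retry from step 2.
            return false;
        }
    }
}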
I have an ASP.NET C# business webapp that is used internally. One issue we are running into as we've grown is that the original design did not account for concurrency checking, so now multiple users are accessing the same data and overwriting other users' changes. So my question is: for webapps, do people usually use a pessimistic or an optimistic concurrency system? What drives the preference for one over the other, and what are some of the design considerations to take into account?
I'm currently leaning towards an optimistic concurrency check since it seems more forgiving, but I'm concerned about the potential for multiple changes being made that would be in contradiction to each other.
Thanks!
Optimistic locking.
Pessimistic locking is harder to implement and will give problems in a web environment. What action will release the lock: closing the browser? Letting the session time out? What if they then do save their changes?
You don't specify which database you are using. MS SQL Server has a timestamp datatype. It has nothing to do with time, though; it is merely a number that gets changed each time the row is updated. You don't have to do anything to make sure it changes, you just need to check it. You can achieve something similar by using a last-modified date/time, as #KM suggests, but this means you have to remember to change it each time you update the row. If you use datetime, you need a type with sufficient precision to ensure that the value can't stay the same when it should change; for example, someone saves a row, then someone reads it, then another save happens but leaves the modified date/time unchanged. I would use timestamp unless there was a requirement to track the last-modified date on records.
To check it, you can do as #KM suggests and include it in the UPDATE statement's WHERE clause. Or you can begin a transaction, check the timestamp, do the update if all is well, then commit the transaction; if not, return a failure code or error.
Holding transactions open (as suggested by #le dorfier) is similar to pessimistic locking, but the amount of data locked may be more than a row. Most RDBMSs lock at the page level by default. You will also run into the same issues as with pessimistic locking.
You mention in your question that you are worried about conflicting updates. That is exactly what the locking will prevent. Both optimistic and pessimistic locking, when properly implemented, prevent exactly that.
I agree with the first answer above: we try to use optimistic locking when the chance of collisions is fairly low. This can be easily implemented with a LastModifiedDate column or by incrementing a Version column. If you are unsure about the frequency of collisions, log occurrences somewhere so you can keep an eye on them. If your records are always in "edit" mode, having separate "view" and "edit" modes could help reduce collisions (assuming you reload the data when entering edit mode).
If collisions are still high, pessimistic locking is more difficult to implement in web apps, but definitely possible. We have had good success with "leasing" records (locking with a timeout)... similar to that 2 minute warning you get when you buy tickets on TicketMaster. When a user goes into edit mode, we put a record into the "lock" table with a timeout of N minutes. Other users will see a message if they try to edit a record with an active lock. You could also implement a keep-alive for long forms by renewing the lease on any postback of the page, or even with an ajax timer. There is also no reason why you couldn't back this up with a standard optimistic lock mentioned above.
Many apps will need a combination of both approaches.
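A sketch of the leasing idea from the answer above, assuming a hypothetical RecordLocks table whose RecordKey column is the primary key; a lease only counts while it hasn't expired, so an abandoned edit unlocks itself:

using System;
using System.Data.SqlClient;

static bool TryAcquireLease(string connectionString, string recordKey,
                            int userId, int leaseMinutes)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        var cmd = new SqlCommand(
            @"-- Clear an expired lease, then take the lock if it is free.
              DELETE FROM RecordLocks
              WHERE RecordKey = @key AND ExpiresAtUtc < GETUTCDATE();
              IF NOT EXISTS (SELECT 1 FROM RecordLocks WHERE RecordKey = @key)
                  INSERT INTO RecordLocks (RecordKey, UserId, ExpiresAtUtc)
                  VALUES (@key, @userId, DATEADD(minute, @minutes, GETUTCDATE()));
              SELECT COUNT(*) FROM RecordLocks
              WHERE RecordKey = @key AND UserId = @userId
                AND ExpiresAtUtc >= GETUTCDATE();", conn);
        cmd.Parameters.AddWithValue("@key", recordKey);
        cmd.Parameters.AddWithValue("@userId", userId);
        cmd.Parameters.AddWithValue("@minutes", leaseMinutes);
        try
        {
            return (int)cmd.ExecuteScalar() == 1;
        }
        catch (SqlException ex) when (ex.Number == 2627)
        {
            return false; // a racing request inserted the lock row first
        }
    }
}

Renewing the lease on postback (or from an ajax timer) is then just an UPDATE of ExpiresAtUtc for the row the user already holds.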
Here's a simple solution for many people working on the same records:
When you load the data, get the last-changed date; we use LastChgDate on our tables.
When you save (update) the data, add "AND LastChgDate = previouslyLoadedLastChgDate" to the WHERE clause. If the row count is 0 on the update, raise an error ("someone else has already saved this data") and roll back everything; otherwise, the data is saved (sketched below).
I generally apply the above logic to header tables only, and not to the detail tables, since they are all in one transaction.
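In code, the save step looks roughly like this; the table and column names other than LastChgDate are assumptions:

using System;
using System.Data.SqlClient;

static void SaveHeader(string connectionString, int orderId,
                       string newValue, DateTime loadedLastChgDate)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        @"UPDATE OrderHeader
          SET SomeColumn = @value, LastChgDate = GETDATE()
          WHERE OrderId = @id AND LastChgDate = @loadedDate", conn))
    {
        cmd.Parameters.AddWithValue("@value", newValue);
        cmd.Parameters.AddWithValue("@id", orderId);
        cmd.Parameters.AddWithValue("@loadedDate", loadedLastChgDate);
        conn.Open();
        // Zero rows means someone saved after we loaded: surface the error
        // and let the caller roll back the whole transaction.
        if (cmd.ExecuteNonQuery() == 0)
            throw new InvalidOperationException("Someone else has already saved this data.");
    }
}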
I assume you're experiencing the 'lost update' problem.
To counter this, as a rule of thumb, I use pessimistic locking when the chances of a collision are high (or transactions are short-lived), and optimistic locking when the chances of a collision are low (or transactions are long-lived, or your business rules encompass multiple transactions).
You really need to see what applies to your situation and make a judgment call.