Imagine the following situation: a Windows service periodically checks the data in a database table. When new data appears, it starts processing each new row.
The processing consists of several logical stages, say:
get some additional data from a web service;
find an existing object via the web service using the data from stage 1, or create a new object;
inform an interested person of the actions taken (with details about the object that was found/created in stage 2);
do something else.
For now, if any exception happens, the service updates the DB row and sets a flag indicating that an error occurred. Some time later the service will try to process the row once again... and here is the problem.
Processing will start from the very beginning, from stage 1. So if the exception happened in stage 4, and if it keeps happening, the interested person from stage 3 will be informed again and again and again...
Completely stopping row processing in case of an exception is neither possible nor desirable in my situation. Ideally there would be a way to start processing from the stage where it failed last time.
Now I need your advice on how all this can be handled. In fact, everything is even more complicated, because there are several processing patterns, different numbers of stages, and so on.
Thanks in advance.
UPDATE
Yes, I have a State field in the data rows :) It is just not used yet. And exception handling is not a new thing for me.
The question is: what is the best way to handle state switching? In other words, how do I make a clear logical link between a stage number and a processing method? The flow of execution can differ a lot and involve a varying number of stages and methods for different rows.
I hope there are more pleasant ways than writing endless switch/case blocks for every new situation?
There are a couple of patterns that can help with each issue in your description.
Windows service checking a queue. Have a timer in your service that runs every 1 minute or 5 minutes or whatever, checks the queue, and if there are any new entries, starts processing. (See the example here: Best Timer for using in a Windows service.)
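As a rough illustration (not the linked answer's exact code - assume a ServiceBase-derived service and a hypothetical CheckQueue method):

using System.ServiceProcess;
using System.Timers;

public class QueueCheckService : ServiceBase
{
    private Timer _timer;

    protected override void OnStart(string[] args)
    {
        _timer = new Timer(60000);                     // poll every minute
        _timer.Elapsed += (sender, e) => CheckQueue();
        _timer.AutoReset = true;
        _timer.Start();
    }

    protected override void OnStop()
    {
        if (_timer != null) { _timer.Stop(); _timer.Dispose(); }
    }

    private void CheckQueue()
    {
        // query the table for new/unprocessed rows and hand each one to the workflow
    }
}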
Following a series of steps is generally called a workflow. In a workflow you have a current status, and you update that status at each stage. So every row would begin at stage = 1; after the first step, stage = 2, and so on. On an exception, the row stays at the stage where it left off, so the next run will start over at that stage, or whatever your logic is. This status is stored with each row, and the dispatch code checks the status and sends the service to the correct starting code for the current stage - i.e., think of a set of if statements based on the status.
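If the set of if/switch statements gets unwieldy, one common alternative is a table-driven dispatcher: map each stage number to a handler and walk the map from the row's current stage. A minimal sketch, with hypothetical stage handlers and a hypothetical Row type:

using System;
using System.Collections.Generic;

public class Row
{
    public int Id;
    public int Stage = 1;    // current workflow stage, persisted with the row
}

public class RowWorkflow
{
    // Stage number -> handler for that stage (SortedDictionary keeps them in order).
    private readonly SortedDictionary<int, Action<Row>> _stages;

    public RowWorkflow()
    {
        _stages = new SortedDictionary<int, Action<Row>>
        {
            { 1, GetAdditionalData },
            { 2, FindOrCreateObject },
            { 3, NotifyInterestedPerson },
            { 4, DoSomethingElse }
        };
    }

    public void Process(Row row)
    {
        foreach (var stage in _stages)
        {
            if (stage.Key < row.Stage) continue;      // skip stages already completed
            try
            {
                stage.Value(row);
                row.Stage = stage.Key + 1;            // advance and persist the new stage
                SaveStage(row);
            }
            catch (Exception ex)
            {
                MarkRowFailed(row, ex);               // row keeps its current stage,
                return;                               // so the next run resumes here
            }
        }
    }

    // Hypothetical stage handlers and persistence - replace with real implementations.
    private void GetAdditionalData(Row row) { }
    private void FindOrCreateObject(Row row) { }
    private void NotifyInterestedPerson(Row row) { }
    private void DoSomethingElse(Row row) { }
    private void SaveStage(Row row) { }
    private void MarkRowFailed(Row row, Exception ex) { }
}

Different processing patterns can register different stage maps, so adding a new flow means adding a new map rather than another switch block.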
Handling exceptions is very simple. Every unit of work should be wrapped in a try...catch block. On error, log the exception, and mark the row according to your business rules.
As far as the implementation, use programming best practices to keep your code clean, modular, neat and organized. As you develop the solution, bring specific questions back for more help.
Add a field to your database table which tracks the state of each row. You could call this new field ProcessingState for example.
As the row goes through each logical state you can update this ProcessingState field to identify which state the row is in.
Each logical step in your service should only work on rows that are in the appropriate state.
Here is an example: let's say you have five logical steps to work through. You could have the following states (a small enum sketch follows the list):
Waiting State 1
State 1 complete
Waiting State 2
State 2 complete
etc.
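As a small illustration, the state field could be an enum (the names below are just placeholders):

// Hypothetical values for the new ProcessingState column.
public enum ProcessingState
{
    WaitingState1 = 0,
    State1Complete = 1,
    WaitingState2 = 2,
    State2Complete = 3,
    // ...one pair per logical step...
    Failed = 99             // optional: rows that need attention before retrying
}

Each step then selects only rows whose ProcessingState matches the state it handles and advances the field when it finishes.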
Good luck.
I've run into this a few times recently at work, where we have to develop an application that completes a series of items on a schedule. Sometimes this schedule is configurable by the end user, other times it's set in a config file. Either way, each task is something that should only be executed once, by a single machine. This isn't generally difficult until you introduce the need for SOA/geo redundancy. In this particular case there are a total of 4 (could be 400) instances of this application running, two in each data center on opposite sides of the US.
I'm investigating successful patterns for this sort of thing. My current solution has each physical location determining whether it should be active or dormant. We do this by checking a Session object that is maintained on another server. If DataCenter A is the live setup, the logic automatically prevents the instances in DataCenter B from performing any execution. (We don't want the work to traverse the MPLS link between DCs.)
The two remaining instances in DC A will then query the database for any jobs that need to be executed in the next 3 hours and cache them. A separate timer runs every second checking for jobs that need to be executed.
If it finds one, it first executes a stored procedure that takes a full table lock, queries for the job that needs to be executed, and checks the "StartedByInstance" column for a value; if it doesn't find one, it marks that record as being executed by InstanceX. Only then will it actually execute the job.
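For comparison (not necessarily better than your stored procedure), the claim step can often be done with a single conditional UPDATE instead of a full table lock. A hedged C# sketch, assuming a Jobs table with JobId and StartedByInstance columns:

using System.Data.SqlClient;

public static class JobClaimer
{
    // Atomically claim the job: the UPDATE only succeeds if no instance has claimed it yet,
    // so at most one instance ever sees a row count of 1 for a given job.
    public static bool TryClaimJob(string connectionString, int jobId, string instanceName)
    {
        const string sql =
            @"UPDATE Jobs
              SET StartedByInstance = @instance
              WHERE JobId = @jobId AND StartedByInstance IS NULL";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@instance", instanceName);
            command.Parameters.AddWithValue("@jobId", jobId);
            connection.Open();
            return command.ExecuteNonQuery() == 1;    // true => this instance owns the job
        }
    }
}

Only the instance that gets true back actually runs the job; everyone else moves on.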
My direct questions are:
Is this a good pattern?
Are there any better patterns?
Are there any libraries/apis that would be of interest?
Thanks!
I have a running order for two handlers, Deleting and Reordering pictures, and would like some advice on the best solution.
On the UI some pictures are deleted: the user clicks the delete button, and the whole flow is started, from the delete command up to an event handler which actually deletes the physical files.
Then the user immediately sorts the remaining pictures. A new flow fires, from the reorder command up to the reordering event handler for the file system.
Already there is a concurrency problem. The reordering cannot be correctly applied without the deletion being done. At the moment this problem is handled with some sort of lock: a temp file is created and then deleted at the end of the deletion flow. While that file exists the other thread (reordering or deletion, depending on the user's actions) waits.
This is not an ideal solution and I would like to change it.
The potential solution must also be pretty fast (of course the current one is not), as the UI is updated through a JSON call at the end of the ordering.
In a later implementation we are thinking of using a queue of events, but for the moment we are pretty stuck.
Any idea would be appreciated!
Thank you, mosu'!
Edit:
Other eventual consistency problems that we had were solved by using a JavaScript data manager on the client side. Basically being optimistic and tricking the user! :)
I'm starting to believe this is the way to go here as well. But then how would I know when the data has changed in the file system?
Max's suggestions are very welcome and normally they apply.
It is sometimes hard to explain all the details of an implementation, but there is a detail that should be mentioned:
The way we store the pictures means that when they are reordered, all picture paths (and thus all links) change.
A colleague had the very good idea of simply removing this part. That means that even if the order changes, the path of a picture remains the same. On the UI side there will be a mapping between a picture's index in the display order and its path, which means there is no need to change the file system anymore, except when deleting.
As we want to be as permissive as possible with our users, this is the best solution for us.
I think that, in general, this is also a good approach when there appears to be a concurrency issue: ask whether the concurrency can be removed.
Here is one thought on this.
What exactly are you reordering? Pictures? Based on, say, date?
Why is there a command for this? Is the result of this command going to be seen by everyone or just by this particular user?
I can only guess, but it looks like you've got a presentation question here. There is no need to store pictures in any particular order on the write side; it's just a list of names and links to the file storage. What you should do is store a little field somewhere in the user settings or collection settings: Date ascending, Name descending, and so on. So your Reorder command should change only this little field. Then, when you are loading the gallery, this field should be read first, and based on it you load one view or another. Since storage is cheap nowadays, you can keep differently sorted collections on the read side for every sort parameter you need.
To sum up, the Delete command changes the collection on the write side, but the Reorder command is just a user or collection setting. Hence, there is no concurrency here.
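A hedged sketch of that idea - the Reorder command only touches a per-collection view setting and never the file system (all type names here are hypothetical):

using System;

public enum SortOrder { DateAscending, DateDescending, NameAscending, NameDescending }

public class GallerySettings
{
    public Guid GalleryId;
    public SortOrder Order;
}

public interface IGallerySettingsStore
{
    GallerySettings Load(Guid galleryId);
    void Save(GallerySettings settings);
}

public class ReorderGalleryCommand
{
    public Guid GalleryId;
    public SortOrder NewOrder;
}

public class ReorderGalleryHandler
{
    private readonly IGallerySettingsStore _store;    // hypothetical settings repository

    public ReorderGalleryHandler(IGallerySettingsStore store) { _store = store; }

    public void Handle(ReorderGalleryCommand command)
    {
        // No file system access here: the order is purely a read-side preference,
        // so it cannot conflict with a deletion that is still in progress.
        var settings = _store.Load(command.GalleryId);
        settings.Order = command.NewOrder;
        _store.Save(settings);
    }
}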
Update
Based on your comments and clarifications.
Of course, you can and probably should restrict the user to one action at a time, provided the deletion and reordering are reasonably short. It's always a question of the kind of user experience you are asked to achieve. Take the usual example of an ordering system: after an order is placed, the user can see it in the UI almost immediately, with a status like InProcess. Most likely you won't let the user change the order in any way, which means you are not going to show any user controls like a Cancel button (of course this is just an example). Hence, you can use this approach here.
If two users can modify the same physical collection, you have no choice here - you are working with shared data and there has to be some kind of synchronization. For instance, if you are using sagas, there can be a couple of them: a collection reordering saga and a deletion saga, and they can cooperate. Say the deletion process starts first: the collection aggregate is marked as 'deletion in progress', and when the reordering saga starts right after, it attempts to begin reordering, but since the deletion saga is in process, it should wait for the DeletedEvent and continue the process afterwards. The same applies if the reordering operation starts first - the deletion saga should wait for the corresponding event and continue after it arrives.
Update
OK, so we agreed not to touch the file system itself, but only the aggregate which represents the picture collection. The most important concurrency issues can then be solved with an optimistic concurrency approach - in the data storage, a unique constraint based on aggregate id and aggregate version is usually used.
Here are the typical steps in the command handler:
Validate the command on its own merits.
Load the aggregate.
Validate the command on the current state of the aggregate.
Create a new event, apply the event to the aggregate in memory.
Attempt to persist the aggregate. If there's a concurrency conflict during this step, either give up, or retry things from step 2.
Here is the link which helped me a lot some time ago: http://www.cqrs.nu/
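A hedged C# sketch of those steps, assuming an event-sourced style repository that throws a concurrency exception when the expected version no longer matches (all type names are hypothetical):

using System;

// Hypothetical supporting types for the sketch.
public class ConcurrencyException : Exception { }

public class PictureCollection
{
    public Guid Id;
    public int Version;                        // incremented on every successful save
    public bool Contains(Guid pictureId) { /* ... */ return true; }
    public void RemovePicture(Guid pictureId) { /* raise/apply a PictureRemoved event */ }
}

public interface IPictureCollectionRepository
{
    PictureCollection Get(Guid id);
    void Save(PictureCollection collection, int expectedVersion);   // throws ConcurrencyException
}

public class RemovePictureCommand
{
    public Guid CollectionId;
    public Guid PictureId;
}

public class RemovePictureHandler
{
    private readonly IPictureCollectionRepository _repository;
    private const int MaxRetries = 3;

    public RemovePictureHandler(IPictureCollectionRepository repository)
    {
        _repository = repository;
    }

    public void Handle(RemovePictureCommand command)
    {
        // 1. Validate the command on its own merits.
        if (command.PictureId == Guid.Empty)
            throw new ArgumentException("PictureId is required.");

        for (var attempt = 0; attempt < MaxRetries; attempt++)
        {
            // 2. Load the aggregate, including its current version.
            var collection = _repository.Get(command.CollectionId);

            // 3. Validate the command against the current state of the aggregate.
            if (!collection.Contains(command.PictureId))
                return;                        // already removed, nothing to do

            // 4. Create the event and apply it to the aggregate in memory.
            collection.RemovePicture(command.PictureId);

            try
            {
                // 5. Persist with the expected version; a unique constraint on
                //    (aggregate id, version) rejects concurrent writers.
                _repository.Save(collection, collection.Version);
                return;
            }
            catch (ConcurrencyException)
            {
                // Another writer got there first: retry from step 2.
            }
        }

        throw new InvalidOperationException("Could not persist the change after retries.");
    }
}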
In my plugin I have code to check the execution context Depth to avoid an infinite loop once the plugin updates its own entity. But there are cases where the entity is being updated from another plugin or workflow with depth 2, 3 or 4, and for those specific calls, from that specific plugin, I want to process the call and not stop even if the Depth is greater than 1.
Perhaps a different approach might be better? I've never needed to consider Depth in my plug-ins. I've heard of other people doing the same as you (checking the depth to avoid code from running twice or more) but I usually avoid this by making any changes to the underlying entity in the Pre Operation stage.
If, for example, I have code that changes the name of an Opportunity whenever the opportunity is updated, then by putting my code in the post-operation stage of the Update message, my code would react to the user changing a value by sending a separate Update request back to the platform to apply the change. This new Update itself causes my plug-in to fire again - an infinite loop.
If I put my logic in the Pre-Operation stage, I do it differently: the user's change fires the plugin. Before the user's change is committed to the platform, my code is invoked. This time I can look at the Target that was sent in the InputParameters of the Update message. If the name attribute does not exist in the Target (i.e. it wasn't updated), then I can append an attribute called name with the desired value to the Target, and this value will get carried through to the platform. In other words, I am injecting my value into the record before it is committed, thereby avoiding the need to issue another Update request. Consequently, my change causes no further plug-ins to fire.
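For illustration, a minimal sketch of that pre-operation approach against the Dynamics CRM SDK (the attribute name and the helper are hypothetical):

using System;
using Microsoft.Xrm.Sdk;

public class SetOpportunityNamePlugin : IPlugin
{
    public void Execute(IServiceProvider serviceProvider)
    {
        var context = (IPluginExecutionContext)
            serviceProvider.GetService(typeof(IPluginExecutionContext));

        // Registered on the Pre-Operation stage of the Update message.
        if (!context.InputParameters.Contains("Target")) return;
        var target = context.InputParameters["Target"] as Entity;
        if (target == null) return;

        // Inject the value into the record before it is committed, so no extra
        // Update request (and therefore no extra plug-in execution) is needed.
        if (!target.Contains("name"))
        {
            target["name"] = BuildOpportunityName(target);    // hypothetical helper
        }
    }

    private static string BuildOpportunityName(Entity target)
    {
        return "Opportunity - " + DateTime.UtcNow.ToString("yyyy-MM-dd");
    }
}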
Obviously I presume that your scenario is more complex but I'd be very surprised if it couldn't fit the same pattern.
I'll start by agreeing with everything that Greg said above - if possible refactor the design to avoid this situation.
If that is not possible you will need to use the IPluginExecutionContext.SharedVariables to communicate between the plug-ins.
Check for a SharedVariable at the start of your plug-in and then set/update it as appropriate. The specific design you'll use will vary based on the complexity you need to manage. I always use a string with the message and entity ID - easy enough to serialize and deserialize. Then I always know whether I'm already executing against a certain message for a specific record or not.
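A hedged sketch of that guard - the key format and the walk up ParentContext are assumptions you'd adapt to your pipeline:

using Microsoft.Xrm.Sdk;

public static class PluginGuard
{
    // Returns true if this message/record combination was already handled somewhere
    // up the execution pipeline; otherwise marks it as handled on the current context.
    public static bool AlreadyHandled(IPluginExecutionContext context)
    {
        var key = "Handled:" + context.MessageName + ":" + context.PrimaryEntityId;

        // SharedVariables set in a parent pipeline are reachable via ParentContext.
        for (var current = context; current != null; current = current.ParentContext)
        {
            if (current.SharedVariables.Contains(key))
                return true;
        }

        context.SharedVariables.Add(key, true);
        return false;
    }
}

At the top of Execute you would then simply return early when AlreadyHandled reports true.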
I provide a web service for my clients which allows them to add a record to the production database.
I had an incident lately in which my client's programmer called the service in a loop, hitting my service thousands of times.
My question is what would be the best way to prevent such a thing.
I thought of some ways:
1. At the entrance to the service, I can update counters for each client that calls the service, but that looks too clumsy.
2. Check the IP of the client who called the service, raise a flag each time he/she calls it, and then reset the flag every hour.
I'm positive that there are better ways and would appreciate any suggestions.
Thanks, David
First you need to have a look at the legal aspects of your situation: Does the contract with your client allow you to restrict the client's access?
This question is out of the scope of SO, but you must find a way to answer it. Because if you are legally bound to process all requests, then there is no way around it. Also, the legal analysis of your situation may already include some limitations, in which way you may restrict the access. That in turn will have an impact on your solution.
All those issues aside, and just focusing on the technical aspects: do you use some sort of user authentication? (If not, why not?) If you do, you can implement whatever scheme you decide on per user, which I think would be the cleanest solution (you don't need to rely on IP addresses, which is a somewhat ugly workaround).
Once you have your way of identifying a single user, you can implement several restrictions. The first ones that come to my mind are these:
Synchronous processing
Only start processing a request after all previous requests have been processed. This may be implemented with nothing more than a lock statement in your main processing method. If you go for this kind of approach, see the explanation of the lock statement in the edit below.
Time delay between processing requests
Require that a specific amount of time passes after one processing call before the next call is allowed. The easiest solution is to store a LastProcessed timestamp in the user's session. If you go for this approach, you need to start thinking about how to respond when a new request comes in before it is allowed to be processed - do you send an error message to the caller? I think you should...
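A hedged sketch of the LastProcessed idea for an ASP.NET service with session state enabled (the 10-second window and the rejection behaviour are assumptions):

using System;
using System.Web;

public static class ThrottleGuard
{
    private static readonly TimeSpan MinimumInterval = TimeSpan.FromSeconds(10);

    // Call at the start of the service method; returns false if the caller must wait.
    public static bool TryBeginRequest()
    {
        var session = HttpContext.Current.Session;
        var last = session["LastProcessed"] as DateTime?;

        if (last.HasValue && DateTime.UtcNow - last.Value < MinimumInterval)
            return false;    // too soon - reject (or queue) and tell the caller why

        session["LastProcessed"] = DateTime.UtcNow;
        return true;
    }
}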
EDIT
The lock statement, briefly explained:
It is intended for thread-safe operations. The syntax is as follows:
private readonly object lockObject = new object();   // a private member of the class

lock (lockObject)
{
    // only one thread at a time can execute the code in this block
}
The lockObject needs to be an object, usually a private member of the current class. The effect is that if two threads both want to execute this code, the first to arrive at the lock statement locks the lockObject. While it does its work, the second thread cannot acquire a lock, since the object is already locked. It just sits there and waits until the first thread releases the lock when it exits the block at the closing brace. Only then can the second thread lock the lockObject and do its work, blocking the lockObject for any third thread coming along, until it has exited the block as well.
Careful, the whole issue of thread safety is far from trivial. (One could say that the only thing trivial about it are the many trivial errors a programmer can make ;-)
See here for an introduction into threading in C#
One way is to store a counter in the session and use it to prevent too many calls per unit of time.
But if your user may try to get around that by sending a different cookie each time*, then you need to make a custom table that acts like the session but connects the user with the IP, not with the cookie.
One more point: if you block based on the IP alone, you may block an entire company that comes out of a single proxy. So the final, correct, but more complicated way is to have both the IP and the cookie connected with the user, and to know whether the browser allows cookies or not. If not, you block by IP. The difficult part here is knowing about the cookie: on every call you can force the client to send a valid cookie that is connected with an existing session. If it can't, the browser does not have cookies.
[ * ] The cookies are connected with the session.
[ * ] By making a new table to keep the counters, disconnected from the session, you can also avoid the session lock.
In the past I have used code intended for DoS attack protection, but none of it works well when you have many pools and a complex application, so I now use a custom table as described above. These are the two pieces of code that I have tested and used:
Dos attacks in your web app
Block Dos attacks easily on asp.net
How to find the clicks per second saved in a table: here is the part of my SQL that calculates the clicks per second. One of the tricks is that I keep adding clicks and only calculate the average once 6 or more seconds have passed since the last check. This is a code snippet from the calculation, as an idea:
SET @cDos_TotalCalls = @cDos_TotalCalls + @NewCallsCounter
SET @cMilSecDif = ABS(DATEDIFF(millisecond, @FirstDate, @UtpNow))
-- leave a 6-second gap before making the calculation
IF @cMilSecDif > 6000
    SET @cClickPerSeconds = (@cDos_TotalCalls * 1000 / @cMilSecDif)
ELSE
    SET @cClickPerSeconds = 0
IF @cMilSecDif > 30000
    UPDATE ATMP_LiveUserInfo SET cDos_TotalCalls = @NewCallsCounter, cDos_TotalCallsChecksOn = @UtpNow WHERE cLiveUsersID = @cLiveUsersID
ELSE IF @cMilSecDif > 16000
    UPDATE ATMP_LiveUserInfo SET cDos_TotalCalls = (cDos_TotalCalls / 2),
        cDos_TotalCallsChecksOn = DATEADD(millisecond, @cMilSecDif / 2, cDos_TotalCallsChecksOn)
        WHERE cLiveUsersID = @cLiveUsersID
Get the user's IP and insert it into the cache for an hour after the web service call; this is cached on the server:
string userIp = HttpContext.Current.Request.UserHostAddress;
HttpContext.Current.Cache.Insert("UserIp_" + userIp, true, null, DateTime.Now.AddHours(1), System.Web.Caching.Cache.NoSlidingExpiration);
When you need to check whether that user has called within the last hour:
if (HttpContext.Current.Cache["UserIp_" + userIp] != null)
{
    // the user has already called within the last hour
}
I'm developing an event booking application and am having difficulty figuring out how to manage the booking process. I know about db transactions and a little bit about locking but I have a lot of business rules to validate before a booking can be committed and I'm worried about performance bottlenecks.
Here's a summary of what's going on:
An Event has a maximum number of slots
A user can book one spot in the Event
Each user has an account with money in it, and each event costs a certain amount
Given the above parameters, the following business rules are what I need to validate for a booking to take place:
The user hasn't already booked a spot for this Event
The user has enough funds to book the Event
The Event has at least one spot available
The Event does not conflict with other events the user has booked (not so much of an issue as I can check this when displaying the page and hide this Event from the user)
My main worry is that if I pull all the information from the db up front (i.e. Event, User, Account, and existing Bookings) that by the time I run all the validation and come to commit the new booking, the state of the system will have possibly changed (i.e. someone else has booked the last spot, money has left my account, etc).
If I were to lock the code/database tables around this booking process, then I'd potentially hold the lock for quite a while, affecting other operations in the system and causing performance issues at peak times.
Can anyone suggest an approach whereby I can manage, or at least limit, these concerns?
I'm building an ASP.NET app in C# and using SQL Server 2005.
I think a good example to look at is how Ticketmaster reserves seats for tickets that MAY get purchased. They tell you that you have so many minutes until the seats are put back into inventory. It pushes the purchaser to make a decision or someone else will have a chance at the seats. This is really your biggest hurdle. As for checking the business rules, you'll have to do that. There is no magic around what needs to be done there. If you need the data to validate a rule then that's what you need to do. With some proper mapping and outlining you can find the efficiencies. I hope that answered your question.
Good luck!
One solution:
Pre-emptively book the spot (with a status of "hold").
Validate.
If the booking can't be kept due to business rules, delete it. Otherwise, change the status to "booked" (a sketch of this flow follows below).
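A hedged sketch of that flow, assuming Bookings and Events tables with the column names shown (they are hypothetical); the lock hints keep the slot count and the insert atomic within the single statement:

using System;
using System.Data.SqlClient;

public static class BookingService
{
    // Step 1: pre-emptively take the spot with a 'hold' status. The INSERT only
    // succeeds while the event still has free slots.
    public static bool TryHoldSpot(string connectionString, Guid eventId, Guid userId)
    {
        const string sql =
            @"INSERT INTO Bookings (EventId, UserId, Status, CreatedOn)
              SELECT @eventId, @userId, 'hold', GETUTCDATE()
              WHERE (SELECT COUNT(*) FROM Bookings WITH (UPDLOCK, HOLDLOCK)
                     WHERE EventId = @eventId AND Status IN ('hold', 'booked'))
                    < (SELECT MaxSlots FROM Events WHERE EventId = @eventId)";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@eventId", eventId);
            command.Parameters.AddWithValue("@userId", userId);
            connection.Open();
            return command.ExecuteNonQuery() == 1;    // false => the event is full
        }
    }

    // Steps 2-3: validate the remaining business rules, then either confirm or release the hold.
    public static void ConfirmOrRelease(string connectionString, Guid eventId, Guid userId, bool rulesPassed)
    {
        var sql = rulesPassed
            ? "UPDATE Bookings SET Status = 'booked' WHERE EventId = @eventId AND UserId = @userId AND Status = 'hold'"
            : "DELETE FROM Bookings WHERE EventId = @eventId AND UserId = @userId AND Status = 'hold'";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@eventId", eventId);
            command.Parameters.AddWithValue("@userId", userId);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}

A unique constraint on (EventId, UserId) would additionally enforce the "one spot per user" rule at the database level.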
If you go back to the 80s and read the literature published on the topic of transaction processing, you'll find that one of the most discussed examples was airline reservation systems. And for good reason, as it was one of the OLTP topics that exposed all the issues around transaction processing: correctness, throughput, contention, deadlocks. What you describe is a very similar problem, but instead of airline seats you have event slots. So yes, you will have all those issues.
There is no magic pixie dust. This is a hard problem. But there are some guiding lines:
Forgetful Fred cannot lock a slot forever. Forgetful Fred is the user who opens the reservation screen, picks a seat, then goes to lunch without finishing the transaction. If this is allowed, the system will slowly 'leak' slots that aren't used.
Database locks are too expensive to be held while waiting for user input.
Throughput can only be achieved with granular locks.
The business logic should not attempt concurrent updates on correlated items.
Everything displayed to the user should be treated as 'tentative'.
The user interface should be prepared to handle update conflicts.
The update logic should always follow the same hierarchy (e.g. if the agreed update order is Account->User->Event->Booking, then a rogue transaction trying to update Booking->Event->User will cause deadlocks).
And as a matter of fact there is an approach that limits these concerns: workflow processing backed by transactional queues that leverage exclusive lock-out of correlated items. Not your everyday ASP.NET task for sure, so I'd recommend you stick with what you know.