Framework to handle recurring tasks - c#

We implemented a Windows service that has a couple of timers in it. Over time, the logic for the timers got more and more complicated. It's time to refactor our solution, and one possible way would be to use a well-documented framework that handles our requirements.
There are rules like:
start timer A each day at 9am
start timer B each 2min
if timer A is started, don't start any other timer
timer C and D are not allowed to run at the same time
I looked at Quartz.NET because it covers the first two requirements on our list, but it doesn't handle any concurrency rules.
Is there any framework I could have a look at?
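One way to keep the timing in Quartz.NET (or plain timers) and the concurrency rules in your own code is a small gate that every job consults before it runs. The sketch below is a hypothetical `TimerRuleGate` (not part of Quartz.NET or any other framework) that encodes the rules listed above. It assumes "if timer A is started, don't start any other timer" means A is fully exclusive, so A also waits until nothing else is running.

```csharp
using System.Collections.Generic;

// Hypothetical gate encoding the rules from the question; not part of Quartz.NET.
public sealed class TimerRuleGate
{
    private readonly object _sync = new object();
    private readonly HashSet<string> _running = new HashSet<string>();

    // Returns true if the job may start now; the caller must call Finish when it is done.
    public bool TryStart(string job)
    {
        lock (_sync)
        {
            // Rule: while A runs, no other timer may start.
            if (_running.Contains("A")) return false;
            // Assumption: A itself is exclusive, so it only starts when nothing else runs.
            if (job == "A" && _running.Count > 0) return false;
            // Rule: C and D must not run at the same time.
            if (job == "C" && _running.Contains("D")) return false;
            if (job == "D" && _running.Contains("C")) return false;
            // Add returns false if the same job is already running.
            return _running.Add(job);
        }
    }

    public void Finish(string job)
    {
        lock (_sync) { _running.Remove(job); }
    }
}
```

A scheduled job would call `TryStart` first, skip (or reschedule) the run when it returns false, and call `Finish` in a `finally` block so a failing job cannot leave the gate locked.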

I had similar requirements: essentially what you need is a state machine that can be easily serialized to disk or a database, some way to specify the state machine easily using hierarchical states, some way to easily specify temporal events (After, Every, At) and some way to easily know when to load the state machine back into memory to advance state based on the current time.
In the end, I wrote my own state machine, as I didn't find one that met my requirements, in particular the temporal-event and serialization requirements. The source code is available as a NuGet package, and there is a blog entry about it. Feedback welcome.
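The "know when to load the state machine back into memory" requirement mostly comes down to persisting, next to each machine's state, the next time something is due. A minimal sketch, assuming hypothetical `Every` and `DailyAt` helpers (these are illustrations, not the API of the package mentioned above) and a hypothetical `PersistedMachine` record that would be stored on disk or in a database:

```csharp
using System;

// Hypothetical temporal-event helpers; names are illustrative, not a real package API.
public abstract class TemporalEvent
{
    // Next time this event fires strictly after 'fromUtc'.
    public abstract DateTime NextAfter(DateTime fromUtc);
}

public sealed class Every : TemporalEvent
{
    private readonly TimeSpan _interval;
    public Every(TimeSpan interval) { _interval = interval; }
    public override DateTime NextAfter(DateTime fromUtc) => fromUtc + _interval;
}

public sealed class DailyAt : TemporalEvent
{
    private readonly TimeSpan _timeOfDay;
    public DailyAt(TimeSpan timeOfDay) { _timeOfDay = timeOfDay; }

    public override DateTime NextAfter(DateTime fromUtc)
    {
        DateTime candidate = fromUtc.Date + _timeOfDay;
        // If today's slot is not strictly in the future, use tomorrow's.
        return candidate > fromUtc ? candidate : candidate.AddDays(1);
    }
}

// The part that would be serialized between wake-ups.
public sealed class PersistedMachine
{
    public string State { get; set; } = "Idle";
    public DateTime NextDueUtc { get; set; }

    // A poller only loads machines for which this is true.
    public bool IsDue(DateTime nowUtc) => NextDueUtc <= nowUtc;

    // Advance the state and record when the machine next needs attention.
    public void Fire(DateTime nowUtc, TemporalEvent schedule, string nextState)
    {
        State = nextState;
        NextDueUtc = schedule.NextAfter(nowUtc);
    }
}
```

With an index on `NextDueUtc`, the service only has to query for due machines (for example every minute) instead of keeping every machine in memory.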

Related

NServiceBus batching a long running job

I'm working on a project that is using NSB. I really like it, but it's my first NSB solution, so I'm a bit of a noob. We have a job that needs to run every day that processes members. It is not expected to take long, as the work is simple, but it will potentially affect thousands of members and, in the future, perhaps tens or hundreds of thousands.
Having it all happen in a single handler in one go feels wrong, but having a handler discover affected members and then fire separate events for each one sounds a bit too much in the opposite direction. I can think of a few other methods of doing it, but was wondering if there is an idiomatic way of dealing with this in NSB?
Edit to clarify: I'm using Schedule to send a command at 3am; the handler for that will query the SQL db for a list of members who need to be processed. Processing will involve updating/inserting one or two rows per member. My question is about how to process that potentially large list of members within NSB.
Edit part 2: the job now needs to run monthly, not daily.
I would not use a saga for this. Sagas should be lightweight and are designed for orchestration rather than performing work. They are started by messages rather than scheduled.
You can achieve your ends by using the built-in scheduler. I've not used it, but it looks simple enough.
You could do something like:
configure a command message (e.g. StartJob) to be sent every day at 03:00.
StartJob handler will then query the DB to get the work.
Then, depending on your requirements:
If you need all the work done at once, create a single command with all the work in it, and send it to another endpoint for processing. If you use transactional MSMQ then this will succeed or fail as a unit.
If you don't care if only some work succeeds then create a command per unit of work, and dispatch to an endpoint for processing. This has the benefit that you can scale out using the distributor if you needed to.
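The two options above differ only in how the queried work is sliced before it is sent. A framework-neutral sketch, assuming a hypothetical `send` delegate standing in for the bus (in an NServiceBus handler this would be a send on the handler context) and a hypothetical `ProcessMembers` command: a batch size of 1 gives one command per member, while a batch size at least as large as the result set gives a single command.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical command; not an NServiceBus type.
public sealed class ProcessMembers
{
    public IReadOnlyList<int> MemberIds { get; }
    public ProcessMembers(IReadOnlyList<int> memberIds) { MemberIds = memberIds; }
}

public static class StartJobHandler
{
    // Splits the queried member IDs into chunks and sends one command per chunk,
    // so each chunk is processed (and succeeds or fails) as its own unit of work.
    public static int Dispatch(IEnumerable<int> memberIds, int batchSize, Action<ProcessMembers> send)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
        int sent = 0;
        var batch = new List<int>();
        foreach (int id in memberIds)
        {
            batch.Add(id);
            if (batch.Count == batchSize)
            {
                send(new ProcessMembers(batch.ToArray()));
                batch.Clear();
                sent++;
            }
        }
        if (batch.Count > 0) // remainder smaller than a full batch
        {
            send(new ProcessMembers(batch.ToArray()));
            sent++;
        }
        return sent;
    }
}
```

A middle-ground batch size (say, a few hundred members per command) keeps each transaction short while avoiding one message per member.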
I'm working on a project that is using NSB... We have a job that needs to run every day...
Although you can use NSB for this kind of work, it's not really something I would do. There are many other approaches you could use. A SQL job or cron job would be the obvious one (and a hell of a lot quicker to develop, more performant, and simpler).
Even though it does support such use cases, NServiceBus is not really designed for scheduled batch processing. I would seriously question whether you should even use NSB for this task.
You mention a long-running process, and that sounds like a job for a Saga (see https://docs.particular.net/nservicebus/sagas/). You can use saga data and persist checkpoints in different storage media (SQL, Mongo, etc.). But yes, having something long-running dispatch messages from the saga to individual handlers is definitely something I would do as well.
Something else to consider is message deferral (Timeout Managers). For example, let's say you process x number of users but want to run this again. NServiceBus allows you to defer messages for a defined period, and the message will sit in the queue waiting to be dispatched.
If you need any more info, just shout and I can update my answer.
A real NSB solution would be to get rid of the "batch" job that processes all those records in one run and find out what action(s) would cause each of these records to need processing after all.
When such an action is performed, you should publish an NSB event and refactor the batch job into an NSB handler that subscribes to these events, so it can do the processing the moment the action is performed, running in parallel with the rest of your process.
This way there would be no need anymore for a scheduled 'start' message at 3 am, because all the work would already have been done.
Here is how I might model this idiomatically with NServiceBus: there might be a saga called PointsExpirationPolicy, which would be initiated at the moment that any points are awarded to a user. The saga would store the user ID, and number of points awarded, and also calculate the date/time the points should expire. Then it would request a timeout callback message to be sent at the date/time these points should expire. When that callback arrives, the saga sends a command to expire that number of points from the user's account. This would also give you some flexibility around the logic of exactly when and how points expire, and would eliminate the whole batch process.
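The flow of such a saga can be sketched without the framework. This is a minimal sketch, assuming hypothetical `requestTimeout` and `send` delegates in place of NServiceBus's timeout and send mechanisms, and hypothetical message shapes for `PointsAwarded` and `ExpirePoints`; in a real endpoint, `PointsExpirationData` would be persisted saga data and the timeout would be delivered back to the saga by the framework.

```csharp
using System;

// Hypothetical messages; the shapes are assumptions.
public sealed class PointsAwarded
{
    public Guid UserId { get; set; }
    public int Points { get; set; }
    public DateTime AwardedUtc { get; set; }
}

public sealed class ExpirePoints
{
    public Guid UserId { get; set; }
    public int Points { get; set; }
}

// What the saga would persist between messages.
public sealed class PointsExpirationData
{
    public Guid UserId { get; set; }
    public int Points { get; set; }
    public DateTime ExpiresUtc { get; set; }
    public bool Completed { get; set; }
}

public sealed class PointsExpirationPolicy
{
    private readonly TimeSpan _lifetime;
    private readonly Action<DateTime> _requestTimeout; // stand-in for requesting a timeout
    private readonly Action<ExpirePoints> _send;       // stand-in for sending a command

    public PointsExpirationData Data { get; } = new PointsExpirationData();

    public PointsExpirationPolicy(TimeSpan lifetime, Action<DateTime> requestTimeout, Action<ExpirePoints> send)
    {
        _lifetime = lifetime;
        _requestTimeout = requestTimeout;
        _send = send;
    }

    // Started by the award: remember it and ask to be called back at expiry.
    public void Handle(PointsAwarded message)
    {
        Data.UserId = message.UserId;
        Data.Points = message.Points;
        Data.ExpiresUtc = message.AwardedUtc + _lifetime;
        _requestTimeout(Data.ExpiresUtc);
    }

    // Timeout callback: expire exactly the awarded points, and only once.
    public void OnExpiryTimeout()
    {
        if (Data.Completed) return;
        _send(new ExpirePoints { UserId = Data.UserId, Points = Data.Points });
        Data.Completed = true; // a real saga would be marked complete here
    }
}
```

Because each award gets its own policy instance and its own timeout, there is nothing left for a nightly batch job to scan.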

Patterns for Time / Date based domain events in DDD

I'm working on a user story whereby a Task (an entity) is created for a user to work on when a date is overdue and other criteria are met (on a separate entity - let's say a Product).
Ideally I would like a Domain Event to be created in real-time when this "Date" is overdue - however there isn't any trigger I can use in code to do this. I can only really see one type of pattern to use at the moment - that is to have a Windows service which is polling every hour (using Topshelf / Quartz for example), pulling back all the records using the Product repository, then code to check whether or not the dates are overdue and the criteria are met. If successful, the Domain Event will be triggered and the Task will be created.
As you can imagine, I don't particularly like this. It's not in real-time, and I'm pulling back a lot of data to achieve something relatively simple. Am I missing a trick here? Some kind of state machine / workflow? What architectural patterns / good designs are available for me to utilize in this situation?
Apologies if the question is a little vague, and I'll attempt to clarify if needs be.
If you use a sophisticated scheduler like Quartz anyway, why not just use it to call back to your application at the exact time the Task gets overdue? I've never used Quartz in this way, but I think this should be possible.
To get a robust solution, you might want to consider checking regularly in addition to the on-time callbacks, but I'd expect that these regular checks can run on a low frequency basis.
In any case, when you receive a callback, you need to check which tasks are really overdue. If polling the DB for this check is a performance problem (which I wouldn't expect it is in most cases), you can always cache the upcoming deadlines. Make sure you refresh the cache appropriately, e.g. by listening to the "task published" domain events.
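The "cache the upcoming deadlines" idea can be as small as a sorted set keyed by due time. A minimal sketch, assuming a hypothetical `DeadlineCache` that is fed from domain events (creation or date changes) and drained by the scheduler callback or the low-frequency sweep; the callback would still re-check each returned entity against the repository before raising the domain event, since the other criteria may have changed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory cache of upcoming deadlines.
public sealed class DeadlineCache
{
    private readonly object _sync = new object();
    // Ordered by due time, then by ID so that entries stay unique.
    private readonly SortedSet<(DateTime Due, Guid Id)> _byDue = new SortedSet<(DateTime, Guid)>();
    private readonly Dictionary<Guid, DateTime> _dueById = new Dictionary<Guid, DateTime>();

    // Called when an entity is created or its date changes.
    public void Upsert(Guid id, DateTime dueUtc)
    {
        lock (_sync)
        {
            if (_dueById.TryGetValue(id, out DateTime old)) _byDue.Remove((old, id));
            _byDue.Add((dueUtc, id));
            _dueById[id] = dueUtc;
        }
    }

    // Earliest upcoming deadline; handy for scheduling the next on-time callback.
    public DateTime? NextDue()
    {
        lock (_sync) { return _byDue.Count == 0 ? (DateTime?)null : _byDue.Min.Due; }
    }

    // Removes and returns every entry whose deadline has passed.
    public List<Guid> TakeOverdue(DateTime nowUtc)
    {
        var result = new List<Guid>();
        lock (_sync)
        {
            while (_byDue.Count > 0 && _byDue.Min.Due <= nowUtc)
            {
                var entry = _byDue.Min;
                _byDue.Remove(entry);
                _dueById.Remove(entry.Id);
                result.Add(entry.Id);
            }
        }
        return result;
    }
}
```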

Semaphore vs. SQL-Job when trying to remove expired SQL records

I'm using ASP.NET and C# to build a 'Social Network' web site.
When adding posts, there are two SQL columns that I fill: the date and time when the post was added, and the date and time when the post expires (it varies between the different kinds of posts).
I want some process that constantly checks the SQL database and remove posts with expired date and time.
I've searched for solutions, and I understand that the two most suitable solutions are Semaphores and SQL Jobs (correct me if I'm wrong).
I hope you could give me a hint about which is the best solution, and if it's not one of the two, what it is, along with some info about it.
Thanks!
Just hide posts that have expired based on the current time. For example:
WHERE ExpiryDateTime > SYSUTCDATETIME()
Then you can clean old posts in the background at any frequency you like. Create a Windows Task Scheduler task that calls a special URL of your website. That URL should perform a database cleanup. This is a very simple and clearly correct solution.
If you don't like Windows Task Scheduler (and who really does like it...) you can use a scheduler library such as Hangfire or Quartz.NET.
Neither.
A semaphore is a way of controlling resource use between multiple threads
An SQL job is a somewhat blunt tool designed to allow db admins to schedule tasks
I would create a separate program, 'oldDataDeleter', code up your logic about what you want to delete or archive after how much time, and then apply that logic in an atomic way. I would run this as a Windows service with a timer, or as a console app run as a scheduled task.
The key is to ensure that the program can run concurrently with itself and only does small atomic changes on a small chunk of data at a time.
You can then fire up multiple instances of this program running at a high frequency.
This ensures you don't lock your database with large 'delete all from table X join table Y' statements, and that your data is constantly trimmed rather than building up into a big overnight job.
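The "small atomic changes on a small chunk of data" loop might look like this. A minimal sketch, assuming a hypothetical `deleteBatch` delegate that executes one short statement such as `DeleteSql` below (the `Posts` table and `ExpiryDateTime` column follow the names used in this thread; `@batchSize` is an assumed parameter name) and returns the number of rows affected:

```csharp
using System;
using System.Threading;

public static class OldDataDeleter
{
    // Example SQL Server statement; each execution is its own small, short transaction.
    public const string DeleteSql =
        "DELETE TOP (@batchSize) FROM Posts WHERE ExpiryDateTime <= SYSUTCDATETIME();";

    // Deletes small batches until a batch comes back short, i.e. nothing is left to trim.
    public static int Run(Func<int, int> deleteBatch, int batchSize, TimeSpan pauseBetweenBatches,
                          CancellationToken token = default)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
        int total = 0;
        while (!token.IsCancellationRequested)
        {
            int deleted = deleteBatch(batchSize);
            total += deleted;
            if (deleted < batchSize) break;
            // A short pause lets other queries get at the table between batches.
            if (pauseBetweenBatches > TimeSpan.Zero) Thread.Sleep(pauseBetweenBatches);
        }
        return total;
    }
}
```

Because every batch is independent, several instances can run this loop at the same time without needing a global lock.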
Edit for 'all code must be in a single website project' restriction
There is another solution which in some ways is better and works with your (slightly odd and very much not best practice) requirement.
That is to delete old entries whenever you make an insertion. So when your code adds a Post ("insert into posts..."), it also runs the delete ("delete from posts where...").
This ensures that your program is self-maintaining. However, you do incur a performance hit when adding posts. Given that a large social media site would be continually adding posts and needs to scale with its users, I don't recommend this solution.
However for small projects which don't need to scale it is neater.

How to prevent NHibernate long-running process from locking up web site?

I have an NHibernate MVC application that is using ReadCommitted Isolation.
On the site, there is a certain process that the user could initiate, and depending on the input, may take several minutes. This is because the session is per request and is open that entire time.
But while that runs, no other user can access the site (they can try, but their request won't go through unless the long-running thing is finished)
What's more, I also have a need to have a console app that also performs this long running function while connecting to the same database. It is causing the same issue.
I'm not sure what part of my setup is wrong, any feedback would be appreciated.
NHibernate is set up with fluent configuration and StructureMap.
Isolation level is set as ReadCommitted.
The session factory lifecycle is HybridLifeCycle (which on the web should be Session per request, but on the win console app would be ThreadLocal)
It sounds like your requests are waiting on database locks. Your options are really:
Break the long running process into a series of smaller transactions.
Use ReadUncommitted isolation level most of the time (this is appropriate in a lot of use cases).
Judicious use of Snapshot isolation level (Assuming you're using MS-SQL 2005 or later).
(N.B. I'm assuming the long-running function does a lot of reads/writes and the requests being blocked are primarily doing reads.)
As has been suggested, breaking your process down into multiple smaller transactions will probably be the solution.
I would suggest looking at something like Rhino Service Bus or NServiceBus (my preference is Rhino Service Bus - I find it much simpler to work with personally). What that allows you to do is separate the functionality down into small chunks, but maintain the transactional nature. Essentially with a service bus, you send a message to initiate a piece of work, the piece of work will be enlisted in a distributed transaction along with receiving the message, so if something goes wrong, the message will not just disappear, leaving your system in a potentially inconsistent state.
Depending on what you need to do, you could send an initial message to start the processing, and then after each step, send a new message to initiate the next step. This can really help to break down the transactions into much smaller pieces of work (and simplify the code). The two service buses I mentioned (there is also Mass Transit), also have things like retries built in, and error handling, so that if something goes wrong, the message ends up in an error queue and you can investigate what went wrong, hopefully fix it, and reprocess the message, thus ensuring your system remains consistent.
Of course whether this is necessary depends on the requirements of your system :)
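The "send a new message to initiate the next step" approach can be sketched with an in-memory queue standing in for the bus. This is a minimal sketch, assuming a hypothetical `ProcessNextChunk` message that carries a cursor; with Rhino Service Bus, NServiceBus or MassTransit the queue would be durable, and each handler invocation would run in its own transaction together with receiving the message.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message: where to resume and how much to do in one step.
public sealed class ProcessNextChunk
{
    public int Offset { get; set; }
    public int ChunkSize { get; set; }
}

public sealed class ChunkedJob
{
    private readonly IReadOnlyList<int> _items;
    private readonly Action<int> _processItem;
    private readonly Queue<ProcessNextChunk> _bus = new Queue<ProcessNextChunk>(); // stand-in for a durable queue

    public ChunkedJob(IReadOnlyList<int> items, Action<int> processItem)
    {
        _items = items;
        _processItem = processItem;
    }

    // The initial message that starts the processing.
    public void Start(int chunkSize)
    {
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        _bus.Enqueue(new ProcessNextChunk { Offset = 0, ChunkSize = chunkSize });
    }

    // One handler invocation = one small step (one short transaction in a real system).
    private void Handle(ProcessNextChunk msg)
    {
        int end = Math.Min(msg.Offset + msg.ChunkSize, _items.Count);
        for (int i = msg.Offset; i < end; i++) _processItem(_items[i]);
        // Send the message for the next step only if work remains.
        if (end < _items.Count)
            _bus.Enqueue(new ProcessNextChunk { Offset = end, ChunkSize = msg.ChunkSize });
    }

    // Drains the in-memory queue; returns how many messages were handled.
    public int RunToCompletion()
    {
        int handled = 0;
        while (_bus.Count > 0) { Handle(_bus.Dequeue()); handled++; }
        return handled;
    }
}
```

If a step fails, only that message is retried or lands in the error queue; earlier steps stay committed.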
Another, but more complex solution would be:
You build a background robot application which runs on one of the machines
this background worker robot can receive "worker jobs" (the ones initiated by the user)
then, the robot processes the jobs step by step in the background
Pitfalls are:
- you have to program this robot to be very stable
- you need to watch the robot somehow
Sure, this involves more work; on the flip side, you will have the option to integrate more job types, enabling your system to process different things in the background.
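A minimal in-process sketch of such a robot, assuming a hypothetical `BackgroundJobRobot` class built on `BlockingCollection<T>` from `System.Collections.Concurrent`: the web request only enqueues a job and returns, and a single background task works through the jobs one by one. A production robot would also need durable job storage (so jobs survive a restart) and monitoring, which is exactly the pitfall list above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical single-worker job runner; jobs are held in memory only.
public sealed class BackgroundJobRobot : IDisposable
{
    private readonly BlockingCollection<Action> _jobs = new BlockingCollection<Action>();
    private readonly Task _worker;

    public Action<Exception> OnError { get; set; } = _ => { };

    public BackgroundJobRobot()
    {
        // LongRunning hints that the loop should get a dedicated thread.
        _worker = Task.Factory.StartNew(Work, TaskCreationOptions.LongRunning);
    }

    // Called by the web request: returns immediately instead of doing the work inline.
    public void Enqueue(Action job) => _jobs.Add(job);

    private void Work()
    {
        // Blocks until jobs arrive; ends after CompleteAdding once the queue is empty.
        foreach (Action job in _jobs.GetConsumingEnumerable())
        {
            try { job(); }
            catch (Exception ex) { OnError(ex); } // one failing job must not kill the robot
        }
    }

    // Stop accepting jobs, finish the queued ones, then release resources.
    public void Dispose()
    {
        _jobs.CompleteAdding();
        _worker.Wait();
        _jobs.Dispose();
    }
}
```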
I think the design of your application or SQL statements has a problem. Unless you are Facebook, I don't think any process should take this long. It is better to review your design and find where the bottleneck is, instead of trying to keep this long-running process going.
Also, an ORM is sometimes not a good fit for every scenario; did you try using a stored procedure (SP)?

What C# tools exist for triggering, queueing, prioritizing dependent tasks

I have a C# service application which interacts with a database. It was recently migrated from .NET 2.0 to .NET 4.0 so there are plenty of new tools we could use.
I'm looking for pointers to programming approaches or tools/libraries to handle defining tasks, configuring which tasks they depend on, queueing, prioritizing, cancelling, etc.
There are various types of services:
Data (for retrieving and updating)
Calculation (populate some table with the results of a calculation on the data)
Reporting
These services often depend on one another and are triggered on demand, i.e., a Reporting task, will probably have code within it such as
if (IsSomeDependentCalculationRequired())
    PerformDependentCalculation(); // which may trigger further calculations
GenerateRequestedReport();
Also, any Data modification is likely to set the Required flag on some of the Calculation or Reporting services, (so the report could be out of date before it's finished generating). The tasks vary in length from a few seconds to a couple of minutes and are performed within transactions.
This has worked OK up until now, but it is not scaling well. There are fundamental design problems and I am looking to rewrite this part of the code. For instance, if two users request the same report at similar times, the dependent tasks will be executed twice. Also, there's currently no way to cancel a task in progress. It's hard to maintain the dependent tasks, etc.
I'm NOT looking for suggestions on how to implement a fix. Rather I'm looking for pointers to what tools/libraries I would be using for this sort of requirement if I were starting in .NET 4 from scratch. Would this be a good candidate for Windows Workflow? Is this what Futures are for? Are there any other libraries I should look at or books or blog posts I should read?
Edit: What about Rx Reactive Extensions?
I don't think your requirements fit into any of the built-in stuff. Your requirements are too specific for that.
I'd recommend that you build a task queueing infrastructure around a SQL database. Your tasks are pretty long-running (seconds) so you don't need particularly high throughput in the task scheduler. This means you won't encounter performance hurdles. It will actually be a pretty manageable task from a programming perspective.
Probably you should build a windows service or some other process that is continuously polling the database for new tasks or requests. This service can then enforce arbitrary rules on the requested tasks. For example it can detect that a reporting task is already running and not schedule a new computation.
My main point is that your requirements are so specific that you need to use C# code to encode them. You cannot make an existing tool fit your needs. You need the Turing completeness of a programming language to do this yourself.
Edit: You should probably separate a task-request from a task-execution. This allows multiple parties to request a refresh of some reports while at the same time only one actual computation is running. Once this single computation is completed all task-requests are marked as completed. When a request is cancelled the execution does not need to be cancelled. Only when the last request is cancelled the task-execution is cancelled as well.
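The request/execution split can be sketched as a small coordinator. A minimal sketch, assuming a hypothetical `ExecutionCoordinator` keyed by report name and an in-memory registry (the answer suggests keeping this state in a SQL database instead): requests for a key that is already running attach to the existing execution, and that execution's `CancellationTokenSource` is cancelled only when its last request is withdrawn.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical coordinator: many task-requests, at most one task-execution per key.
public sealed class ExecutionCoordinator
{
    private sealed class Execution
    {
        public readonly CancellationTokenSource Cts = new CancellationTokenSource();
        public Task Running;
        public int ActiveRequests;
    }

    private readonly object _sync = new object();
    private readonly Dictionary<string, Execution> _running = new Dictionary<string, Execution>();

    // Registers a request; starts an execution only if none is running for this key.
    public Task Request(string key, Func<CancellationToken, Task> run)
    {
        lock (_sync)
        {
            if (!_running.TryGetValue(key, out Execution exec))
            {
                exec = new Execution();
                _running[key] = exec;
                exec.Running = RunAndRelease(key, exec, run);
            }
            exec.ActiveRequests++;
            return exec.Running;
        }
    }

    // Withdraws one request; the execution is cancelled only when no requests remain.
    public void CancelRequest(string key)
    {
        Execution toCancel = null;
        lock (_sync)
        {
            if (_running.TryGetValue(key, out Execution exec) && --exec.ActiveRequests == 0)
            {
                _running.Remove(key); // a later request starts a fresh execution
                toCancel = exec;
            }
        }
        // Cancel outside the lock so cancellation callbacks never run while it is held.
        toCancel?.Cts.Cancel();
    }

    private async Task RunAndRelease(string key, Execution exec, Func<CancellationToken, Task> run)
    {
        try
        {
            // Run on the thread pool so user code never executes under _sync.
            await Task.Run(() => run(exec.Cts.Token));
        }
        finally
        {
            lock (_sync)
            {
                // Only remove our own entry; a newer execution may have replaced it.
                if (_running.TryGetValue(key, out Execution current) && current == exec)
                    _running.Remove(key);
            }
        }
    }
}
```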
Edit 2: I don't think workflows are the solution. Workflows usually operate separately from each other. But you don't want that. You want to have rules which span multiple tasks/workflows. You would be working against the system with a workflow based model.
Edit 3: A few words about the TPL (Task Parallel Library). You mentioned it ("Futures"). If you want some inspiration on how tasks could work together, how dependencies could be created and how tasks could be composed, look at the Task Parallel Library (in particular the Task and TaskFactory classes). You will find some nice design patterns there because it is very well designed. Here is how you model a sequence of tasks: you call Task.ContinueWith, which registers a continuation function as a new task. Here is how you model dependencies: TaskFactory.ContinueWhenAll(Task[], ...) (or Task.WhenAll(Task[]) in .NET 4.5 and later) creates a task that only runs when all its input tasks have completed.
BUT: The TPL itself is probably not well suited for you, because its tasks cannot be saved to disk. When you reboot your server or deploy new code, all existing tasks are lost and the process is aborted. This is likely to be unacceptable. Please just use the TPL as inspiration: learn from it what a "task/future" is and how tasks can be composed, then implement your own form of tasks.
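For reference, the two TPL composition primitives look like this. A small illustration using `Task.Run`, `Task.WhenAll` and `ContinueWith` (the first two need .NET 4.5 or later; on .NET 4.0 the equivalents are `Task.Factory.StartNew` and `Task.Factory.ContinueWhenAll`). The "calculations" are placeholder values standing in for the Calculation and Reporting services in the question.

```csharp
using System;
using System.Threading.Tasks;

public static class TplComposition
{
    public static Task<string> BuildReport()
    {
        // Two independent calculations (placeholders for the Calculation services).
        Task<int> totals = Task.Run(() => 40);
        Task<int> adjustments = Task.Run(() => 2);

        // Dependency: this continuation runs only after both inputs have completed.
        // Note: by default ContinueWith also runs if an input faulted; .Result then rethrows.
        Task<int> combined = Task.WhenAll(totals, adjustments)
                                 .ContinueWith(all => all.Result[0] + all.Result[1]);

        // Sequence: the report is a continuation of the combined calculation.
        return combined.ContinueWith(t => $"Report total: {t.Result}");
    }
}
```

As the answer says, these tasks live only in memory, so this is useful as a model of composition rather than as the persistence layer.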
Does this help?
I would try to use the state machine package Stateless to model the workflow. Using a package will provide a consistent way to advance the state of the workflow across the various services. Each of your services would hold an internal state machine implementation and expose methods for advancing it. Stateless will be responsible for triggering actions based on the state of the workflow, and it forces you to explicitly set up the various states the workflow can be in. This will be particularly useful for maintenance, and it will probably help you understand the domain better.
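To show what "explicitly set up the various states" buys you, here is a framework-free sketch of the same idea. It is not Stateless's API; the `ReportState` and `ReportTrigger` values are hypothetical ones for the report workflow in the question. Stateless offers this kind of configuration, plus entry/exit actions and guarded transitions, out of the box.

```csharp
using System;
using System.Collections.Generic;

public enum ReportState { Idle, Calculating, Generating, Done, Cancelled }
public enum ReportTrigger { Request, CalculationFinished, ReportFinished, Cancel, DataChanged }

// Hand-rolled illustration; a library like Stateless would replace the table below.
public sealed class ReportWorkflow
{
    // Every allowed (state, trigger) pair is listed explicitly; anything else is rejected.
    private static readonly Dictionary<(ReportState, ReportTrigger), ReportState> Transitions =
        new Dictionary<(ReportState, ReportTrigger), ReportState>
        {
            [(ReportState.Idle, ReportTrigger.Request)] = ReportState.Calculating,
            [(ReportState.Calculating, ReportTrigger.CalculationFinished)] = ReportState.Generating,
            [(ReportState.Calculating, ReportTrigger.Cancel)] = ReportState.Cancelled,
            [(ReportState.Generating, ReportTrigger.ReportFinished)] = ReportState.Done,
            [(ReportState.Generating, ReportTrigger.Cancel)] = ReportState.Cancelled,
            // A data change makes a finished report stale, so it must be recalculated.
            [(ReportState.Done, ReportTrigger.DataChanged)] = ReportState.Calculating,
        };

    public ReportState State { get; private set; } = ReportState.Idle;

    public bool CanFire(ReportTrigger trigger) => Transitions.ContainsKey((State, trigger));

    public void Fire(ReportTrigger trigger)
    {
        if (!Transitions.TryGetValue((State, trigger), out ReportState next))
            throw new InvalidOperationException($"'{trigger}' is not permitted in state '{State}'.");
        State = next;
    }
}
```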
If you want to solve this fundamental problem properly and in a scalable way, you should probably look at the SOA architectural style.
Your services will receive commands and generate events you can handle in order to react to facts that happen in your system.
And, yes, there are tools for it. For example NServiceBus is a wonderful tool to build SOA systems.
You could use SQL Server Agent to run SQL queries at timed intervals. Otherwise, it looks like you will have to write the application yourself: a long-running program that checks the time and does something. I don't think there are clear-cut tools out there for what you are trying to do. Write a C# application or a WCF service; data automation can be done in SQL itself.
If I understand you correctly, you want to cache the generated reports and not do the work again. As other commenters have pointed out, this can be solved elegantly with multiple producer/consumer queues and some caches.
First you enqueue your report request. Based on the report generation parameters, you can first check the cache to see whether a previously generated report is already available and simply return it. If the report becomes obsolete due to changes in the database, you need to make sure that the cache is invalidated in a reliable manner.
Now, if the report has not been generated yet, you need to schedule it for generation. The report scheduler needs to check whether the same report is already being generated. If so, register an event to notify you when it is completed and return the report once it is finished. Make sure that you do not access the data via the caching layer, since it could produce races (the report is generated, the data is changed, and the finished report would be immediately discarded by the cache, leaving nothing for you to return).
Or, if you want to avoid returning outdated reports, you can let the caching layer become your main data provider, which will keep producing reports until one is generated in time that is not outdated. But be aware that if you have constant changes in your database, you might enter an endless loop, constantly generating invalid reports, if the report generation time is longer than the average time between two changes to your db.
As you can see, you have plenty of options here without even talking about .NET, TPL or SQL Server. First you need to set your goals for how fast, scalable and reliable your system should be; then you need to choose the appropriate architecture, as described above, for your particular problem domain. I cannot do it for you because I do not have your full domain know-how about what is acceptable and what is not.
The tricky part is the handover part between different queues with the proper reliability and correctness guarantees. Depending on your specific report generation needs you can put this logic into the cloud or use a single thread by putting all work into the proper queues and work on them concurrently or one by one or something in between.
TPL and SQL Server can certainly help there, but they are only tools. If they are used wrongly due to insufficient experience with one or the other, it might turn out that a different approach (like using only in-memory queues and persisting reports in the file system) is better suited for your problem.
From my current understanding, I would not misuse SQL Server as a cache, but if you want a database I would use something like RavenDB or RaptorDB, which look stable and much more lightweight than a full-blown SQL Server.
But if you already have a SQL server running then go ahead and use it.
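The "check the cache, otherwise join the generation that is already running" steps described above can be sketched with a `ConcurrentDictionary` of lazily started tasks. A minimal sketch, assuming a hypothetical `ReportService` keyed by a string built from the report parameters and a caller-supplied `generate` function; `Invalidate` would be driven by your data-change notifications. Callers keep the task they were handed instead of reading the cache again, which avoids the race described above where an invalidation discards a just-finished report.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical report cache with in-flight de-duplication.
public sealed class ReportService
{
    private readonly Func<string, Task<string>> _generate;
    // One entry per report key: either a finished report or a generation in progress.
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _reports =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();

    public ReportService(Func<string, Task<string>> generate) { _generate = generate; }

    // Concurrent callers for the same key share a single generation.
    public Task<string> GetReport(string key)
    {
        // GetOrAdd may build several Lazy wrappers under contention, but only the stored
        // one is ever evaluated, so generation starts at most once per cached entry.
        var entry = _reports.GetOrAdd(key,
            k => new Lazy<Task<string>>(() => _generate(k), LazyThreadSafetyMode.ExecutionAndPublication));
        return entry.Value;
    }

    // Called when the underlying data changes; the next request regenerates.
    // Simplification: a failed generation also stays cached until invalidated.
    public void Invalidate(string key) => _reports.TryRemove(key, out _);
}
```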
I am not sure if I understood you correctly, but you might want to have a look at JAMS Scheduler: http://www.jamsscheduler.com/. It's non-free, but a very good system for scheduling dependent tasks and reporting. I have used it with success at my previous company. It's written in .NET and there is a .NET API for it, so you can write your own apps communicating with JAMS. They also have very good support and are eager to implement new features.
