Alternative to WorkflowServiceHost for WF4? - c#

We're looking to replace a chunk of our business logic with WF4 workflows.
They're all pretty typical workflows: User action creates an instance, database effort, next user confirms, etc.
Our requirements for the workflow host are:
1. Create workflows from XAML definitions stored in a database (DynamicActivity)
2. Support workflows on different versions
3. Support long time-based events (we're currently aware of notifications after 5 days and rolling back a workflow after 30 days)
4. Support many instances of many workflows (we've identified 10 workflows with about 4000 in-flight, of which only a few are processing at any one time)
5. Retain all state after a service restart (including the time-based event)
6. Authenticate the calling user (WindowsAuthentication, if possible)
As part of the migration effort, I built some POCs using "WCF Workflow Service Application" projects, but from what I can see not all of these requirements are immediately achievable.
I gather that #2 is done through WCF Routing, and my understanding is that WSH will handle #3 for us (is this true, given #5?), but I can't see how #1 would work from the default project structure.
I've solved #1 using WorkflowApplication instances, but this relied on using bookmarks to resume for each input event, and I wasn't convinced that WorkflowApplication would scale to our needs without unloading idle workflows, which breaks the Delay activity.
So, if you've stuck with me this far:
Is there a way to achieve all of this using WSH, either in the default project or by implementing some of it ourselves?
Are we better off writing our own "DurableDelay" activity that records the true wake time and unloads the workflow to be resumed by the host process, given the long durations and potential need to unload and reload workflows?
If WSH isn't going to do it, is there an existing alternative?
I'm not averse to writing our own host service to handle workflow lifecycle, and we've even drawn up the proposed design, but I didn't want to start down that route if it turns out that there is a ready-made solution.
Cheers

You can achieve #1 by using a VirtualPathProvider to load your workflows from a database instead of the file system. See How To Build Workflow Services with a Database Repository for more information about this.
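A minimal sketch of that approach, assuming a hypothetical LoadXamlFromDatabase lookup that maps a virtual path to a XAML definition stored in your database:

using System.IO;
using System.Text;
using System.Web.Hosting;

// Serves .xamlx workflow definitions from a database instead of the file system.
public class DatabaseXamlProvider : VirtualPathProvider
{
    public override bool FileExists(string virtualPath)
    {
        return virtualPath.EndsWith(".xamlx") || Previous.FileExists(virtualPath);
    }

    public override VirtualFile GetFile(string virtualPath)
    {
        if (virtualPath.EndsWith(".xamlx"))
            return new DatabaseXamlFile(virtualPath, LoadXamlFromDatabase(virtualPath));
        return Previous.GetFile(virtualPath);
    }

    // Hypothetical: map the virtual path to a workflow definition row.
    private static string LoadXamlFromDatabase(string virtualPath)
    {
        throw new System.NotImplementedException();
    }
}

public class DatabaseXamlFile : VirtualFile
{
    private readonly string _xaml;

    public DatabaseXamlFile(string virtualPath, string xaml) : base(virtualPath)
    {
        _xaml = xaml;
    }

    public override Stream Open()
    {
        return new MemoryStream(Encoding.UTF8.GetBytes(_xaml));
    }
}

The provider is registered at application start, e.g. HostingEnvironment.RegisterVirtualPathProvider(new DatabaseXamlProvider()); the linked article covers the full wiring.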
Workflow versioning (#2) is not supported in .NET 4.0, but .NET 4.5 has better support for real versioning. See What's New in Windows Workflow Foundation 4.5. However, if you don't need to change a workflow after it starts, and only need new instances to start with the new version while already-executing instances finish using the previous definition, then you can implement versioning at the database level and simply treat each version of a workflow definition as a different workflow service.
You can then use Workflow Services hosted in IIS (AppFabric) with a SQL Server instance store to get #3, #4 and #5 almost for free.
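To make that concrete, here is a rough self-hosted sketch of the two behaviors involved (the workflow service, connection string, and timeouts are placeholders; under IIS/AppFabric the same settings go in web.config):

using System;
using System.ServiceModel.Activities;
using System.ServiceModel.Activities.Description;

class HostSetup
{
    static void Main()
    {
        var host = new WorkflowServiceHost(CreateWorkflowService(),
            new Uri("http://localhost:8080/MyWorkflow"));

        // Persist state (including Delay timers) to the SQL instance store,
        // so it survives service restarts.
        host.Description.Behaviors.Add(new SqlWorkflowInstanceStoreBehavior(
            "Server=.;Database=WorkflowInstanceStore;Integrated Security=True"));

        // Unload idle instances so thousands of in-flight workflows don't stay
        // in memory; the durable timer wakes them up when a Delay expires.
        host.Description.Behaviors.Add(new WorkflowIdleBehavior
        {
            TimeToPersist = TimeSpan.Zero,
            TimeToUnload = TimeSpan.FromMinutes(1)
        });

        host.Open();
        Console.ReadLine();
        host.Close();
    }

    // Hypothetical: build or load the WorkflowService definition to host.
    static WorkflowService CreateWorkflowService()
    {
        throw new NotImplementedException();
    }
}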
Finally for #6 and assuming you stick with .NET 4.0 you can take a look at WF Security Pack CTP 1.

I'm developing the same kind of workflows.
I also first took a look at workflow services, but as our workflows were completely integrated into a business layer, I didn't want to use WCF to access them.
So I'm now using WorkflowApplication as the host, which lets me instantiate and manipulate the host directly.
The biggest problem was resuming workflows that use a Delay activity (you need to check the database yourself for instances whose timers have expired). A rough sketch of loading and resuming a persisted instance is below.
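A sketch under stated assumptions: the definition lookup, bookmark name, and connection string are your own, and the definition must match the one the instance was created from.

using System;
using System.Activities;
using System.Activities.DurableInstancing;

class WorkflowResumer
{
    // instanceId was captured when the instance was created and persisted.
    public static void Resume(Activity definition, Guid instanceId, object input)
    {
        var app = new WorkflowApplication(definition)
        {
            InstanceStore = new SqlWorkflowInstanceStore(
                "Server=.;Database=WorkflowInstanceStore;Integrated Security=True")
        };

        app.Load(instanceId);                       // rehydrate persisted state
        app.ResumeBookmark("UserConfirmed", input); // hypothetical bookmark name
    }
}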

Related

Best practices to implement dotnet core web api with the option to use horizontal scaling in cloud

We have created a .NET Core web API project which uses a SQL Server database. Now we are planning to deploy this project to Microsoft Azure.
As part of the deployment, we are also considering enabling the autoscaling option (horizontal scaling).
Before we do, there are some questions we want to clarify.
Do we need to add any additional code to our application for autoscaling to work properly?
"Properly" in the sense that there can be more than one instance of the application running because of horizontal scaling. We are using a database, and with more than one instance running, will that cause race conditions (i.e., two resources accessing the same data at a time)? Can we add a transaction (or use locking) in our code to avoid these kinds of scenarios?
I want to know whether there are any best practices to follow while implementing that kind of application.
Thank you, and we look forward to your answers!
Consider the following points when designing an autoscaling strategy:
The system must be designed to be horizontally scalable. Avoid making assumptions about instance affinity; do not design solutions that require that the code is always running in a specific instance of a process. When scaling a cloud service or web site horizontally, do not assume that a series of requests from the same source will always be routed to the same instance. For the same reason, design services to be stateless to avoid requiring a series of requests from an application to always be routed to the same instance of a service. When designing a service that reads messages from a queue and processes them, do not make any assumptions about which instance of the service handles a specific message, because autoscaling could start additional instances of a service as the queue length grows. The Competing Consumers pattern describes how to handle this scenario.

If the solution implements a long-running task, design this task to support both scaling out and scaling in. Without due care, such a task could prevent an instance of a process from being shut down cleanly when the system scales in, or it could lose data if the process is forcibly terminated. Ideally, refactor a long-running task and break up the processing that it performs into smaller, discrete chunks. The Pipes and Filters pattern provides an example of how you can achieve this. Alternatively, you can implement a checkpoint mechanism that records state information about the task at regular intervals, and save this state in durable storage that can be accessed by any instance of the process running the task. In this way, if the process is shut down, the work that it was performing can be resumed from the last checkpoint by using another instance.
For more information, follow the doc : https://github.com/Huachao/azure-content/blob/master/articles/best-practices-auto-scaling.md
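To illustrate the Competing Consumers pattern mentioned above: several identical workers pull from one shared queue, so no message depends on which instance handles it. A minimal in-process sketch (a real deployment would use a durable queue such as Azure Service Bus or Storage Queues):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class CompetingConsumersDemo
{
    static void Main()
    {
        var queue = new BlockingCollection<string>();

        // Several identical consumers compete for messages; none assumes affinity.
        var consumers = new Task[3];
        for (int i = 0; i < consumers.Length; i++)
        {
            int id = i;
            consumers[id] = Task.Run(() =>
            {
                foreach (var message in queue.GetConsumingEnumerable())
                    Console.WriteLine($"Consumer {id} handled {message}");
            });
        }

        for (int n = 0; n < 10; n++)
            queue.Add($"message-{n}");

        queue.CompleteAdding(); // signal consumers to drain and exit
        Task.WaitAll(consumers);
    }
}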
Regarding this:
"Properly" in the sense that there can be more than one instance of the application running because of horizontal scaling. We are using a database, and with more than one instance running, will that cause race conditions (i.e., two resources accessing the same data at a time)? Can we add a transaction (or use locking) in our code to avoid these kinds of scenarios?
Please keep in mind that, even if the app is running on a single machine, requests will still be handled concurrently. This means that even on a single machine 2 requests can cause the same entry in the database to be updated. So the above questions about race conditions apply to single instance web apps as well.
Try to avoid locking: the whole point of (horizontal) scaling is to gain performance benefits. By using locks you effectively remove these benefits, as only one process at a time can use the locked resource.
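A common alternative to locks is optimistic concurrency: let SQL Server stamp each row with a rowversion and fail a stale write instead of serializing all writers. A sketch with EF Core (the entity is invented for illustration):

using System.ComponentModel.DataAnnotations;

public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }

    // SQL Server bumps this rowversion on every write; EF Core adds it to the
    // UPDATE's WHERE clause, so a write based on stale data affects zero rows
    // and surfaces as a DbUpdateConcurrencyException.
    [Timestamp]
    public byte[] RowVersion { get; set; }
}

// In the update path, handle the loser of the race explicitly:
// try { await db.SaveChangesAsync(); }
// catch (DbUpdateConcurrencyException) { /* reload, merge, or report a conflict */ }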
Other points of consideration are:
If you are using an in-memory cache you might want to swap it out for a distributed cache.
The guidance at the MS docs

Diffrence between windows services running own processes vs shared process

I am working on a project where we have decided to split our background tasks (network, CPU, and IO intensive) into three Windows services.
Now the question is whether we should host all three services in a single process or create three independent services with their own processes.
The Windows Service project template allows multiple services to be created; when installed, they'll create separate entries in the Service Control Manager (SCM) and can be controlled independently. The benefit here is better code management and code reuse.
However, if there is any performance drawback, I'd rather let go of this benefit, since performance is the primary reason we're having multiple services in the first place.
Please advise.
My suggestion is to go for separate Windows services created using Topshelf or another such library, so that they are independent of the platform (see the sketch below).
Scalability: easily scalable as needed; if one service is used more heavily than the others, that service alone can be scaled up by running multiple instances of it.
Parallel processing: as the services are independent, they can work in parallel, which improves performance.
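A minimal Topshelf sketch for one of the three services (the worker class and service names are placeholders); each service gets its own executable and SCM entry, so it can be scaled and restarted independently:

using Topshelf;

// Hypothetical worker; each of the three services would get its own
// class like this, built and installed as an independent executable.
public class NetworkWorker
{
    public void Start() { /* begin polling / processing */ }
    public void Stop()  { /* shut down cleanly */ }
}

class Program
{
    static void Main()
    {
        HostFactory.Run(x =>
        {
            x.Service<NetworkWorker>(s =>
            {
                s.ConstructUsing(name => new NetworkWorker());
                s.WhenStarted(w => w.Start());
                s.WhenStopped(w => w.Stop());
            });
            x.RunAsLocalSystem();
            x.SetServiceName("NetworkWorker");
            x.SetDescription("Network-intensive background tasks");
        });
    }
}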

Windows Service + Windows Forms application. One database. Advice on concurrency

I have a SQL Server database with information about files; I'm talking about custom properties: categories and a description for each file.
The Windows Forms application is for the user. But I will also make a Windows service that will track any changes to the files. If a change happens (renamed, moved, deleted), the service has to update that same database accordingly. And I think it should do it right away, without any delay.
Now, this is going to be my first time making a Windows service, plus the first time I will have to handle concurrency (theoretically I know about threads and so on).
So:
First of all, is it OK if one process is updating a database that another process may be using at the same time? Do you need to handle that situation in the first place? (Probably; for example, in our daily "user lives" we can't modify a file while it's being used by another process.)
Is the idea of these two sharing one data source a good one?
If it is, then how do I handle the concurrency? I can use WCF for the messages between the two, but then does the solution have something to do with WCF? Because I'm going to use this for the first time as well :D.
Any help is appreciated. Thanks in advance for your time!
Since MS SQL is transactional, this is no big deal. You just have to watch out for data which might be read and then updated by one process: there it can be necessary to use a TransactionScope (that's a .NET class ;)).
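A minimal sketch of that TransactionScope usage (the connection string and SQL are placeholders):

using System.Data.SqlClient;
using System.Transactions;

static class FileCatalog
{
    // Read-then-update as one atomic unit, so the service and the Forms app
    // can't interleave conflicting writes between the SELECT and the UPDATE.
    public static void UpdateProperties(string connectionString)
    {
        using (var scope = new TransactionScope())
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            // ... SELECT the file's current categories/description ...
            // ... UPDATE them based on what was read ...
            scope.Complete(); // omit this and the transaction rolls back on Dispose
        }
    }
}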
From a software architecture point of view, you should consider using a three-tier rather than a two-tier application:
Two tier:
Essentially your system, with the persistence layer (DB) communicating with the clients directly
Three tier:
Persistence layer <--> logic layer (e.g. a WCF service handling the app logic) <--> clients (service and Forms app, triggering app logic and showing results)
When it comes to concurrency it's going to be really straightforward. The MSSQL database engine handles just about all of it (e.g. locking and sharing). Further, if you leverage the SqlCommandBuilder to build your statements, the statements will automatically use optimistic concurrency.
As for the Windows service and how it gets notified: use a FileSystemWatcher. It's going to be more efficient, and you won't be publishing some service port on the local box.
I'd normally give you some good code examples but I'm answering this from my phone.
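In that spirit, a minimal sketch of the FileSystemWatcher setup (the path and the database-update handlers are hypothetical):

using System.IO;

class FileTracker
{
    private FileSystemWatcher _watcher;

    public void Start()
    {
        _watcher = new FileSystemWatcher(@"C:\TrackedFiles")
        {
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite
        };

        // Push each file event straight into the database.
        _watcher.Renamed += (s, e) => OnRenamed(e.OldFullPath, e.FullPath);
        _watcher.Deleted += (s, e) => OnDeleted(e.FullPath);
        _watcher.Changed += (s, e) => OnChanged(e.FullPath);

        _watcher.EnableRaisingEvents = true; // nothing fires until this is set
    }

    // Hypothetical handlers that update the custom-properties tables.
    private void OnRenamed(string oldPath, string newPath) { }
    private void OnDeleted(string path) { }
    private void OnChanged(string path) { }
}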

What C# tools exist for triggering, queueing, prioritizing dependent tasks

I have a C# service application which interacts with a database. It was recently migrated from .NET 2.0 to .NET 4.0 so there are plenty of new tools we could use.
I'm looking for pointers to programming approaches or tools/libraries to handle defining tasks, configuring which tasks they depend on, queueing, prioritizing, cancelling, etc.
There are various types of services:
Data (for retrieving and updating)
Calculation (populate some table with the results of a calculation on the data)
Reporting
These services often depend on one another and are triggered on demand, i.e., a Reporting task will probably have code within it such as
if (IsSomeDependentCalculationRequired())
    PerformDependentCalculation(); // which may trigger further calculations

GenerateRequestedReport();
Also, any Data modification is likely to set the Required flag on some of the Calculation or Reporting services, (so the report could be out of date before it's finished generating). The tasks vary in length from a few seconds to a couple of minutes and are performed within transactions.
This has worked OK up until now, but it is not scaling well. There are fundamental design problems and I am looking to rewrite this part of the code. For instance, if two users request the same report at similar times, the dependent tasks will be executed twice. Also, there's currently no way to cancel a task in progress, and it's hard to maintain the dependent tasks, etc.
I'm NOT looking for suggestions on how to implement a fix. Rather I'm looking for pointers to what tools/libraries I would be using for this sort of requirement if I were starting in .NET 4 from scratch. Would this be a good candidate for Windows Workflow? Is this what Futures are for? Are there any other libraries I should look at or books or blog posts I should read?
Edit: What about Rx Reactive Extensions?
I don't think your requirements fit into any of the built-in stuff. Your requirements are too specific for that.
I'd recommend that you build a task queueing infrastructure around a SQL database. Your tasks are pretty long-running (seconds) so you don't need particularly high throughput in the task scheduler. This means you won't encounter performance hurdles. It will actually be a pretty manageable task from a programming perspective.
You should probably build a Windows service or some other process that continuously polls the database for new tasks or requests. This service can then enforce arbitrary rules on the requested tasks. For example, it can detect that a reporting task is already running and not schedule a new computation.
My main point is that your requirements are so specific that you need to encode them in C# code. You cannot make an existing tool fit your needs; you need the Turing completeness of a programming language to do this yourself.
Edit: You should probably separate a task-request from a task-execution. This allows multiple parties to request a refresh of some reports while only one actual computation is running. Once this single computation is completed, all task-requests are marked as completed. When a request is cancelled, the execution does not need to be cancelled; only when the last request is cancelled is the task-execution cancelled as well.
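A sketch of that request/execution split (Report and RunReport are placeholders): every request for the same key awaits one shared execution, so two users asking for the same report at similar times trigger a single computation.

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class Report { }

public class ReportRunner
{
    private readonly ConcurrentDictionary<string, Lazy<Task<Report>>> _running =
        new ConcurrentDictionary<string, Lazy<Task<Report>>>();

    // Lazy<T> guarantees only one execution starts even under a race.
    public Task<Report> Request(string reportKey)
    {
        return _running.GetOrAdd(reportKey,
            key => new Lazy<Task<Report>>(() => RunReport(key))).Value;
    }

    private Task<Report> RunReport(string key)
    {
        return Task.Factory.StartNew(() =>
        {
            var report = new Report(); // ... expensive computation here ...
            Lazy<Task<Report>> finished;
            _running.TryRemove(key, out finished); // allow a fresh run next time
            return report;
        });
    }
}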
Edit 2: I don't think workflows are the solution. Workflows usually operate separately from each other. But you don't want that. You want to have rules which span multiple tasks/workflows. You would be working against the system with a workflow based model.
Edit 3: A few words about the TPL (Task Parallel Library). You mentioned it ("Futures"). If you want some inspiration on how tasks could work together, how dependencies could be created and how tasks could be composed, look at the Task Parallel Library (in particular the Task and TaskFactory classes). You will find some nice design patterns there because it is very well designed. Here is how you model a sequence of tasks: you call Task.ContinueWith, which registers a continuation function as a new task. And here is how you model dependencies: TaskFactory.ContinueWhenAll starts a task that only runs when all of its input tasks are completed.
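A small illustration of those two patterns (the work items are stand-ins):

using System;
using System.Threading.Tasks;

class TplComposition
{
    static void Main()
    {
        Task calcA = Task.Factory.StartNew(() => Console.WriteLine("calculation A"));
        Task calcB = Task.Factory.StartNew(() => Console.WriteLine("calculation B"));

        // Dependency: the report runs only after both calculations complete.
        Task report = Task.Factory.ContinueWhenAll(
            new[] { calcA, calcB },
            antecedents => Console.WriteLine("report generated"));

        // Sequence: notification follows the report.
        Task notify = report.ContinueWith(t => Console.WriteLine("users notified"));

        notify.Wait();
    }
}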
BUT: The TPL itself is probably not well suited for you because its tasks cannot be saved to disk. When you reboot your server or deploy new code, all existing tasks are cancelled and the process aborted. This is likely to be unacceptable. Please just use the TPL as inspiration: learn from it what a "task/future" is and how tasks can be composed, then implement your own form of tasks.
Does this help?
I would try to use the state machine package stateless to model the workflow. Using a package provides a consistent way to advance the state of the workflow across the various services. Each of your services would hold an internal state machine implementation and expose methods for advancing it. Stateless will be responsible for triggering actions based on the state of the workflow, and it forces you to explicitly set up the various states the workflow can be in. This will be particularly useful for maintenance, and it will probably help you understand the domain better.
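A rough sketch of what that looks like with stateless (the states, triggers, and GenerateRequestedReport hook are invented to match the question's example):

using Stateless;

class ReportWorkflow
{
    enum State { Requested, Calculating, Generating, Done }
    enum Trigger { DependenciesReady, CalculationFinished, ReportFinished }

    private readonly StateMachine<State, Trigger> _machine =
        new StateMachine<State, Trigger>(State.Requested);

    public ReportWorkflow()
    {
        // Only the transitions configured here are legal; anything else throws.
        _machine.Configure(State.Requested)
                .Permit(Trigger.DependenciesReady, State.Calculating);

        _machine.Configure(State.Calculating)
                .Permit(Trigger.CalculationFinished, State.Generating);

        _machine.Configure(State.Generating)
                .OnEntry(GenerateRequestedReport)
                .Permit(Trigger.ReportFinished, State.Done);
    }

    public void DependenciesReady() { _machine.Fire(Trigger.DependenciesReady); }

    private void GenerateRequestedReport() { /* ... */ }
}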
If you want to solve this fundamental problem properly and in a scalable way, you should probably look at the SOA architectural style.
Your services will receive commands and generate events that you can handle in order to react to facts happening in your system.
And yes, there are tools for it. For example, NServiceBus is a wonderful tool for building SOA systems.
You could use a SQL Server Agent job to run SQL queries on a timed interval. Otherwise, it looks like you have to write the application yourself: a long-running program that checks the time and does something. I don't think there are clear-cut tools out there to do what you are trying to do. Use a C# application or a WCF service; the data automation can be done in SQL itself.
If I understand you right, you want to cache the generated reports and not do the work again. As other commenters have pointed out, this can be solved elegantly with multiple producer/consumer queues and some caches.
First you enqueue your report request. Based on the report generation parameters, you can first check the cache to see whether a previously generated report is already available, and simply return that one. If the report becomes obsolete due to changes in the database, you need to take care that the cache is invalidated in a reliable manner.
Now, if the report has not been generated yet, you need to schedule it for generation. The report scheduler needs to check whether the same report is already being generated. If so, register an event to be notified when it completes, and return the report once it is finished. Make sure that you do not access the data via the caching layer, since that could produce races (the report is generated, the data is changed, and the finished report would be immediately discarded by the cache, leaving nothing for you to return).
Or, if you do want to prevent returning outdated reports, you can let the caching layer become your main data provider, which will keep producing reports until one is generated that is not outdated. But be aware that if you have constant changes in your database, you might enter an endless loop here, constantly generating invalid reports, if the report generation time is longer than the average time between two changes to your db.
As you can see, you have plenty of options here without actually talking about .NET, the TPL, or SQL Server. First you need to set your goals for how fast, scalable, and reliable your system should be; then you need to choose the appropriate architecture for your particular problem domain. I cannot do it for you because I do not have your full domain knowledge of what is acceptable and what is not.
The tricky part is the handover between the different queues with the proper reliability and correctness guarantees. Depending on your specific report generation needs, you can put this logic into the cloud, or use a single thread that puts all work into the proper queues and works on them concurrently or one by one, or something in between.
The TPL and SQL Server can certainly help there, but they are only tools. If used wrongly, due to insufficient experience with one or the other, it might turn out that a different approach (such as in-memory queues with reports persisted in the file system) is better suited to your problem.
From my current understanding I would not misuse SQL Server as a cache, but if you want a database I would use something like RavenDB or RaportDB, which look stable and much more lightweight compared to a full-blown SQL Server.
But if you already have a SQL Server running, then go ahead and use it.
I am not sure if I understood you correctly, but you might want to have a look at JAMS Scheduler: http://www.jamsscheduler.com/. It's non-free, but a very good system for scheduling dependent tasks and reporting. I have used it with success at my previous company. It's written in .NET and there is a .NET API for it, so you can write your own apps that communicate with JAMS. They also have very good support and are eager to implement new features.

C# scheduler tools

I’m loath to ask another scheduler question here; I’ve read through dozens, but it’s still not clear to me what tools would best fit my need. I have three requirements for a reporting app:
User invoked
Fixed schedule
User scheduled
I have an ASP.NET forms app to cover #1 and a C# console app to handle #2 but now #3 has been added to the mix.
So for the user scheduled reports I need to:
Present the user with a schedule selector and save their selection (into SQL Server?)
Have an app that checks the database for jobs to run/schedule
App to run the query and format the report
I suppose the latter two could be a single app but I’ve read it’s hard to debug service apps so keeping them separate may be good. I don’t know what parts of my requirements are met by Quartz.net and I’ve seen separate GUI tools (DayPilot) and backend (Task Manager API, CodePlex taskscheduler) mentioned. Not having used any of these I’m hoping to minimize my false starts.
If you require a job scheduler, try Hangfire (hangfire.io).
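A rough sketch of how Hangfire would cover the three cases (the connection string and job bodies are placeholders; the SQL Server storage also persists jobs across restarts):

using System;
using Hangfire;

class SchedulerHost
{
    static void Main()
    {
        GlobalConfiguration.Configuration.UseSqlServerStorage(
            "Server=.;Database=Hangfire;Integrated Security=True");

        using (new BackgroundJobServer())
        {
            // #1 user invoked: fire-and-forget, runs as soon as a worker is free.
            BackgroundJob.Enqueue(() => Console.WriteLine("ad-hoc report"));

            // #2 fixed schedule: recurring job defined by a cron expression.
            RecurringJob.AddOrUpdate("nightly-report",
                () => Console.WriteLine("nightly report"), Cron.Daily());

            // #3 user scheduled: one-off job at a user-chosen delay.
            BackgroundJob.Schedule(() => Console.WriteLine("user-scheduled report"),
                TimeSpan.FromHours(4));

            Console.ReadLine();
        }
    }
}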
If you have SQL Server, then you should use SQL Server Reporting Services, which does all three.
