Task Scheduling / Load Balance Pattern - c#

I've run into this a few times recently at work. Where we have to develop an application that completes a series of items on a schedule, sometimes this schedule is configurable by the end user, other times its set in Config File. Either way, this task is something that should only be executed once, by a single machine. This isnt generally difficult, until you introduce the need for SOA/Geo Redundancy. In this particular case there are a total of 4 (could be 400) instances of this application running. There are two in each data center on opposite sides of the US.
I'm investigating successful patterns for this sort of thing. My current solution has each physical location determining if it should be active or dormant. We do this by checking a Session object that is maintained to another server. If DataCenter A is the live setup, then the logic auto-magically prevents the instances in DataCenter B from performing any execution. (We dont want the work to traverse the MPLS between DCs)
The two remaining instances in DC A will then query the Database for any jobs that need to be executed in the next 3 hours and cache them. A separate timer runs every second checking for jobs that need executed.
If it finds one it will execute a stored procedure first, that forces a full table lock, queries for the job that needs to be executed, checks the "StartedByInstance" Column for a value, if it doesnt find a value then it marks that record as being executed by InstanceX. Only then will it actually execute the job.
My direct questions are:
Is this a good pattern?
Are there any better patterns?
Are there any libraries/apis that would be of interest?
Thanks!

Related

Scheduling methods to run in ASP.NET

Explanation:
I am developing a simple car business system and I have to implement the following feature:
A very special car model is delivered to a shop. There are a lot of people on waiting list exactly for this model.
When the car arrives the first client receives the right to buy it, he / she has 24 hours to use this opportunity.
I have a special state in the DB that determines if the user is: on waiting list (I have the exact position, as well) or can use opportunity to buy the car. Whenever the car arrives, I run a method that changes the state of the first client on waiting list. And here comes the problem:
Problem:
The client can use his opportunity, during the 24 hours period. But I have to check at the end, if he/she has bought the car. For this reason, I have to schedule a method to run in 24 hours.
Possible solution:
I am thinking about two things. First is using a job scheduler like Hangfire. The problem is that since I do not have any other jobs in my app, I do not want to include a whole package for such a small thing. Second is using making the checking method asynchronous and making the thread sleep for 24 hours before proceeding (I do not feel comfortable in working with threads and this is just an idea). I got the idea from this article. Keep in mind that more than one car can arrive in more than one shop. Does it mean that I should use many threads and how it is going to affect the performance of the system?
Question:
Which of the two solutions is better?
Is there another possibility that you can suggest in this particular case?
I agree. Importing a package for only one job if you aren't going to use it for many jobs is a little bit of overkill.
If you are running SQL server, I'd recommend writing a .NET console application to run on a schedule using the SQL Server Agent. (see image) If you have stored procedures that need to run, you also have the option to run them directly from the SQL job if for some reason you don't want to run them from your .NET application.
Since it sounds like you need this to run on a data driven schedule, you may consider adding a trigger to look for a new record in your database whenever that "special" car is inserted into the database. MSDN SQL Job using Trigger
I've done something similar to this where every morning, an hour prior to business hours starting, I run a .NET executable that checks the latest record in table A and compares it to a value in table B and determines if the record in table A needs to be updated.
I also use SQL Server to run jobs that send emails on a schedule based on data that has been added or modified in a database.
There are advantages to using SQL server to running your jobs as there are many options available to notify you of events, retry running failed jobs, logging and job history. You can specify any type of schedule from repeating frequently to only running once a week.

TransactionScope priority (get rid of deadlock situation)

I have process A that runs every 5 minutes and needs to write something in table "EventLog". This works all day long but at night there is another process B starting that needs to delete a lot of old data from this table. The table has millions of rows (blobs included) and many related tables (deletion by cascade) so process B runs up to ~45 minutes. While process B is running I get a lot of deadlock warnings for process A and I want to get rid of these.
The easy option would be "Don't run process A when process B is running" but there must be a better approach. I am using EntityFramework 6 and TransactionScope in both processes. I didn't find out how to set priority or something like that on my processes. Is this possible?
EDIT:
I forgot to say that I am already using one delete transaction per record, not one transaction for all records. Inside loop I create new DBContext and TransactionScope, so each record has its own transaction. My problem is that deleting a record still takes some time because of the related BLOBs and data in other related tables (lets say about 5 sec. per row). I still get deadlock situations when deleting process (B) crosses with inserting process (A).
Transactions don't have priority. Deadlock victims are chosen by the database, most commonly using things like "work required to roll it back". One way of avoiding a deadlock is to ensure that you block rather than deadlock, by accessing tables in the same order, and by taking locks at the eventual level (for example, taking an UPDLOCK when reading the data, to avoid two queries getting read locks, then one trying to escalate to a write lock). Ultimately, though, this is a tricky area - and something that takes 45M to complete (please tell me that isn't a single transaction!) is always going to cause problems.
Rework process B to not delete it all at once but in smaller batches that never take more than 1 minute. Run those in a loop until all to be deleted is done.

How do I initialize my Entity Framework queries to speed them up?

I have a login page that executes a very simple EF query to determine if a user is valid. On the first run this query takes about 6 seconds to run. On subsequent runs it takes much less than a second.
I've looked at an article that talked about using Application Auto-Start and my question is this: Is there a way to trigger this query to cause whatever caching needs to happen without actually calling the query, or is it necessary for me to just call the query with a dummy set of arguments?
EDIT: When I say six seconds I'm referring the time it takes to get the query. Code looks something like this (note in this case contactID is a nullable int and set to null):
return from contact in _context.Contacts
where contact.District == Environment.District &&
contact.ContactId == (contactID ?? contact.ContactId)
select contact;
This is a SqlServer 2008 and I've run a profiler to check the SQL and the duration it returns is 41ms for the query that ultimately gets executed. The 6 or 7 second delay happens before the query even reaches SQL though. I'm trying to setup glimpse now to see if it can give me more details on other things that may be going on at the same time.
This really sounds like what is called a "cold query". The main performance botteneck for cold queries is "View Generation" which is performed once per AppDomain of your application. Typically the effect is that your first query - and it doesn't matter which one - is slow and subsequent queries are fast.
It must not necessarily be a query that could be slow. If the first operation you are doing with EF in your application is an Insert that would be slow. Or even an Attach that doesn't touch the database at all would be slow as well. (That's a good simple test case by the way: Add a context.Users.Attach(new User()) into application start and watch in the debugger how long it takes to pass that line.)
In all cases the time is consumed by building an internal data structure in memory - the local query "views" (they have nothing to do with database table views) - that takes place once per AppDomain.
View Generation is described here in more detail and here where you also can find resources how to "pre-generate" those views as part of your build process and before deployment. (Note: You have to update these pre-generated every time your change your model and redeploy your application.)
The alternative way is to trigger loading your web application start periodically (by some process for example that hits the site). In application start you would run any dummy query or the Attach thing above or call the EF initialisation manually:
using (var context = new MyContext())
{
context.Database.Initialize(false);
}
Edit
I forgot the last solution. Just ignore the 6 or 7 seconds. If your site gets famous and has reasonable traffic such a cold query will become unlikely to occur because the IIS worker process will rarely shut down the AppDomain. The occasional user who hits the site in the night when such a shut down has happened right before is probably too tired to even notice the delay.

C# - Unflagging Database Values In The Event Of Console App Crash

I currently have a c# console app where multiple instances run at the same time. The app accesses values in a database and processes them. While a row is being processed it becomes flagged so that no other instance attempts to process it at the same time. My question is what is a efficient and graceful way to unflag those values in the event an instance of the program crashes? So if an instance crashed I would only want to unflag those values currently being processed by that instance of the program.
Thanks
The potential solution will depend heavily on how you start the console applications.
In our case, the applications are started based on configuration records in the database. When one of these applications performs a lock, it uses the primary key from the database configuration record to perform the lock.
When the application starts up, the first thing it does is release all locks on the records that it previously locked.
To control all of the child processes, we have a service that uses the information from the configuration tables to start the processes and then keeps an eye on them, restarting them when they fail.
Each of the processes is also responsible for updating a status table in the database with the last time it was available with a maximum allowed delay of 2 minutes (for heavy processing). This status table is used by sysadmins to watch for problems, but it could also be used to manually release locks in case of a repeating failure in a given process.
If you don't have a structured approach like this, it could be very difficult to automatically unlock records unless you have a solid profile of your application performance that would allow you to know that any lock over 5 minutes old is invalid because it should only take, on average, 15 seconds to process a record with a maximum of 2 minutes.
To be able to handle any kind of crash, even power off I would suggest to timestamp records additionally and after some reasonable timeout treat records as unlocked even if they are flagged.

How do I get two thread to insert specific timestamps into a table?

I created two (or more) threads to insert data in a table in database. When inserting, there is a field CreatedDateTime, that of course, stores the datetime of the record creation.
For one case, I want the threads to stay synchronized, so that their CreatedDateTime field will have exactly the same value. When testing with multi threading, usually I've got different milliseconds...
I want to test different scenarios in my system, such as:
1) conflicts inserting record exactly at the same time.
2) problems with ordering/selection of records.
3) Problems with database connection pooling.
4) Problems with multiple users (hundred) accessing at same time.
There may be other test cases I haven't listed here.
Yes, that's what happens. Even if by some freak of nature, your threads were to start at exactly the same time, they would soon get out of step simply because of resource contention between them (at a bare minimum, access to the DB table or DBMS server process).
If they stay mostly in step (i.e., never more than a few milliseconds out), just choose a different "resolution" for your CreatedDateTime field. Put it in to the nearest 10th of a second (or second) rather than millisecond. Or use fixed values in some other way.
Otherwise, just realize that this is perfectly normal behavior.
And, as pointed out by BC in a comment, you may misunderstand the use of the word "synchronized". It's used (in Java, I hope C# is similar) to ensure two threads don't access the same resource at the same time. In actuality, it almost guarantees that threads won't stay synchronized as you understand the term to mean (personally I think your definition is right in terms of English usage (things happening at the same time) but certain computer languages have suborned the definition for their own purposes).
If you're testing what happens when specific timestamps go into the database, you cannot rely on threads "behaving themselves" by being scheduled in a specific order and at specific times. You really need to dummy up the data somehow, otherwise it's like trying to nail jelly to a tree (or training a cat).
One solution is to not use things such as getCurrentTime() or now() but use a specific set of inserts which have known timestamps. Depending on your actual architecture, this may be difficult (for example, if you just call an API which itself gets the current timestamp to millisecond resolution).
If you control the actual SQL that's populating the timestamp column, you need to change that to use pre-calculated values rather the now() or its equivalents.
If you want to have the same timestamps on multiple rows being inserted; you should create a SQL thread which will do a multirow insert in one query which will allow you to get the same timestamps. Other than this, I agree with everyone else, you cannot get an exact timestamp at a huge resolution with multithreads unless you were to insert the timestamp as it is seen in the application and share that timestamp to be inserted. This of course, throws the caveat issues of threads out the window. It's like saying, I'm going to share this data, but I don't want to use mutexes because they stop the other thread from processing once it hits a lock().

Categories

Resources