Hangfire causing locks in SQL Server - c#

We are using Hangfire 1.7.2 in our ASP.NET web project with SQL Server 2016. We have around 150 sites on our server, each using Hangfire 1.7.2. After we upgraded these sites to use Hangfire, the DB server collapsed. Checking the DB logs, we found multiple locking queries, and we identified one RPC event, “sys.sp_getapplock;1”, in all the blocking sessions. It seems that Hangfire is locking our DB and rendering the whole DB unusable. We counted more than 670 locking queries caused by Hangfire.
This could possibly be due to these properties we set up:
SlidingInvisibilityTimeout = TimeSpan.FromMinutes(30),
QueuePollInterval = TimeSpan.FromHours(5)
Each site has around 20 background jobs, a few of them run every minute, whereas others every hour, every 6 hours and some once a day.
I have searched the documentation but could not find anything which could explain these two properties or how to set them to avoid DB locks.
Looking for some help on this.
EDIT: The following queries are executed every second:
exec sp_executesql N'select count(*) from [HangFire].[Set] with (readcommittedlock, forceseek) where [Key] = @key',N'@key nvarchar(4000)',@key=N'retries'
select distinct(Queue) from [HangFire].JobQueue with (nolock)
exec sp_executesql N'select count(*) from [HangFire].[Set] with (readcommittedlock, forceseek) where [Key] = @key',N'@key nvarchar(4000)',@key=N'retries'
irrespective of the various combinations of timespan values we set. Here is the code of GetHangfireServers we are using:
public static IEnumerable<IDisposable> GetHangfireServers()
{
    // Reference for GlobalConfiguration.Configuration: http://docs.hangfire.io/en/latest/getting-started/index.html
    // Reference for UseSqlServerStorage: http://docs.hangfire.io/en/latest/configuration/using-sql-server.html#configuring-the-polling-interval
    GlobalConfiguration.Configuration
        .SetDataCompatibilityLevel(CompatibilityLevel.Version_170)
        .UseSimpleAssemblyNameTypeSerializer()
        .UseRecommendedSerializerSettings()
        .UseSqlServerStorage(ConfigurationManager.ConnectionStrings["abc"]
            .ConnectionString, new SqlServerStorageOptions
            {
                CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
                SlidingInvisibilityTimeout = TimeSpan.FromMinutes(30),
                QueuePollInterval = TimeSpan.FromHours(5), // Hangfire will poll after 5 hrs to check failed jobs.
                UseRecommendedIsolationLevel = true,
                UsePageLocksOnDequeue = true,
                DisableGlobalLocks = true
            });
    // Reference: https://docs.hangfire.io/en/latest/background-processing/configuring-degree-of-parallelism.html
    var options = new BackgroundJobServerOptions
    {
        WorkerCount = 5
    };
    var server = new BackgroundJobServer(options);
    yield return server;
}
The worker count is set just to 5.
There are just 4 jobs and even those are completed (SELECT * FROM [HangFire].[State]):
Do you have any idea why Hangfire is issuing so many queries every second?

We faced this issue in one of our projects. The Hangfire dashboard is quite read-heavy, and it polls the Hangfire database very frequently to refresh job status.
The solution that worked best for us was to use a dedicated Hangfire database.
That way you isolate the application queries from the Hangfire queries, and your application queries are not affected by the Hangfire server and dashboard queries.
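
A minimal sketch of that setup, assuming a second connection string named "HangfireDb" (a hypothetical name) that points at a database containing only the Hangfire schema. The application's own connection string stays untouched:

```csharp
using System.Configuration;
using Hangfire;
using Hangfire.SqlServer;

public static class HangfireStorageConfig
{
    public static void Configure()
    {
        // "HangfireDb" is a hypothetical connection string name. It should point
        // at a database that holds only the Hangfire schema, not application tables.
        var hangfireConnection =
            ConfigurationManager.ConnectionStrings["HangfireDb"].ConnectionString;

        // Storage options are left at their defaults here; tune them separately.
        GlobalConfiguration.Configuration
            .UseSqlServerStorage(hangfireConnection, new SqlServerStorageOptions());
    }
}
```

With many sites on one server, each site could either get its own Hangfire database or share one; which is better depends on how much load the combined polling generates.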

There is a newer configuration option for SqlServerStorage called SlidingInvisibilityTimeout that causes these database locks as part of the newer non-transactional message fetching algorithm. It is meant for long-running jobs that may cause transaction log backups to error out (because a database transaction stays active for as long as the long-running job runs).
.UseSqlServerStorage(
"connection_string",
new SqlServerStorageOptions { SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5) });
Our DBA did not like the database locks, so I removed the SlidingInvisibilityTimeout option to go back to the old transaction-based message fetching algorithm, since I didn't have any long-running jobs in my queue.
Whether you enable this option depends on your situation. You may want to consider moving your queue database out of your application database, if it isn't separate already, and enabling the SlidingInvisibilityTimeout option. If your DBA can't live with the locks even when the queue is in a separate database, then maybe you could refactor your tasks into many smaller, shorter-lived tasks. Just some ideas.
https://www.hangfire.io/blog/2017/06/16/hangfire-1.6.14.html

SqlServerStorage runs Install.sql, which takes an exclusive schema lock on the Hangfire schema.
DECLARE @SchemaLockResult INT;
EXEC @SchemaLockResult = sp_getapplock @Resource = '$(HangFireSchema):SchemaLock',
@LockMode = 'Exclusive'
From the Hangfire documentation:
"SQL Server objects are installed automatically from the SqlServerStorage constructor by executing statements
described in the Install.sql file (which is located under the tools folder in the NuGet package). Which contains
the migration script, so new versions of Hangfire with schema changes can be installed seamlessly, without your
intervention."
If you don't want to run this script every time, you can set SqlServerStorageOptions.PrepareSchemaIfNecessary to false.
var options = new SqlServerStorageOptions
{
PrepareSchemaIfNecessary = false
};
var sqlServerStorage = new SqlServerStorage(connectionstring, options);
Instead, run Install.sql manually with this line:
SqlServerObjectsInstaller.Install(connection);

Related

Trigger specific Quartz jobs

I am using Quartz to schedule jobs and use a console application to execute all the jobs.
I currently have 2 console applications which refer to the same set of Quartz tables viz. QRTZ_JOB_DETAILS, QRTZ_TRIGGERS etc.
Due to this, when I execute ConsoleApp1 which doesn't have jobs (created in ConsoleApp2), I get the following error:
XYZ job: Couldn't retrieve job because a required type was not found: Could not load type 'XYZ-Job, XYZ.Job' ---> Quartz.JobPersistenceException:
I have checked the solution here.
The obvious solution is to create a separate set of Quartz tables for each console application. That way, I won't get any type-load errors.
My question is: in such a scenario, is there a way to get only particular jobs (based on some match), so that I don't need to create 2 table sets?
In the code below, my idea was to get all the job names and then disable the triggers that belong to ConsoleApp2. But then ConsoleApp2 won't have any jobs to run (because the tables are the same)!
Please let me know if there is a better solution.
protected async void StartScheduler1()
{
    ISchedulerFactory schedFact = container.ResolveType<ISchedulerFactory>();
    var schedTask = schedFact.GetScheduler();
    schedTask.Wait();
    scheduler = schedTask.Result;
    var jobs = new List<JobKey>();
    foreach (var group in scheduler.GetJobGroupNames().Result)
    {
        var groupMatcher = GroupMatcher<JobKey>.GroupContains(@group);
        foreach (var jobKey in scheduler.GetJobKeys(groupMatcher).Result)
        {
            jobs.Add(jobKey);
        }
    }
    scheduler.Start().Wait();
}
Finally found a solution. There is a column called Sched_Name in the Quartz tables. The Quartz scheduler uses this column to get job details.
Using this column, we can have numerous different groups in the same Quartz tables. There is no need to create separate Quartz table sets.
For example:
SELECT * FROM QRTZ_JOB_DETAILS WHERE SCHED_NAME = 'CESA'
SELECT * FROM QRTZ_JOB_DETAILS WHERE SCHED_NAME = 'CESB'
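
A sketch of how each console application might pick its own rows, assuming Quartz.NET 3.x with an ADO.NET job store. The scheduler names CESA and CESB come from the queries above; the class and method names, the data source name, the provider, the serializer, and the connection string are illustrative assumptions, not taken from the original setup:

```csharp
using System.Collections.Specialized;
using System.Threading.Tasks;
using Quartz;
using Quartz.Impl;

public static class SchedulerBootstrap
{
    // schedulerName becomes the SCHED_NAME value stored in the QRTZ_* tables,
    // e.g. "CESA" for ConsoleApp1 and "CESB" for ConsoleApp2.
    public static async Task<IScheduler> CreateAsync(string schedulerName, string connectionString)
    {
        var properties = new NameValueCollection
        {
            ["quartz.scheduler.instanceName"] = schedulerName,
            // The job store settings below are assumptions for illustration.
            ["quartz.jobStore.type"] = "Quartz.Impl.AdoJobStore.JobStoreTX, Quartz",
            ["quartz.jobStore.dataSource"] = "default",
            ["quartz.dataSource.default.provider"] = "SqlServer",
            ["quartz.dataSource.default.connectionString"] = connectionString,
            ["quartz.serializer.type"] = "binary"
        };

        var factory = new StdSchedulerFactory(properties);
        return await factory.GetScheduler();
    }
}
```

Each application would then call something like `await SchedulerBootstrap.CreateAsync("CESA", connectionString)` with its own name, so a scheduler only loads the jobs stored under that name.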

Functions are slow when querying Azure Hyperscale secondary replica

I have an ASP.NET Core application using EF and an Azure SQL database. We recently migrated the database to the Hyperscale service tier. The database has 2 vCores and 2 secondary replicas. When a function queries a secondary replica (either by modifying the connection string to include ApplicationIntent=READONLY; or by using a new services.AddDbContext() in our Startup.cs), we find that the function takes 20-30x longer to execute.
For instance, this function:
public async Task<List<StaffWorkMuchModel>> ExemptStaffWorkMuchPerWeek(int quarterId, int facilityId) {
using (var dbConnection = (IDbConnection) _serviceProvider.GetService(typeof(IDbConnection))) {
dbConnection.ConnectionString += "ApplicationIntent=READONLY;";
dbConnection.Open();
return (await dbConnection.QueryAsync<StaffWorkMuchModel>("ExemptStaffWorkMuchPerWeek", new {
id_qtr = quarterId,
id_fac = facilityId
}, commandType: CommandType.StoredProcedure, commandTimeout: 150)).ToList();
}
}
We have tried to query the secondary replica directly using SQL Server Management Studio and have found that the queries all return in less than a second. Also, when we add breakpoints in our code, it seems like the queries are returning results immediately. Most of the pages we are having issues with use ajax to call 4+ functions very similar to the one above. It almost seems like they are not running asynchronously.
This same code runs great when we comment out:
dbConnection.ConnectionString += "ApplicationIntent=READONLY;";
Any idea what could be causing all of our secondary replica functions to load so slowly?

Canceling query with while loop hangs forever

I am trying to use query cancellation (via cancellation tokens) to cancel a long-running complex query. I have found that in some cases not only does cancellation fail to halt the query but also the call to CancellationToken.Cancel() hangs indefinitely. Here is a simple repro that replicates this behavior (can be run in LinqPad):
void Main()
{
    var cancellationTokenSource = new CancellationTokenSource();
    var blocked = RunSqlAsync(cancellationTokenSource.Token);
    blocked.Wait(TimeSpan.FromSeconds(1)).Dump(); // false (blocked in SQL as expected)
    cancellationTokenSource.Cancel(); // hangs forever?!
    Console.WriteLine("Finished calling Cancel()");
    blocked.Wait();
}
public async Task RunSqlAsync(CancellationToken cancellationToken)
{
    var connectionString = new SqlConnectionStringBuilder { DataSource = @".\sqlexpress", IntegratedSecurity = true, Pooling = false }.ConnectionString;
    using (var connection = new SqlConnection(connectionString))
    {
        await connection.OpenAsync().ConfigureAwait(false);
        using (var command = connection.CreateCommand())
        {
            command.CommandText = @"
WHILE 1 = 1
BEGIN
DECLARE @x INT = 1
END
";
            command.CommandTimeout = 0;
            Console.WriteLine("Running query");
            await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
        }
    }
}
Interestingly, the same query run in SQL Server Management Studio cancels instantly via the "Cancel Executing Query" button.
Is there some caveat to query cancellation where it cannot cancel tight WHILE loops?
My version of SQL Server:
Microsoft SQL Server 2012 - 11.0.2100.60 (X64)
Feb 10 2012 19:39:15
Copyright (c) Microsoft Corporation
Express Edition (64-bit) on Windows NT 6.2 (Build 9200: )
I am running on Windows 10, and .NET's Environment.Version is 4.0.30319.42000.
EDIT
Some additional information:
Here is the stack trace pulled from Visual Studio when cancellationToken.Cancel() hangs:
Another thread is stuck here:
Additionally, I tried updating to SqlServer Express 2017 and I am seeing the same behavior.
EDIT
I've filed this as a bug with corefx: https://github.com/dotnet/corefx/issues/26623
I can reproduce the issue in a console application. (The code in the question was code from LINQPad.)
I'm going to make this an answer and say that this is a bug in ADO.NET. ADO.NET should send a query cancellation signal to SQL Server. I can see from the CPU usage, that SQL Server continues executing the loop. Therefore, it did not receive cancellation from the client. We also know that SSMS is able to cancel this loop.
While the loop is running I can see that the console app is using 50% of one CPU core and receiving data from SQL Server at 70MB/sec. I do not know what data this is. It might be ROWCOUNT information or something related.
I think the bug is related to the fact that the loop is continuously sending data so that ADO.NET never has an opportunity to send the cancellation. It's still a bug and it would be a community service if you reported it. You can link to this question.
If the loop is throttled using ...
WHILE 1 = 1
BEGIN
DECLARE @x INT = 1
WAITFOR DELAY '00:00:01' --new
END
... then cancellation is quick.
Also, you generally cannot rely on cancellation being quick. If the network dropped, it might take 30 seconds for the client to notice and throw.
Therefore you need to code your program so that it continues executing and does not wait for the query to finish. It could look like this:
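var queryTask = ...;
var cancellationToken = ...;
// Task.WhenAll does not accept a token; wait for whichever finishes first instead.
// This delay task only completes (as canceled) once the token is signaled.
var cancellationTask = Task.Delay(Timeout.Infinite, cancellationToken);
await Task.WhenAny(queryTask, cancellationTask);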
That way cancellation always looks instantaneous. Make sure that resources are still disposed. All SQL interaction should be encapsulated in queryTask so that it simply continues in the background and eventually cleans up.
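
The idea above can be wrapped in a small reusable helper. This is a sketch, not part of ADO.NET: `WithCancellation` is a hypothetical extension method that stops waiting as soon as the token is signaled, while the wrapped task keeps running in the background:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskCancellationExtensions
{
    // Hypothetical helper: returns the task's result, or throws
    // OperationCanceledException as soon as the token is canceled,
    // whichever happens first. The wrapped task itself is not stopped.
    public static async Task<T> WithCancellation<T>(
        this Task<T> task, CancellationToken cancellationToken)
    {
        var cancelSignal = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        using (cancellationToken.Register(() => cancelSignal.TrySetResult(true)))
        {
            var finished = await Task.WhenAny(task, cancelSignal.Task).ConfigureAwait(false);
            if (finished != task)
            {
                // Observe a later failure so it does not surface as an unobserved exception.
                task.ContinueWith(
                    t => { var ignored = t.Exception; },
                    TaskContinuationOptions.OnlyOnFaulted);
                throw new OperationCanceledException(cancellationToken);
            }
        }

        // The task already completed; awaiting it propagates its result or exception.
        return await task.ConfigureAwait(false);
    }
}
```

One caveat for the repro in the question: CancellationTokenSource.Cancel() runs registered callbacks synchronously on the calling thread, so the thread that calls Cancel() can still block inside the driver's callback. Requesting cancellation from another thread, for example with `cancellationTokenSource.CancelAfter(TimeSpan.Zero)`, is one way to keep the caller responsive.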

LinqToSQL ExecuteReader requires an open and available Connection

I know this question has been asked many times however none of the answers fit my issue.
I have a thread timer firing every 30 seconds that queries an MSSQL database that is under heavy load. If I need to update the data in the console app that I'm using, I use LINQ to SQL to update the data stored in memory.
My problem is sometimes I get the error ExecuteReader requires an open and available Connection.
The code from the thread timer fires a Thread.Run(reload());
The connection string is
//example code
void reload(...
string connstring = string.Format("Data Source={0},{1};Initial Catalog={2};User ID={3};Password={4};Application Name={5};Connect Timeout=120;MultipleActiveResultSets=True;Max Pool Size=1524;Pooling=true;"
settings = new ConnectionStringSettings("sqlServer", connstring, "System.Data.SqlClient");
using (var tx = new TransactionScope(TransactionScopeOption.Required,
new TransactionOptions() { IsolationLevel = IsolationLevel.ReadUncommitted }))
{
using (SwitchDataDataContext data = new SwitchDataDataContext(settings.ConnectionString))
{
data.CommandTimeout = 560;
Then I do many LINQ to SQL queries. The exceptions happen from time to time, but not always on the same queries. It's as if the connection is opened and then forcibly closed.
Sometimes the exception says the current state is Open, Closed, or Connecting. I added a larger ThreadPool to the SQL db, but nothing seems to help.
I also use ADO.NET in other parts of the program without any issues.
I believe your problem is that the transaction scope also has a timeout. The default timeout is 1 minute according to this answer, so the transaction times out long before your command does (560 seconds is about 9.3 minutes). You need to set the Timeout property on the TransactionOptions instance you are creating:
new TransactionOptions()
{
IsolationLevel = IsolationLevel.ReadUncommitted,
Timeout = new TimeSpan(0,10,0) /* 10 Minutes */
}
You can verify that is indeed the issue by setting the TransactionScope timeout to a small value to force it to timeout.
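
A small sketch of that check, with no database involved. `TransactionTimeoutDemo` and `ScopeTimesOut` are hypothetical names, and the sleep stands in for the slow queries. The scope is expected to abort when the work outlasts the timeout, although the exact moment the timeout fires is approximate:

```csharp
using System;
using System.Threading;
using System.Transactions;

public static class TransactionTimeoutDemo
{
    // Returns true if the scope was aborted because the work outlived the timeout.
    public static bool ScopeTimesOut(TimeSpan scopeTimeout, TimeSpan workDuration)
    {
        var options = new TransactionOptions
        {
            IsolationLevel = IsolationLevel.ReadUncommitted,
            Timeout = scopeTimeout
        };

        try
        {
            using (var tx = new TransactionScope(TransactionScopeOption.Required, options))
            {
                Thread.Sleep(workDuration); // stand-in for the slow queries
                tx.Complete();
            }
            return false;
        }
        catch (TransactionException)
        {
            // Typically a TransactionAbortedException raised when the scope is disposed.
            return true;
        }
    }
}
```

Calling `ScopeTimesOut(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(3))` should reproduce the abort, while a generous timeout with little work should complete normally.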
I changed LINQ to SQL to Entity Framework and received the same type of message. I believe the issue is lazy loading: I was using the collections on a different thread before they were ready. I added .Include("Lab") to my query to load the entire collection, and it seems to have fixed the issue.

Very slow ExecuteNonQuery (Stored Procedure) vs fast execution using SQL Server Management Studio

Although there are many questions on this topic going around, none of the suggestions seem to work.
I've got a couple of stored procedures which should be run on a daily basis - some of these stored procedures are quite straight forward, others a bit more tricky. But even the simplest of procedures will run indefinitely when called from a C# program (console) using the SqlClient.
This client is running on the server and should be promoted to a windows service when it's actually functioning.
What I've tried so far.
Add ARITHABORT ON (or OFF) as first execute after connection initialization.
Add ARITHABORT ON (or OFF) as first command in the Stored Procedure
Using WITH RECOMPILE
Add ARITHABORT as a global configuration thing.
(EXEC sys.sp_configure N'user options', N'64'
GO
RECONFIGURE WITH OVERRIDE
GO)
The stored procedures (all of them) have no input parameters and the simplest (the only one I currently use) is this:
CREATE PROCEDURE [dbo].[_clean_messageLog]
WITH RECOMPILE
AS
BEGIN
SET NOCOUNT ON;
set arithabort on;
DELETE FROM MessageLog WHERE Moment < GETDATE() - 60;
DELETE FROM MessageLog WHERE Moment < GETDATE() - 30 AND [Status] = 200;
END
There are no messages to actually delete, and in SSMS the stored procedure executes (as expected) within milliseconds.
From the C# Console Application however it takes forever (literally).
Main-method:
const int TIME_OUT = 900000; // 15 minutes
timer.Stop();
foreach (var command in commands.Where(command => !string.IsNullOrWhiteSpace(command)))
{
var start = DateTime.Now;
WriteEvent(string.Format("Starting: {0}", command), EventLogEntryType.Information);
using (var sql = new Lib.Data.SqlServerHelper(connectionString))
{
sql.newCommand(command.Trim());
sql.execute(TIME_OUT);
}
WriteEvent(string.Format("Done in {0} seconds", DateTime.Now.Subtract(start).TotalSeconds), EventLogEntryType.Information);
}
Does anyone have suggestions?
EDIT
The sqlHelper is just a basic (very simple) wrapper. But even when I change the above code to this:
foreach (var command in commands.Where(command => !string.IsNullOrWhiteSpace(command)))
{
var start = DateTime.Now;
WriteEvent(string.Format("Starting: {0}", command), EventLogEntryType.Information);
using (var sql = new SqlConnection(connectionString))
{
sql.Open();
var sqlCommand = new SqlCommand(command.Trim(), sql) {CommandType = CommandType.StoredProcedure};
sqlCommand.ExecuteNonQuery();
}
WriteEvent(string.Format("Done in {0} seconds", DateTime.Now.Subtract(start).TotalSeconds), EventLogEntryType.Information);
}
It's exactly the same.
EDIT #2
Is there an option I can schedule these stored procedures to be run by SQL Server itself on an interval or specific time?
SOLVED
Kinda. Although I never found an actual C# solution to my problem, using SQL Server Agent did the trick. The C# processes were locked due to deadlock issues, which sometimes also occur in the Agent jobs (though less often than in the console program), but we're working on that.
Is there an option I can schedule these stored procedures to be run by
SQL Server itself on an interval or specific time?
Yes, SQL Server Agent can run jobs based on specific time or interval.
Creating SQL Server Job
SSMS -> SQL Server Agent -> Right-click -> New Job -> select Name, Database, Code and Schedule
When you finish, you can click the Script button to get a script that creates the job (if needed).
You can also start the job using T-SQL (for example, from an application, another stored procedure, or a trigger):
EXEC msdb.dbo.sp_start_job N'JobName';
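
From a C# application, the same call might look like the sketch below, assuming System.Data.SqlClient and a login that is allowed to start Agent jobs in msdb. `AgentJobStarter` and its methods are hypothetical names, and 'JobName' is the placeholder from the line above:

```csharp
using System.Data;
using System.Data.SqlClient;

public static class AgentJobStarter
{
    // Builds the command without executing it, so it can be inspected or reused.
    public static SqlCommand BuildStartJobCommand(SqlConnection connection, string jobName)
    {
        var command = new SqlCommand("msdb.dbo.sp_start_job", connection)
        {
            CommandType = CommandType.StoredProcedure
        };
        // @job_name is the parameter of sp_start_job that identifies the job by name.
        command.Parameters.Add("@job_name", SqlDbType.NVarChar, 128).Value = jobName;
        return command;
    }

    public static void StartJob(string connectionString, string jobName)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = BuildStartJobCommand(connection, jobName))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```

Note that sp_start_job only requests the start and returns without waiting for the job to finish, so the application does not block on the stored procedures themselves.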
