Stop HangFire from running a job multiple times on different servers? - c#

I have set up a recurring job that simply processes some orders:
RecurringJob.AddOrUpdate(
    hangFireJob.JobId,
    () => hangFireJob.Execute(),
    hangFireJob.Schedule);
The issue I'm running into is that we have multiple "backup" servers all running this same code, all reaching out to the same Hangfire database.
I'm seeing the same job run multiple times, which obviously gives us errors since the orders are already processed.
I want the backup servers to recognize that the recurring job has already been queued and not queue it again. I figured it would use the job name for this (the first parameter above). What am I missing here?
I included the hangfire server setup below:
private IEnumerable<IDisposable> GetHangfireServers()
{
    var configSettings = new Settings();
    GlobalConfiguration.Configuration
        .UseNinjectActivator(_kernel)
        .SetDataCompatibilityLevel(CompatibilityLevel.Version_170)
        .UseSimpleAssemblyNameTypeSerializer()
        .UseRecommendedSerializerSettings()
        .UseSqlServerStorage(configSettings.HangfireConnectionString, new SqlServerStorageOptions
        {
            CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
            SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
            QueuePollInterval = TimeSpan.Zero,
            UseRecommendedIsolationLevel = true,
            DisableGlobalLocks = false,
            SchemaName = "LuxAdapter",
            PrepareSchemaIfNecessary = false
        });

    yield return new BackgroundJobServer();
}

Not sure if you have tried this already, but look into using the DisableConcurrentExecution attribute on the job; it prevents multiple concurrent executions of the same job.
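For illustration, a minimal sketch of applying it (the job class mirrors the question's hangFireJob; the 10-minute timeout is an arbitrary choice for how long a worker waits on the distributed lock before giving up):

// Assumed job type from the question; the attribute goes on the method
// that the job expression invokes.
public class HangFireJob
{
    [DisableConcurrentExecution(timeoutInSeconds: 10 * 60)]
    public void Execute()
    {
        // ... process orders ...
    }
}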

Since the default values provide uniqueness only on a process level, you should handle it manually if you want to run different server instances inside the same process:
var options = new BackgroundJobServerOptions
{
    ServerName = String.Format(
        "{0}.{1}",
        Environment.MachineName,
        Guid.NewGuid().ToString())
};
var server = new BackgroundJobServer(options);

// or

app.UseHangfireServer(options);

Related

How to send CosmosDB dependency tracking data to Application Insights from Azure Function Event Hub Trigger

I have an Azure Function that is triggered by Event Hub and sends data in batches. Inside the function, there are multiple calls to insert data into Cosmos DB. I have added the following code as part of App Insights monitoring:
builder.Services.AddApplicationInsightsTelemetry();
builder.Services.ConfigureTelemetryModule<DependencyTrackingTelemetryModule>((module, o) =>
{
    module.EnableW3CHeadersInjection = true;
});
builder.Services.ConfigureTelemetryModule<EventCounterCollectionModule>((module, o) =>
{
    module.Counters.Add(new EventCounterCollectionRequest("System.Runtime", "gen-0-size"));
});
I can see the total response time in App Insights but cannot figure out how to track and send the time spent by each insert query in Cosmos DB.
Here is the C# code within the Azure Function:
var watch = System.Diagnostics.Stopwatch.StartNew();

var DemoContainerData = new
{
    id = Guid.NewGuid().ToString(),
    UserId = userId,
    // other properties
};
_demoContainer.CreateItemAsync<object>(DemoContainerData);

var DemoContainerData2 = new
{
    id = Guid.NewGuid().ToString(),
    ProductId = productId,
    // other properties
};
_productContainer.CreateItemAsync<object>(DemoContainerData2);

/*
var dependency = new DependencyTelemetry
{
    Name = "",
    Target = "",
    Data = "",
    Timestamp = start,
    Duration = DateTime.UtcNow - start,
    Success = true
};
this._telemetryClient.TrackDependency(dependency);
*/

watch.Stop();
var elapsed = watch.Elapsed.TotalMilliseconds;
log.LogInformation("Total Items {0} - Total Time {1}", Items.Length, elapsed);
Your code is not awaiting the async operations; you should be doing:
ItemResponse<object> response = await _demoContainer.CreateItemAsync<object>(DemoContainerData);
From the response, you can measure the client latency:
var elapsedTimeForOperation = response.Diagnostics.GetClientElapsedTime();
What we recommend if you want to investigate high latency is to log the Diagnostics when the request goes above some threshold, for example:
if (response.Diagnostics.GetClientElapsedTime() > ConfigurableSlowRequestTimeSpan)
{
    // Log the diagnostics and add any additional info necessary to correlate to other logs
    log.LogWarning("Slow request {0}", response.Diagnostics);
}
For best latency, make sure you are following https://learn.microsoft.com/azure/cosmos-db/sql/troubleshoot-dot-net-sdk-slow-request?tabs=cpu-new#application-design (mainly, make sure you are using a singleton client, and use ApplicationRegion or ApplicationPreferredRegions to define the preferred region to connect to, which should ideally be the same region the Function is deployed to).
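As a rough sketch of that guidance (the connection-string setting name and the region are placeholder assumptions):

// Created once per process and reused across Function invocations.
private static readonly CosmosClient cosmosClient = new CosmosClient(
    Environment.GetEnvironmentVariable("CosmosConnectionString"), // assumed app setting
    new CosmosClientOptions
    {
        // Assumption: the Function App is deployed to West US 2.
        ApplicationRegion = Regions.WestUS2
    });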

Hangfire in single application on multiple physical servers

I am running Hangfire in a single web application. My application runs on 2 physical servers, but Hangfire uses a single database.
At the moment, I am generating a server for each queue, because each queue needs to run 1 worker at a time and the jobs must run in order. I set them up like this:
// core
services.AddHangfire(options =>
{
    options.SetDataCompatibilityLevel(CompatibilityLevel.Version_170);
    options.UseSimpleAssemblyNameTypeSerializer();
    options.UseRecommendedSerializerSettings();
    options.UseSqlServerStorage(appSettings.Data.DefaultConnection.ConnectionString, storageOptions);
});

// add multiple servers, this way we get to control how many workers are in each queue
services.AddHangfireServer(options =>
{
    options.ServerName = "workflow-queue";
    options.WorkerCount = 1;
    options.Queues = new string[] { "workflow-queue" };
    options.SchedulePollingInterval = TimeSpan.FromSeconds(10);
});

services.AddHangfireServer(options =>
{
    options.ServerName = "alert-schedule";
    options.WorkerCount = 1;
    options.Queues = new string[] { "alert-schedule" };
    options.SchedulePollingInterval = TimeSpan.FromMinutes(1);
});

services.AddHangfireServer(options =>
{
    options.ServerName = "trigger-schedule";
    options.WorkerCount = 1;
    options.Queues = new string[] { "trigger-schedule" };
    options.SchedulePollingInterval = TimeSpan.FromMinutes(1);
});

services.AddHangfireServer(options =>
{
    options.ServerName = "report-schedule";
    options.WorkerCount = 1;
    options.Queues = new string[] { "report-schedule" };
    options.SchedulePollingInterval = TimeSpan.FromMinutes(1);
});

services.AddHangfireServer(options =>
{
    options.ServerName = "maintenance";
    options.WorkerCount = 5;
    options.Queues = new string[] { "maintenance" };
    options.SchedulePollingInterval = TimeSpan.FromMinutes(10);
});
My problem is that it is generating multiple queues on the servers, with different ports.
In my code I am then trying to stop jobs from running if they are queued or retrying, but if the job is being run on a different physical server, it is not found and gets queued again.
Here is the code that checks whether it is already running:
public async Task<bool> IsAlreadyQueuedAsync(PerformContext context)
{
    var disableJob = false;
    var monitoringApi = JobStorage.Current.GetMonitoringApi();

    // get the jobId, method and queue using the PerformContext
    var jobId = context.BackgroundJob.Id;
    var methodInfo = context.BackgroundJob.Job.Method;
    var queueAttribute = (QueueAttribute)Attribute.GetCustomAttribute(context.BackgroundJob.Job.Method, typeof(QueueAttribute));

    // enqueued jobs
    var enqueuedJobStatesToCheck = new[] { "Processing" };
    var enqueuedJobs = monitoringApi.EnqueuedJobs(queueAttribute.Queue, 0, 1000);
    var enqueuedJobsAlready = enqueuedJobs.Count(e => e.Key != jobId
        && e.Value != null
        && e.Value.Job != null
        && e.Value.Job.Method.Equals(methodInfo)
        && enqueuedJobStatesToCheck.Contains(e.Value.State));
    if (enqueuedJobsAlready > 0)
        disableJob = true;

    // scheduled jobs
    if (!disableJob)
    {
        // check if there are any scheduled jobs that are processing
        var scheduledJobs = monitoringApi.ScheduledJobs(0, 1000);
        var scheduledJobsAlready = scheduledJobs.Count(e => e.Key != jobId
            && e.Value != null
            && e.Value.Job != null
            && e.Value.Job.Method.Equals(methodInfo));
        if (scheduledJobsAlready > 0)
            disableJob = true;
    }

    // failed jobs
    if (!disableJob)
    {
        var failedJobs = monitoringApi.FailedJobs(0, 1000);
        var failedJobsAlready = failedJobs.Count(e => e.Key != jobId
            && e.Value != null
            && e.Value.Job != null
            && e.Value.Job.Method.Equals(methodInfo));
        if (failedJobsAlready > 0)
            disableJob = true;
    }

    // if disableJob is true, remove the current job; otherwise it would write a "successful" message in the logs
    if (disableJob)
    {
        // use hangfire delete, for cleanup
        BackgroundJob.Delete(jobId);

        // create our sqlBuilder to remove the entries altogether, including the count
        var sqlBuilder = new SqlBuilder()
            .DELETE_FROM("Hangfire.[Job]")
            .WHERE("[Id] = {0};", jobId);
        sqlBuilder.Append("DELETE TOP(1) FROM Hangfire.[Counter] WHERE [Key] = 'stats:deleted' AND [Value] = 1;");

        using (var cmd = _context.CreateCommand(sqlBuilder))
            await cmd.ExecuteNonQueryAsync();

        return true;
    }

    return false;
}
Each method also has attributes like the following:
public interface IAlertScheduleService
{
    [Hangfire.Queue("alert-schedule")]
    [Hangfire.DisableConcurrentExecution(60 * 60 * 5)]
    Task RunAllAsync(PerformContext context);
}
Simple implementation of the interface:
public class AlertScheduleService : IAlertScheduleService
{
    public async Task RunAllAsync(PerformContext context)
    {
        if (await IsAlreadyQueuedAsync(context))
            return;

        // guess it isn't queued, so run it here....
    }
}
Here is how I am adding my scheduled jobs:
// our recurring jobs
// set these to run hourly, so they can play "catch-up" if needed
RecurringJob.AddOrUpdate<IAlertScheduleService>(e => e.RunAllAsync(null), Cron.Hourly(0), queue: "alert-schedule");
Why does this happen? How can I stop it from happening?
Somewhat of a blind shot: preventing a job from being queued if a job is already queued in the same queue.
The try/catch logic is quite ugly, but I have no better idea right now...
Also, I'm really not sure the lock logic always prevents having two jobs in EnqueuedState, but it should help anyway. Maybe mix it with an IApplyStateFilter.
public class DoNotQueueIfAlreadyQueued : IElectStateFilter
{
    public void OnStateElection(ElectStateContext context)
    {
        if (context.CandidateState is EnqueuedState)
        {
            EnqueuedState es = context.CandidateState as EnqueuedState;
            IDisposable distributedLock = null;
            try
            {
                // spin until the per-queue distributed lock is acquired
                while (distributedLock == null)
                {
                    try
                    {
                        distributedLock = context.Connection.AcquireDistributedLock($"{nameof(DoNotQueueIfAlreadyQueued)}-{es.Queue}", TimeSpan.FromSeconds(1));
                    }
                    catch { }
                }

                var m = context.Storage.GetMonitoringApi();
                if (m.EnqueuedCount(es.Queue) > 0)
                {
                    context.CandidateState = new DeletedState();
                }
            }
            finally
            {
                distributedLock?.Dispose();
            }
        }
    }
}
The filter can be declared as in this answer
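For instance, a minimal sketch of registering it globally at startup, using Hangfire's global filter collection:

// Register the filter once at startup so it applies to every background job.
GlobalJobFilters.Filters.Add(new DoNotQueueIfAlreadyQueued());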
There seems to be a bug in the Hangfire storage implementation you are currently using:
https://github.com/HangfireIO/Hangfire/issues/1025
The current options are:
- Switching to HangFire.LiteDB, as commented here: https://github.com/HangfireIO/Hangfire/issues/1025#issuecomment-686433594
- Implementing your own logic to enqueue a job, though this would take more effort.
- Making your job execution idempotent to avoid side effects in case it's executed multiple times.
With either option you should still apply DisableConcurrentExecution and make your job execution idempotent as explained below, so I think you can just go with the last option:
Applying DisableConcurrentExecution is necessary, but it's not enough, because there are no reliable automatic failure detectors in distributed systems. That's the nature of distributed systems: we usually have to rely on timeouts to detect failures, and timeouts are not reliable.
Hangfire is designed to run with at-least-once execution semantics, as explained below:
One of your servers may be executing the job but be detected as failed for various reasons. For example, your current processing server may not send heartbeats in time due to a temporary network issue or temporary high load.
When the current processing server is assumed to have failed (even though it hasn't), the job will be scheduled to another server, which causes it to be executed more than once.
The solution is still to apply the DisableConcurrentExecution attribute as a best effort to prevent multiple executions of the same job, but the main thing is to make the execution of the job idempotent, so that it does not cause side effects if it is executed multiple times.
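As a rough illustration of that last point, a sketch of an idempotency guard for an order-processing job (the Orders table, its ProcessedAt column, and the Dapper-style _db connection are all assumptions):

public async Task ProcessOrderAsync(int orderId)
{
    // Atomically claim the order; a second execution finds it already
    // claimed and exits without re-processing.
    var claimed = await _db.ExecuteAsync(
        @"UPDATE Orders SET ProcessedAt = SYSUTCDATETIME()
          WHERE Id = @orderId AND ProcessedAt IS NULL",
        new { orderId });

    if (claimed == 0)
        return; // already handled by another server; no side effects

    // ... the actual order processing goes here ...
}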
Please refer to some quotes from https://docs.hangfire.io/en/latest/background-processing/throttling.html:
Throttlers apply only to different background jobs, and there’s no reliable way to prevent multiple executions of the same background job other than by using transactions in background job method itself. DisableConcurrentExecution may help a bit by narrowing the safety violation surface, but it heavily relies on an active connection, which may be broken (and lock is released) without any notification for our background job.

As there are no reliable automatic failure detectors in distributed systems, it is possible that the same job is being processed on different workers in some corner cases. Unlike OS-based mutexes, mutexes in this package don’t protect from this behavior so develop accordingly.

DisableConcurrentExecution filter may reduce the probability of violation of this safety property, but the only way to guarantee it is to use transactions or CAS-based operations in our background jobs to make them idempotent.
You can also refer to this, as Hangfire's timeout behavior seems to be storage-dependent as well: https://github.com/HangfireIO/Hangfire/issues/1960#issuecomment-962884011

Quartz.NET scheduler fails to add jobs to SQL table when JobDataMap is added to job during setup

I am running into an issue with scheduling jobs that contain JobDataMap data. I am using SQL Server to persist job/trigger data.
When I schedule my jobs without adding the JobDataMap, they show up in the database as expected, and my application executes the jobs perfectly. When I add the needed JobDataMap for the job, however, no trigger/job data gets inserted into the tables.
[My Quartz Config]
var schedulerConfig = new NameValueCollection
{
    { "quartz.scheduler.instanceName", "TaskScheduler" },
    { "quartz.scheduler.instanceId", "TaskScheduler" },
    { "quartz.threadPool.type", "Quartz.Simpl.SimpleThreadPool, Quartz" },
    { "quartz.threadPool.threadCount", "3" },
    { "quartz.threadPool.threadPriority", "Normal" },
    { "quartz.jobStore.misfireThreshold", "60000" },
    { "quartz.jobStore.type", "Quartz.Impl.AdoJobStore.JobStoreTX, Quartz" },
    { "quartz.jobStore.useProperties", "true" },
    { "quartz.jobStore.dataSource", "default" },
    { "quartz.jobStore.tablePrefix", "_QRTZ_" },
    { "quartz.jobStore.driverDelegateType", "Quartz.Impl.AdoJobStore.SqlServerDelegate, Quartz" },
    { "quartz.dataSource.default.connectionString", "Server=XXXXXXXXXXX; Database=TaskScheduling;Trusted_Connection=True;MultipleActiveResultSets=true" },
    { "quartz.dataSource.default.provider", "SqlServer" },
    { "quartz.serializer.type", "binary" },
};
[Job Setup]
IJobDetail job = JobBuilder.Create<SendCyrstalReportJob>()
    .WithIdentity("SendWorkCenterLoadSummary", "TaskSchedulerService")
    .WithDescription("Sends the WorkCenterLoadSummary report to a list of email recipients")
    .UsingJobData("parameters", "MailCrystalReportAsExcel -reportName \\\\fs5\\Reports\\LoadSummary6WeekForecast.rpt -recipients mailrecipient#addy.com")
    .Build();

ITrigger trigger = TriggerBuilder.Create()
    .WithIdentity("WorkCenterLoadSummaryTrigger", "TaskSchedulerService")
    .WithCronSchedule("0/10 * * * * ?")
    .ForJob("SendWorkCenterLoadSummary", "TaskSchedulerService")
    .UsingJobData("command", "c:\\CommandConsole\\ecc.exe")
    .Build();

_scheduler.ScheduleJob(job, trigger);
_scheduler.ListenerManager.AddJobListener(_jobListener);
The code as provided results in no job/trigger data being added to the database. If I comment out the UsingJobData calls from both the job definition and the trigger definition, the job is scheduled and subsequently executed as expected.
There are no errors thrown, and nothing is reported to the log files or console window indicating a problem; there is just no data going to the database.
Any ideas what might be going on? :-)
I had the same problem. After a lot of trial and error, I found this solution:
In your scheduler config, change this setting as follows:
{ "quartz.jobStore.useProperties", "false" }
Set this flag to "false" or just remove it from your config. If you set it to "true", Quartz assumes all your JobDataMap values will be strings (according to the documentation). In my case I was sending integers and strings in the values, and this change fixed it. It seems you're sending strings in the values, but you can give it a try changing this setting.
Source: Job Stores Documentation
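Alternatively, if you want to keep useProperties set to "true", a sketch of the string-only approach (the key name and value here are illustrative) is to store every JobDataMap value as a string and parse it inside the job:

// All values stored as strings, which the properties-only job store requires.
IJobDetail job = JobBuilder.Create<SendCyrstalReportJob>()
    .WithIdentity("SendWorkCenterLoadSummary", "TaskSchedulerService")
    .UsingJobData("retryCount", "3") // an int, stored as a string
    .Build();

// Inside the job, parse the string back to the needed type:
var retryCount = int.Parse(context.MergedJobDataMap.GetString("retryCount"));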
Change the serializer to JSON:
from
{ "quartz.serializer.type", "binary" },
to
{ "quartz.serializer.type", "json" },
and install a JSON serializer for Quartz (e.g. the Quartz.Serialization.Json NuGet package).

Mass Transit - How to Schedule a message using Azure Service Bus

I've checked the documentation regarding scheduling with Azure Service Bus, but I am not clear on exactly how to send a message from a "disconnected" bus.
Here's how I've configured my service that is processing messages on the server:
builder.AddMassTransit(mt =>
{
    mt.AddConsumers(cqrsAssembly);
    mt.AddBus(context => Bus.Factory.CreateUsingAzureServiceBus(x =>
    {
        x.RequiresSession = true;
        x.MaxConcurrentCalls = 500;
        x.MessageWaitTimeout = TimeSpan.FromMinutes(5);
        x.UseRenewLock(TimeSpan.FromMinutes(4));
        x.UseServiceBusMessageScheduler();

        var host = x.Host(serviceUri, h =>
        {
            h.SharedAccessSignature(s =>
            {
                s.KeyName = "key-name";
                s.SharedAccessKey = "access-key";
                s.TokenTimeToLive = TimeSpan.FromDays(1);
                s.TokenScope = TokenScope.Namespace;
            });
            h.OperationTimeout = TimeSpan.FromMinutes(2);
        });

        x.ReceiveEndpoint(host, $"mt.myqueue", ep =>
        {
            ep.RequiresSession = true;
            ep.MaxConcurrentCalls = 500;
            ep.RemoveSubscriptions = true;
            ep.UseMessageRetry(r =>
            {
                r.Interval(4, TimeSpan.FromSeconds(30));
                r.Handle<TransientCommandException>();
            });
            ep.ConfigureConsumers(context);
        });
    }));
});
I've explicitly called UseServiceBusMessageScheduler().
In the project that is creating and sending messages to the queue (which runs in a different context, so is configured to "send only"), we have this:
var bus = Bus.Factory.CreateUsingAzureServiceBus(x =>
{
    x.RequiresSession = true;
    x.MessageWaitTimeout = TimeSpan.FromMinutes(5);
    x.UseRenewLock(TimeSpan.FromMinutes(4));
    x.Send<ICommand>(s => s.UseSessionIdFormatter(ctx => ctx.Message.SessionId ?? Guid.NewGuid().ToString()));

    var host = x.Host(serviceUri, h =>
    {
        h.SharedAccessSignature(s =>
        {
            s.KeyName = "key-name";
            s.SharedAccessKey = "key";
            s.TokenTimeToLive = TimeSpan.FromDays(1);
            s.TokenScope = TokenScope.Namespace;
        });
        h.OperationTimeout = TimeSpan.FromMinutes(2);
    });

    EndpointConvention.Map<ICommand>(new Uri($"{serviceUri.ToString()}mt.myqueue"));
    EndpointConvention.Map<Command>(new Uri($"{serviceUri.ToString()}mt.myqueue"));
});
Now, to send a scheduled message, we do this:
var dest = "what?";
await bus.ScheduleSend(dest, scheduledEnqueueTimeUtc.Value, message);
I am unsure of what needs to be passed in as the destinationAddress.
I've tried:
- serviceUri
- $"{serviceUri}mt.myqueue"
But checking the queues, I don't see my message in the base queue, the skipped queue, or the error queue.
Am I missing some other configuration, and if not, how does one determine the destination queue?
I am using version 5.5.4 of MassTransit, and every overload of ScheduleSend() requires it.
First of all, yes, your Uri format is correct. In the end, after formatting, you need something like this:
new Uri(@"sb://yourdomain.servicebus.windows.net/yourapp/your_message_queue")
Also make sure you added the following when you configured your endpoint (see the link below):
configurator.UseServiceBusMessageScheduler();
If you follow the Mass-Transit documentation, scheduling is done from a ConsumeContext. See Mass-Transit Azure Scheduling
public class ScheduleNotificationConsumer :
    IConsumer<AssignSeat>
{
    Uri _schedulerAddress;
    Uri _notificationService;

    public async Task Consume(ConsumeContext<AssignSeat> context)
    {
        if (context.Message.ReservationTime - DateTime.Now < TimeSpan.FromHours(8))
        {
            // assign the seat for the reservation
        }
        else
        {
            // seats can only be assigned eight hours before the reservation
            await context.ScheduleMessage(context.Message.ReservationTime - TimeSpan.FromHours(8), context.Message);
        }
    }
}
However, in a use case we faced this week, we needed to schedule from outside a ConsumeContext, or simply didn't want to forward the context down to where we scheduled. When using IBusControl.ScheduleSend we get no error feedback, but we also don't really get any scheduling done.
After looking at what Mass-Transit does, it turns out that from an IBusControl it creates a new scheduling provider, whereas from the context it uses the ServiceBusScheduleMessageProvider.
So what we're doing now, until we clean up this bit, is calling the ServiceBusScheduleMessageProvider outright:
await new ServiceBusScheduleMessageProvider(_busControl).ScheduleSend(
    destinationUri,
    scheduleDateTime.UtcDateTime,
    Task.FromResult<T>(message),
    Pipe.Empty<SendContext>(),
    default);
Hope it makes sense and helps a bit.

How to get List of all Hangfire Jobs using JobStorage in C#?

I am using Hangfire BackgroundJob to create background jobs in C# using the code below.
var options = new BackgroundJobServerOptions
{
    ServerName = "Test Server",
    SchedulePollingInterval = TimeSpan.FromSeconds(30),
    Queues = new[] { "critical", "default", "low" },
    Activator = new AutofacJobActivator(container),
};

var jobStorage = new MongoStorage("mongodb://localhost:*****", "TestDB", new MongoStorageOptions()
{
    QueuePollInterval = TimeSpan.FromSeconds(30)
});

var _Server = new BackgroundJobServer(options, jobStorage);
This creates the job server object; after that, I am creating scheduled, continuation, and recurring jobs as below.
var InitJob = BackgroundJob.Schedule<TestInitializationJob>(job => job.Execute(), TimeSpan.FromSeconds(5));
var secondJob = BackgroundJob.ContinueWith<Test_SecondJob>(InitJob, job => job.Execute());
BackgroundJob.ContinueWith<Third_Job>(secondJob, job => job.Execute());
RecurringJob.AddOrUpdate<RecurringJobInit>("test-recurring-job", job => job.Execute(), Cron.MinuteInterval(1));
After that, I want to delete or stop all jobs when my application is stopped or closed. So in the OnStop event of my application, I have written the code below.
var monitoringApi = JobStorage.Current.GetMonitoringApi();
var queues = monitoringApi.Queues(); // BUT this is not returning all queues and all jobs
foreach (QueueWithTopEnqueuedJobsDto queue in queues)
{
    var jobList = monitoringApi.EnqueuedJobs(queue.Name, 0, 100);
    foreach (var item in jobList)
    {
        BackgroundJob.Delete(item.Key);
    }
}
But the above code to get all the jobs and all the queues is not working. It always returns the "default" queue and does not return all jobs.
Does anyone have an idea how to get all the jobs using Hangfire JobStorage and stop those jobs when the application is stopped?
Any help would be highly appreciated!
Thanks
Single Server Setup
To get all recurring jobs you can use the job storage (e.g. either via static instance or DI):
using (var connection = JobStorage.Current.GetConnection())
{
    var recurringJobs = connection.GetRecurringJobs();
    foreach (var recurringJob in recurringJobs)
    {
        if (NonRemovableJobs.ContainsKey(recurringJob.Id)) continue;
        logger.LogWarning($"Removing job with id [{recurringJob.Id}]");
        jobManager.RemoveIfExists(recurringJob.Id);
    }
}
If your application acts as a single Hangfire server, all job processing will be stopped as soon as the application is stopped. In that case the jobs wouldn't even need to be removed.
Multi Server Setup
In a multi-instance setup which uses the same Hangfire tables for multiple servers, you'll run into the problem that not all applications have all assemblies available. With the method above, Hangfire tries to deserialize every job it finds, which results in "Assembly Not Found" exceptions.
To prevent this I used the following workaround, which loads the column 'Key' from the table 'Hash'. It comes in the format 'recurring-jobs:{YourJobIdentifier}'. The job id is then used to remove the job if necessary:
var queue = "MyInstanceQueue"; // probably using queues in a multi server setup
var recurringJobsRaw = await dbContext.HangfireHashes.FromSqlInterpolated($"SELECT [Key] FROM [Hangfire].[Hash] where Field='Queue' AND Value='{queue}'").ToListAsync();
var recJobIds = recurringJobsRaw.Select(s => s.Key.Split(":").Last());
foreach (var id in recJobIds)
{
    if (NonRemovableJobs.ContainsKey(id)) continue;
    logger.LogWarning($"Removing job with id [{id}]");
    jobManager.RemoveIfExists(id);
}
P.S.: To make it work with EF Core I used a Keyless entity for the Hangfire.Hash table.
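For reference, a minimal sketch of such a keyless entity and its registration (the class and property names are assumptions chosen to match the query above):

// Maps the [Key] column returned by the raw SQL query above.
[Keyless]
public class HangfireHash
{
    public string Key { get; set; }
}

// In the DbContext:
public DbSet<HangfireHash> HangfireHashes { get; set; }

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // No table/view mapping; the entity is only populated via FromSqlInterpolated.
    modelBuilder.Entity<HangfireHash>().ToView(null);
}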
