NServiceBus events lost when published in separate thread

I've been working on getting long-running messages working with NServiceBus on an Azure transport. Based on this document, I thought I could get away with firing off the long process in a separate thread, marking the event handler task as complete, and then listening for custom OperationStarted or OperationComplete events. I noticed that the OperationComplete event is not received by my handlers in most cases. In fact, the only time it is received is when I publish it immediately after the OperationStarted event is published. Any actual processing in between somehow prevents the completion event from being received. Here is my code:
Abstract class used for long running messages
public abstract class LongRunningOperationHandler<TMessage> : IHandleMessages<TMessage> where TMessage : class
{
protected ILog _logger => LogManager.GetLogger<LongRunningOperationHandler<TMessage>>();
public Task Handle(TMessage message, IMessageHandlerContext context)
{
var opStarted = new OperationStarted
{
OperationID = Guid.NewGuid(),
OperationType = typeof(TMessage).FullName
};
var errors = new List<string>();
// Fire off the long running task in a separate thread
Task.Run(() =>
{
try
{
_logger.Info($"Operation Started: {JsonConvert.SerializeObject(opStarted)}");
context.Publish(opStarted);
ProcessMessage(message, context);
}
catch (Exception ex)
{
errors.Add(ex.Message);
}
finally
{
var opComplete = new OperationComplete
{
OperationType = typeof(TMessage).FullName,
OperationID = opStarted.OperationID,
Errors = errors
};
context.Publish(opComplete);
_logger.Info($"Operation Complete: {JsonConvert.SerializeObject(opComplete)}");
}
});
return Task.CompletedTask;
}
protected abstract void ProcessMessage(TMessage message, IMessageHandlerContext context);
}
Test Implementation
public class TestLongRunningOpHandler : LongRunningOperationHandler<TestCommand>
{
protected override void ProcessMessage(TestCommand message, IMessageHandlerContext context)
{
// If I remove this, or lessen it to something like 200 milliseconds, the
// OperationComplete event gets handled
Thread.Sleep(1000);
}
}
Operation Events
public sealed class OperationComplete : IEvent
{
public Guid OperationID { get; set; }
public string OperationType { get; set; }
public bool Success => !Errors?.Any() ?? true;
public List<string> Errors { get; set; } = new List<string>();
public DateTimeOffset CompletedOn { get; set; } = DateTimeOffset.UtcNow;
}
public sealed class OperationStarted : IEvent
{
public Guid OperationID { get; set; }
public string OperationType { get; set; }
public DateTimeOffset StartedOn { get; set; } = DateTimeOffset.UtcNow;
}
Handlers
public class OperationHandler : IHandleMessages<OperationStarted>
, IHandleMessages<OperationComplete>
{
static ILog logger = LogManager.GetLogger<OperationHandler>();
public Task Handle(OperationStarted message, IMessageHandlerContext context)
{
return PrintJsonMessage(message);
}
public Task Handle(OperationComplete message, IMessageHandlerContext context)
{
// This is not hit if ProcessMessage takes too long
return PrintJsonMessage(message);
}
private Task PrintJsonMessage<T>(T message) where T : class
{
var msgObj = new
{
Message = typeof(T).Name,
Data = message
};
logger.Info(JsonConvert.SerializeObject(msgObj, Formatting.Indented));
return Task.CompletedTask;
}
}
I'm certain that the context.Publish() calls are being hit because the _logger.Info() calls are printing messages to my test console. I've also verified they are hit with breakpoints. In my testing, anything that runs longer than 500 milliseconds prevents the handling of the OperationComplete event.
If anyone can offer suggestions as to why the OperationComplete event is not hitting the handler when any significant amount of time has passed in the ProcessMessage implementation, I'd be extremely grateful to hear them. Thanks!
-- Update --
In case anyone else runs into this and is curious about what I ended up doing:
After an exchange with the developers of NServiceBus, I decided on using a watchdog saga that implemented the IHandleTimeouts interface to periodically check for job completion. I was using saga data, updated when the job was finished, to determine whether to fire off the OperationComplete event in the timeout handler. This presented another issue: when using In-Memory Persistence, the saga data was not persisted across threads even when it was locked by each thread. To get around this, I created an interface specifically for long-running, in-memory data persistence. This interface was injected into the saga as a singleton, and thus used to read/write saga data across threads for long-running operations.
I know that In-Memory Persistence is not recommended, but for my needs configuring another type of persistence (like Azure tables) was overkill; I simply want the OperationComplete event to fire under normal circumstances. If a reboot happens during a running job, I don't need to persist the saga data. The job will be cut short anyway and the saga timeout will handle firing the OperationComplete event with an error if the job runs longer than a set maximum time.

The cause is a race: if ProcessMessage is fast enough, the background task may still use the current context before it is invalidated (for example, disposed).
By returning from Handle successfully, you're telling NServiceBus: "I'm done with this message", so it may do what it wants with the context as well, such as invalidating it. In the background processor, you need an endpoint instance, not a message context.
By the time the new task starts running, you don't know if Handle has returned or not, so you should just consider the message has already been consumed and is thus unrecoverable. If errors happen in your separate task, you can't retry them.
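A minimal sketch of that idea, assuming the background work is handed a long-lived publisher instead of the handler's message context. IEventPublisher, RecordingPublisher and BackgroundOperationRunner are hypothetical names used for illustration and are not NServiceBus types; in a real endpoint the publisher role would be played by the endpoint instance. The event classes are trimmed copies of the ones in the question. This only addresses the context lifetime; the work is still lost if the process dies.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a long-lived publisher (e.g. the endpoint instance).
public interface IEventPublisher
{
    Task Publish(object message);
}

// Trimmed copies of the question's events, without IEvent,
// so the sketch compiles without NServiceBus.
public sealed class OperationStarted
{
    public Guid OperationID { get; set; }
}

public sealed class OperationComplete
{
    public Guid OperationID { get; set; }
    public List<string> Errors { get; set; } = new List<string>();
}

public sealed class BackgroundOperationRunner
{
    private readonly IEventPublisher _publisher;

    public BackgroundOperationRunner(IEventPublisher publisher)
    {
        _publisher = publisher ?? throw new ArgumentNullException(nameof(publisher));
    }

    // Started from the handler; it never touches the handler's message context,
    // so it does not matter whether Handle has already returned.
    public async Task RunAsync(Guid operationId, Func<Task> work)
    {
        var errors = new List<string>();
        await _publisher.Publish(new OperationStarted { OperationID = operationId });
        try
        {
            await work();
        }
        catch (Exception ex)
        {
            errors.Add(ex.Message);
        }
        // Publish is awaited so a failed publish is not silently dropped.
        await _publisher.Publish(new OperationComplete { OperationID = operationId, Errors = errors });
    }
}

// In-memory test double, used only to exercise the sketch.
public sealed class RecordingPublisher : IEventPublisher
{
    public List<object> Published { get; } = new List<object>();

    public Task Publish(object message)
    {
        Published.Add(message);
        return Task.CompletedTask;
    }
}
```

Note that both Publish calls are awaited here, unlike the context.Publish calls in the question, which were fired without observing the returned task.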
Avoid long running processes without persistence. The sample you mention has a server that stores a work item from a message, and a process that polls this storage for work items. Perhaps not ideal, in case you scale out processors, but it won't lose messages.
To avoid constant polling, merge the server and the processor, poll unconditionally once at startup, and schedule a polling task from Handle. Make sure this task only polls if no other polling task is running; otherwise it may become worse than constant polling. You can use a semaphore to control this.
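The semaphore guard could be sketched roughly as follows. SinglePollScheduler and the pollOnce delegate are hypothetical names; pollOnce is assumed to drain whatever work items are currently in storage. A request that arrives while a poll is running is not dropped: it sets a flag so that the running poller makes one more pass.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: at most one poll runs at a time, and a request that
// arrives during a poll triggers one extra pass instead of a second poller.
public sealed class SinglePollScheduler
{
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly Func<Task> _pollOnce; // assumed: processes pending work items
    private int _requested;                // 1 when a poll has been requested

    public SinglePollScheduler(Func<Task> pollOnce)
    {
        _pollOnce = pollOnce ?? throw new ArgumentNullException(nameof(pollOnce));
    }

    // Call once at startup and again from Handle after storing a work item.
    public async Task RequestPollAsync()
    {
        Interlocked.Exchange(ref _requested, 1);

        // Wait(0) never blocks: it takes the semaphore only if it is free.
        // If another caller holds it, that caller will see _requested == 1.
        while (Volatile.Read(ref _requested) == 1 && _gate.Wait(0))
        {
            try
            {
                // Keep polling while new requests came in during the last pass.
                while (Interlocked.Exchange(ref _requested, 0) == 1)
                {
                    await _pollOnce().ConfigureAwait(false);
                }
            }
            finally
            {
                _gate.Release();
            }
            // The outer loop re-checks the flag to catch a request that was
            // made after the inner check but before the semaphore was released.
        }
    }
}
```

In Handle, the call would typically be started without awaiting its completion, since the message has already been persisted as a work item by then.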
To scale out, you must have more servers. You need to measure if the cost of N processors polling is greater than sending to N servers in a round-robin fashion, for some N, to know which approach actually performs better. In practice, polling is good enough for a low N.
Modifying the sample for multiple processors may require less deployment and configuration effort: you just add or remove processors, whereas adding or removing servers means changing their endpoints in every place (e.g. config files) that points to them.
Another approach would be to break the long process into steps. NServiceBus has sagas for this. Sagas are usually implemented for a known or bounded number of steps. For an unknown number of steps it's still feasible, although some might consider it an abuse of the seemingly intended purpose of sagas.

Related

.NET client-side WCF with queued requests

Background
I'm working on updating a legacy software library. The legacy code uses an infinitely looping System.Threading.Thread that executes processes in the queue. These processes perform multiple requests with another legacy system that can only process one request at a time.
I'm trying to modernize, but I'm new to WCF services and there may be a big hole in my knowledge that'd simplify things.
WCF Client-Side Host
In modernizing, I'm trying to move to a client-side WCF service. The WCF service allows requests to be queued from multiple applications. The service takes a request and returns a GUID so that I can properly associate it via the callbacks.
public class SomeService : ISomeService
{
public Guid AddToQueue(Request request)
{
// Code to add the request to a queue, return a Guid, etc.
}
}
public interface ISomeCallback
{
    void NotifyExecuting(Guid guid);
    void NotifyComplete(Guid guid);
    void NotifyFault(Guid guid, byte[] data);
}
WCF Client Process Queues
The problem I'm having is that the legacy processes can include more than one request. Process 1 might do Request X then Request Y, and based on those results follow up with Request Z. With the legacy system, there might be Processes 1-10 queued up.
I have a kludgy model where the process is executed. I'm handling events on the process to know when it's finished or fails. But it just feels really kludgy...
public class ActionsQueue
{
    public IList<Action> PendingActions { get; private set; } = new List<Action>();
    public Action CurrentAction { get; private set; }
    public void Add(Action action)
    {
        PendingActions.Add(action);
        if (CurrentAction is null)
            ExecuteNextAction();
    }
    private void ExecuteNextAction()
    {
        if (PendingActions.Count > 0)
        {
            CurrentAction = PendingActions[0];
            PendingActions.RemoveAt(0);
            CurrentAction.Completed += OnActionCompleted;
            CurrentAction.Execute();
        }
    }
    private void OnActionCompleted(object sender, EventArgs e)
    {
        CurrentAction = default;
        ExecuteNextAction();
    }
}
public class Action
{
    // Raised when the last request of this action has finished
    public event EventHandler Completed;
    internal void Execute()
    {
        // Instantiate the first request
        // Add handlers to the first request
        // Send it to the service
    }
    internal void OnRequestXComplete()
    {
        // Use the data that's come back from the request
        // Proceed with future requests
    }
}
With the client-side callback, the GUID is matched up to the original request, and it raises a related event on the original request. Again, the implementation here feels really kludgy.
I've seen examples of async methods for the host that return a Task, which is then awaited. But I've also seen recommendations not to do this.
Any recommendations on how to untangle this mess into something more usable are appreciated. Again, it's possible that there's a hole in my knowledge here that's keeping me from a better solution.
Thanks

Queued communication between a WCF client and server is usually done with NetMsmqBinding, which ensures persistent communication between the client and the server. See this article for specific examples.
If you need efficient and fast message processing, use a non-transactional queue and set the ExactlyOnce property to false, but this has a security impact. See the docs for further info.

In case anyone comes along later with a similar issue, this is a rough sketch of what I ended up with:
[ServiceContract(Name = "MyService", SessionMode = SessionMode.Required)]
public interface IMyServiceContract
{
    [OperationContract]
    Task<string> ExecuteRequestAsync(Request request);
}
public class MyService : IMyServiceContract
{
    private readonly TaskQueue queue = new TaskQueue();
    public async Task<string> ExecuteRequestAsync(Request request)
    {
        return await queue.Enqueue(() => request.Execute());
    }
}
public class TaskQueue
{
    // A single-permit semaphore lets only one request execute at a time
    private readonly SemaphoreSlim semaphore;
    public TaskQueue()
    {
        semaphore = new SemaphoreSlim(1);
    }
    public async Task<T> Enqueue<T>(Func<T> function)
    {
        await semaphore.WaitAsync();
        try
        {
            return await Task.Factory.StartNew(() => function.Invoke());
        }
        finally
        {
            semaphore.Release();
        }
    }
}

Hangfire Custom State Expiration

I have implemented a custom state "blocked" that moves into the enqueued state after certain external requirements have been fulfilled.
Sometimes these external requirements are never fulfilled which causes the job to be stuck in the blocked state. What I'd like to have is for jobs in this state to automatically expire after some configurable time.
Is there any support for such a requirement? There is the ExpirationDate field, but from looking at the code it seems to be only used for final states.
The state is as simple as can be:
internal sealed class BlockedState : IState
{
internal const string STATE_NAME = "Blocked";
public Dictionary<string, string> SerializeData()
{
return new Dictionary<string, string>();
}
public string Name => STATE_NAME;
public string Reason => "Waiting for external resource";
public bool IsFinal => false;
public bool IgnoreJobLoadException => false;
}
and is used simply as _hangfireBackgroundJobClient.Create(() => Console.WriteLine("hello world"), new BlockedState());
At a later stage it is then moved forward via _hangfireBackgroundJobClient.ChangeState(jobId, new EnqueuedState(), BlockedState.STATE_NAME)

I would go for a custom implementation of IBackgroundProcess, modeled on DelayedJobScheduler, which picks up delayed jobs on a regular basis to enqueue them.
In this custom implementation I would use a JobStorageConnection.GetAllItemsFromSet("blocked") to get all the blocked job ids (where the DelayedJobScheduler uses JobStorageConnection.GetFirstByLowestScoreFromSet)
Then I would get each blocked job data with JobStorageConnection.GetJobData(jobId). For each of them, depending on its CreatedAt field, I would do nothing if the job is not expired, or change its state to another state (Failed ?) if it is expired.
The custom job process can be declared like this:
app.UseHangfireServer(storage, options,
new IBackgroundProcess[] {
new MyCustomJobProcess(
myTimeSpanForExpiration,
(IBackgroundJobStateChanger) new BackgroundJobStateChanger(filterProvider)) });
A difficulty here is to obtain an IBackgroundJobStateChanger as the server does not seem to expose its own.
If you use a custom FilterProvider as an option for your server, pass its value as filterProvider; otherwise use (IJobFilterProvider) JobFilterProviders.Providers.

Can you take advantage of EventWaitHandle?
Have a look at Generic Timeout.
For example:
//action : your job
//timeout : your desired ExpirationDate
void DoSomething(Action action, int timeout)
{
EventWaitHandle waitHandle = new EventWaitHandle(false, EventResetMode.ManualReset);
AsyncCallback callback = ar => waitHandle.Set();
action.BeginInvoke(callback, null);
if (!waitHandle.WaitOne(timeout))
{
// Expired here
}
}

Message inheritance and getting access to the message from consumer

First of all, I'm aware that inheritance is not quite good and we should be careful with it, but...
In a system I have set of different commands that I publish to the bus in order to trigger appropriate services. Now I want to log the fact that some of messages (not all messages, just a subset) were issued.
I was thinking that the ideal way to handle it would be some separate service that just subscribed to the list of messages that I'm interested in, catches it and logs it.
What is the best way to implement it?
I was thinking about using inheritance like this
public interface IAction
{
}
public interface TestCommand : IAction
{
Guid Id { get; }
string Message { get; }
}
public class TestHandler : IConsumer<IAction>, IConsumer<TestCommand>
{
public async Task Consume(ConsumeContext<IAction> context)
{
var action = context.Message;
}
public async Task Consume(ConsumeContext<TestCommand> context)
{
var cmd = context.Message;
}
}
but the problem here is that the IAction consumer doesn't have access to the whole message, so I can't get the content to log.
Is it possible to solve this somehow and get access to the message content? Or is this approach totally wrong, and should I use something else?

You can use context.TryGetPayload<YourMessageType>(out var message) but again you need to know the message type.
You can also consume (or get payload of) the JObject, which will give you everything in plain JSON.
Observers are better for logging; check the Audit feature to see how it can be done.

You should move the declaration of your
string Message { get; }
to
public interface IAction
I mean, every command will have a Message and an Id, right?

How to get access to IJobExecutionContext in Interrupt method in Quartz.net?

I'm currently developing a job manager based on Quartz.net. I want to be able to display job execution status in real time, so that when a job interruption request is made I can save context.JobDetail.JobDataMap.Put("status", "interrupting"); in the job data map and read this status by fetching all currently executing jobs in the scheduler. The problem is that I do not have access to the IJobExecutionContext context object directly in the Interrupt() method, so I cannot set the interrupting status at the moment the interruption is requested. This is the functionality I basically want to achieve:
class InterruptedException : Exception { }
[PersistJobDataAfterExecution, DisallowConcurrentExecution]
class MyJob : IInterruptableJob
{
private bool _interrupted;
private void AssertContinue()
{
if (_interrupted) throw new InterruptedException();
}
public void Execute(IJobExecutionContext context)
{
try
{
context.JobDetail.JobDataMap.Put("status", "started");
AssertContinue();
// Do the work
AssertContinue();
context.JobDetail.JobDataMap.Put("status", "completed");
}
catch (InterruptedException)
{
// Set interrupted status when job is actually interrupted
context.JobDetail.JobDataMap.Put("status", "interrupted");
}
catch
{
// log any other errors but set interrupted status only on InterruptedException
}
}
public void Interrupt()
{
_interrupted = true;
// I want to set interrupting status here!!!
context.JobDetail.JobDataMap.Put("status", "interrupting");
}
}
The basic idea of IInterruptableJob interface implementation, as I understand, is to set some _interrupted flag value to true inside void Interrupt() method, then check this flag value on each execution step in Execute() method. So, basically, we cannot interrupt the job immediately, we can make interruption request, and only when interruption status check is executed we can interrupt the job by throwing an exception, for example. But I want to set interrupting status for my job during this short period of time. How can I do that? Is it even possible to get IJobExecutionContext context object in Interrupt() method?

Well, the solution turns out to be pretty obvious and easy. I just have to create a private
IJobExecutionContext _context = null;
field in my job class and set its value to context in the Execute() method.
public void Execute(IJobExecutionContext context)
{
_context = context;
...
}
Basically, only an executing job can be interrupted, so if Interrupt() is called then Execute() has been called at least once. This approach therefore suits my needs and updates the job's DataMap immediately after Interrupt() is called.

Asynchronous data loading and subsequent error handling

I have an application that involves a database. Previously, upon opening a window, I would query the database and use this to populate aspects of my view model. This worked reasonably well, but could create noticeable pauses when the data access took longer than expected.
The natural solution, of course, is to run the database query asynchronously and then populate the view model when that query completes. This isn't too hard, but it raises some interesting questions regarding error handling.
Previously, if something went wrong with the database query (a pretty big problem, granted), I would propagate the exception through the view model constructor, ultimately making it back up to the caller that wanted to open the window. It could then display an appropriate error and not actually open the window.
Now, however, the window opens right away, then populates later as the query completes. The question, now, is at what point should I check for an error in the background task? The window is already open, so the behavior needs to be different somehow, but what is a clean way to indicate the failure to the user and allow for graceful recovery/shutdown?
For reference, here is a snippet demonstrating the basic pattern:
public ViewModel()
{
_initTask = InitAsync();
//Now where do I check on the status of the init task?
}
private async Task InitAsync()
{
//Do stuff...
}
//....
public void ShowWindow()
{
var vm = new ViewModel(); //Previously this could throw an exception that would prevent window from being shown
_windowServices.Show(vm);
}
One option I've considered is to use an asynchronous factory method for constructing the ViewModel, allowing the entire thing to be constructed and initialized before attempting to display the window. This preserves the old approach of reporting errors before the window is ever opened. However, it gives up some of the UI responsiveness gained by the asynchronous approach, which allows initial loading of the window to occur in parallel with the query and also allows me (in some cases) to update the UI in increments as each query completes, rather than having the UI compose itself all at once. The factory avoids locking up the UI thread, but it doesn't reduce the time before the user actually sees the window and can start interacting with it.

Maybe use some kind of messaging/mediator between your viewmodel and underlying service?
Semi-pseudo code using MVVMLight
public ViewModel()
{
    Messenger.Default.Register<NotificationMessage<Exception>>(this, message =>
    {
        // Handle here
    });
    Task.Run(() => FetchData());
}
public async Task FetchData()
{
    // Some magic happens here
    try
    {
        // Simulated slow query; awaiting keeps the method genuinely asynchronous
        await Task.Delay(2000);
        throw new ArgumentException();
    }
    catch (Exception e)
    {
        Messenger.Default.Send(new NotificationMessage<Exception>(this, e, "Aw snap!"));
    }
}

I dealt with a similar problem here. I found it'd be best for me to raise an error event from inside the task, like this:
// ViewModel
public class TaskFailedEventArgs: EventArgs
{
public Exception Exception { get; private set; }
public bool Handled { get; set; }
public TaskFailedEventArgs(Exception ex) { this.Exception = ex; }
}
public event EventHandler<TaskFailedEventArgs> TaskFailed = delegate { };
public ViewModel()
{
this.TaskFailed += (s, e) =>
{
// handle it, e.g.: retry, report or set a property
MessageBox.Show(e.Exception.Message);
e.Handled = true;
};
_initTask = InitAsync();
//Now where do I check on the status of the init task?
}
private async Task InitAsync()
{
try
{
// do the async work
}
catch (Exception ex)
{
var args = new TaskFailedEventArgs(ex);
this.TaskFailed(this, args);
if (!args.Handled)
throw;
}
}
// application
public void ShowWindow()
{
var vm = new ViewModel(); //Previously this could throw an exception that would prevent window from being shown
_windowServices.Show(vm);
}
The window still shows up, but it should be displaying some kind of progress notifications (e.g. using IProgress<T> pattern), until the end of the operation (and the error info in case it failed).
Inside the error event handler, you may give the user an option to retry or exit the app gracefully, depending on your business logic.

Stephen Cleary has a series of posts on his blog about Async OOP. In particular, about constructors.
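As a rough sketch of one way to apply that idea to the question: the view model can expose its initialization task, so the window opens immediately and the caller (or a binding) observes success or failure later. The Initialization, Data and Error members and the loadAsync parameter are assumptions made for this example; loadAsync stands in for the database query.

```csharp
using System;
using System.Threading.Tasks;

public sealed class ViewModel
{
    // Completes when loading finishes; faults if loading throws.
    public Task Initialization { get; }

    public string Data { get; private set; }
    public string Error { get; private set; }

    // loadAsync is a hypothetical stand-in for the real database query.
    public ViewModel(Func<Task<string>> loadAsync)
    {
        if (loadAsync == null) throw new ArgumentNullException(nameof(loadAsync));
        // The constructor returns right away; the query keeps running.
        Initialization = InitAsync(loadAsync);
    }

    private async Task InitAsync(Func<Task<string>> loadAsync)
    {
        try
        {
            Data = await loadAsync();
        }
        catch (Exception ex)
        {
            // Record the failure for the UI, then rethrow so that anyone
            // awaiting Initialization also sees the exception.
            Error = ex.Message;
            throw;
        }
    }
}
```

The code that shows the window can then await vm.Initialization after calling Show and, in a catch block, offer a retry or close the window, which gives a single place to check the status of the init task.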
