Handle rabbitmq messages concurrenrtly - c#

I asked a question here about why starting a process using Thread.Run did not execute as many concurrent requests as I expected.
The reason behind this question was that I was trying to create a class which can pull messages off a rabbitmq queue and process them concurrently up to a maximum number of concurrent messages.
To do this I ended up with the following in the Received handler of the EventingBasicConsumer class.
async void Handle(EventArgs e)
{
await _semaphore.WaitAsync();
var thread = new Thread(() =>
{
Process(e);
_semaphore.Release();
_channel.BasicAck(....);
});
thread.Start();
}
However the comments on the previous post were not to start a thread unless doing CPU bound work.
The above handler does not know whether the work will be CPU bound, Network, Disk or otherwise. (Process is an abstract method).
Even so I think I have to start a thread or task here, otherwise the Process method blocks the rabbitmq thread and the event handler is not called again until it is finished. So I can only handle one method at once.
Is starting a new Thread here okay? Originally I had used Task.Run but this didn't produce as many workers as wanted. See other post.
FYI. The number of concurrent threads is capped by setting the InitialCount on the semaphore.

As already been said in linked question, big number of threads doesn't guarantee the performance, as if their number gets more than the number of logical cores, you got a thread starvation situation with no real work being done.
However, if you still need to handle the number of a concurrent operations, you may give a try to the TPL Dataflow library, with settings up the MaxDegreeOfParallelism, like in this tutorial.
var workerBlock = new ActionBlock<EventArgs>(
// Process event
e => Process(e),
// Specify a maximum degree of parallelism.
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = InitialCount
});
var bufferBlock = new BufferBlock();
// link the blocks for automatically propagading the messages
bufferBlock.LinkTo(workerBlock);
// asynchronously send the message
await bufferBlock.SendAsync(...);
// synchronously send the message
bufferBlock.Post(...);
BufferBlock is a queue, so the order of messages will be preserved. Also, you can add the different handlers (with a different degree of parallelism) with linking the blocks with filter lambda:
bufferBlock.LinkTo(cpuWorkerBlock, e => e is CpuEventArgs);
bufferBlock.LinkTo(networkWorkerBlock, e => e is NetworkEventArgs);
bufferBlock.LinkTo(diskWorkerBlock, e => e is DiskEventArgs);
but in this case you should setup a default handler at the end of the chain, so the message wouldn't disappear (you may use a NullTarget block for this):
bufferBlock.LinkTo(DataflowBlock.NullTarget<EventArgs>);
Also, the block could be an observers, so they perfectly work with Reactive Extensions on UI side.

Related

Is there a way to limit the number of parallel Tasks globally in an ASP.NET Web API application?

I have an ASP.NET 5 Web API application which contains a method that takes objects from a List<T> and makes HTTP requests to a server, 5 at a time, until all requests have completed. This is accomplished using a SemaphoreSlim, a List<Task>(), and awaiting on Task.WhenAll(), similar to the example snippet below:
public async Task<ResponseObj[]> DoStuff(List<Input> inputData)
{
const int maxDegreeOfParallelism = 5;
var tasks = new List<Task<ResponseObj>>();
using var throttler = new SemaphoreSlim(maxDegreeOfParallelism);
foreach (var input in inputData)
{
tasks.Add(ExecHttpRequestAsync(input, throttler));
}
List<ResponseObj> resposnes = await Task.WhenAll(tasks).ConfigureAwait(false);
return responses;
}
private async Task<ResponseObj> ExecHttpRequestAsync(Input input, SemaphoreSlim throttler)
{
await throttler.WaitAsync().ConfigureAwait(false);
try
{
using var request = new HttpRequestMessage(HttpMethod.Post, "https://foo.bar/api");
request.Content = new StringContent(JsonConvert.SerializeObject(input, Encoding.UTF8, "application/json");
var response = await HttpClientWrapper.SendAsync(request).ConfigureAwait(false);
var responseBody = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
var responseObject = JsonConvert.DeserializeObject<ResponseObj>(responseBody);
return responseObject;
}
finally
{
throttler.Release();
}
}
This works well, however I am looking to limit the total number of Tasks that are being executed in parallel globally throughout the application, so as to allow scaling up of this application. For example, if 50 requests to my API came in at the same time, this would start at most 250 tasks running parallel. If I wanted to limit the total number of Tasks that are being executed at any given time to say 100, is it possible to accomplish this? Perhaps via a Queue<T>? Would the framework automatically prevent too many tasks from being executed? Or am I approaching this problem in the wrong way, and would I instead need to Queue the incoming requests to my application?
I'm going to assume the code is fixed, i.e., Task.Run is removed and the WaitAsync / Release are adjusted to throttle the HTTP calls instead of List<T>.Add.
I am looking to limit the total number of Tasks that are being executed in parallel globally throughout the application, so as to allow scaling up of this application.
This does not make sense to me. Limiting your tasks limits your scaling up.
For example, if 50 requests to my API came in at the same time, this would start at most 250 tasks running parallel.
Concurrently, sure, but not in parallel. It's important to note that these aren't 250 threads, and that they're not 250 CPU-bound operations waiting for free thread pool threads to run on, either. These are Promise Tasks, not Delegate Tasks, so they don't "run" on a thread at all. It's just 250 objects in memory.
If I wanted to limit the total number of Tasks that are being executed at any given time to say 100, is it possible to accomplish this?
Since (these kinds of) tasks are just in-memory objects, there should be no need to limit them, any more than you would need to limit the number of strings or List<T>s. Apply throttling where you do need it; e.g., number of HTTP calls done simultaneously per request. Or per host.
Would the framework automatically prevent too many tasks from being executed?
The framework has nothing like this built-in.
Perhaps via a Queue? Or am I approaching this problem in the wrong way, and would I instead need to Queue the incoming requests to my application?
There's already a queue of requests. It's handled by IIS (or whatever your host is). If your server gets too busy (or gets busy very suddenly), the requests will queue up without you having to do anything.
If I wanted to limit the total number of Tasks that are being executed at any given time to say 100, is it possible to accomplish this?
What you are looking for is to limit the MaximumConcurrencyLevel of what's called the Task Scheduler. You can create your own task scheduler that regulates the MaximumCongruencyLevel of the tasks it manages. I would recommend implementing a queue-like object that tracks incoming requests and currently working requests and waits for the current requests to finish before consuming more. The below information may still be relevant.
The task scheduler is in charge of how Tasks are prioritized, and in charge of tracking the tasks and ensuring that their work is completed, at least eventually.
The way it does this is actually very similar to what you mentioned, in general the way the Task Scheduler handles tasks is in a FIFO (First in first out) model very similar to how a ConcurrentQueue<T> works (at least starting in .NET 4).
Would the framework automatically prevent too many tasks from being executed?
By default the TaskScheduler that is created with most applications appears to default to a MaximumConcurrencyLevel of int.MaxValue. So theoretically yes.
The fact that there practically is no limit to the amount of tasks(at least with the default TaskScheduler) might not be that big of a deal for your case scenario.
Tasks are separated into two types, at least when it comes to how they are assigned to the available thread pools. They're separated into Local and Global queues.
Without going too far into detail, the way it works is if a task creates other tasks, those new tasks are part of the parent tasks queue (a local queue). Tasks spawned by a parent task are limited to the parent's thread pool.(Unless the task scheduler takes it upon itself to move queues around)
If a task isn't created by another task, it's a top-level task and is placed into the Global Queue. These would normally be assigned their own thread(if available) and if one isn't available it's treated in a FIFO model, as mentioned above, until it's work can be completed.
This is important because although you can limit the amount of concurrency that happens with the TaskScheduler, it may not necessarily be important - if for say you have a top-level task that's marked as long running and is in-charge of processing your incoming requests. This would be helpful since all the tasks spawned by this top-level task will be part of that task's local queue and therefor won't spam all your available threads in your thread pool.
When you have a bunch of items and you want to process them asynchronously and with limited concurrency, the SemaphoreSlim is a great tool for this job. There are two ways that it can be used. One way is to create all the tasks immediately and have each task acquire the semaphore before doing it's main work, and the other is to throttle the creation of the tasks while the source is enumerated. The first technique is eager, and so it consumes more RAM, but it's more maintainable because it is easier to understand and implement. The second technique is lazy, and it's more efficient if you have millions of items to process.
The technique that you have used in your sample code is the second (lazy) one.
Here is an example of using two SemaphoreSlims in order to impose two maximum concurrency policies, one per request and one globally. First the eager approach:
private const int maxConcurrencyGlobal = 100;
private static SemaphoreSlim globalThrottler
= new SemaphoreSlim(maxConcurrencyGlobal, maxConcurrencyGlobal);
public async Task<ResponseObj[]> DoStuffAsync(IEnumerable<Input> inputData)
{
const int maxConcurrencyPerRequest = 5;
var perRequestThrottler
= new SemaphoreSlim(maxConcurrencyPerRequest, maxConcurrencyPerRequest);
Task<ResponseObj>[] tasks = inputData.Select(async input =>
{
await perRequestThrottler.WaitAsync();
try
{
await globalThrottler.WaitAsync();
try
{
return await ExecHttpRequestAsync(input);
}
finally { globalThrottler.Release(); }
}
finally { perRequestThrottler.Release(); }
}).ToArray();
return await Task.WhenAll(tasks);
}
The Select LINQ operator provides an easy and intuitive way to project items to tasks.
And here is the lazy approach for doing exactly the same thing:
private const int maxConcurrencyGlobal = 100;
private static SemaphoreSlim globalThrottler
= new SemaphoreSlim(maxConcurrencyGlobal, maxConcurrencyGlobal);
public async Task<ResponseObj[]> DoStuffAsync(IEnumerable<Input> inputData)
{
const int maxConcurrencyPerRequest = 5;
var perRequestThrottler
= new SemaphoreSlim(maxConcurrencyPerRequest, maxConcurrencyPerRequest);
var tasks = new List<Task<ResponseObj>>();
foreach (var input in inputData)
{
await perRequestThrottler.WaitAsync();
await globalThrottler.WaitAsync();
Task<ResponseObj> task = Run(async () =>
{
try
{
return await ExecHttpRequestAsync(input);
}
finally
{
try { globalThrottler.Release(); }
finally { perRequestThrottler.Release(); }
}
});
tasks.Add(task);
}
return await Task.WhenAll(tasks);
static async Task<T> Run<T>(Func<Task<T>> action) => await action();
}
This implementation assumes that the await globalThrottler.WaitAsync() will never throw, which is a given according to the documentation. This will no longer be the case if you decide later to add support for cancellation, and you pass a CancellationToken to the method. In that case you would need one more try/finally wrapper around the task-creation logic. The first (eager) approach could be enhanced with cancellation support without such considerations. Its existing try/finally infrastructure is
already sufficient.
It is also important that the internal helper Run method is implemented with async/await. Eliding the async/await would be an easy mistake to make, because in that case any exception thrown synchronously by the ExecHttpRequestAsync method would be rethrown immediately, and it would not be encapsulated in a Task<ResponseObj>. Then the task returned by the DoStuffAsync method would fail without releasing the acquired semaphores, and also without awaiting the completion of the already started operations. That's another argument for preferring the eager approach. The lazy approach has too many gotchas to watch for.

Receive concurrent asynchronous requests and process them one at a time

Background
We have a service operation that can receive concurrent asynchronous requests and must process those requests one at a time.
In the following example, the UploadAndImport(...) method receives concurrent requests on multiple threads, but its calls to the ImportFile(...) method must happen one at a time.
Layperson Description
Imagine a warehouse with many workers (multiple threads). People (clients) can send the warehouse many packages (requests) at the same time (concurrently). When a package comes in a worker takes responsibility for it from start to finish, and the person who dropped off the package can leave (fire-and-forget). The workers' job is to put each package down a small chute, and only one worker can put a package down a chute at a time, otherwise chaos ensues. If the person who dropped off the package checks in later (polling endpoint), the warehouse should be able to report on whether the package went down the chute or not.
Question
The question then is how to write a service operation that...
can receive concurrent client requests,
receives and processes those requests on multiple threads,
processes requests on the same thread that received the request,
processes requests one at a time,
is a one way fire-and-forget operation, and
has a separate polling endpoint that reports on request completion.
We've tried the following and are wondering two things:
Are there any race conditions that we have not considered?
Is there a more canonical way to code this scenario in C#.NET with a service oriented architecture (we happen to be using WCF)?
Example: What We Have Tried?
This is the service code that we have tried. It works though it feels like somewhat of a hack or kludge.
static ImportFileInfo _inProgressRequest = null;
static readonly ConcurrentDictionary<Guid, ImportFileInfo> WaitingRequests =
new ConcurrentDictionary<Guid, ImportFileInfo>();
public void UploadAndImport(ImportFileInfo request)
{
// Receive the incoming request
WaitingRequests.TryAdd(request.OperationId, request);
while (null != Interlocked.CompareExchange(ref _inProgressRequest, request, null))
{
// Wait for any previous processing to complete
Thread.Sleep(500);
}
// Process the incoming request
ImportFile(request);
Interlocked.Exchange(ref _inProgressRequest, null);
WaitingRequests.TryRemove(request.OperationId, out _);
}
public bool UploadAndImportIsComplete(Guid operationId) =>
!WaitingRequests.ContainsKey(operationId);
This is example client code.
private static async Task UploadFile(FileInfo fileInfo, ImportFileInfo importFileInfo)
{
using (var proxy = new Proxy())
using (var stream = new FileStream(fileInfo.FullName, FileMode.Open, FileAccess.Read))
{
importFileInfo.FileByteStream = stream;
proxy.UploadAndImport(importFileInfo);
}
await Task.Run(() => Poller.Poll(timeoutSeconds: 90, intervalSeconds: 1, func: () =>
{
using (var proxy = new Proxy())
{
return proxy.UploadAndImportIsComplete(importFileInfo.OperationId);
}
}));
}
It's hard to write a minimum viable example of this in a Fiddle, but here is a start that give a sense and that compiles.
As before, the above seems like a hack/kludge, and we are asking both about potential pitfalls in its approach and for alternative patterns that are more appropriate/canonical.
Simple solution using Producer-Consumer pattern to pipe requests in case of thread count restrictions.
You still have to implement a simple progress reporter or event. I suggest to replace the expensive polling approach with an asynchronous communication which is offered by Microsoft's SignalR library. It uses WebSocket to enable async behavior. The client and server can register their callbacks on a hub. Using RPC the client can now invoke server side methods and vice versa. You would post progress to the client by using the hub (client side). In my experience SignalR is very simple to use and very good documented. It has a library for all famous server side languages (e.g. Java).
Polling in my understanding is the totally opposite of fire-and-forget. You can't forget, because you have to check something based on an interval. Event based communication, like SignalR, is fire-and-forget since you fire and will get a reminder (cause you forgot). The "event side" will invoke your callback instead of you waiting to do it yourself!
Requirement 5 is ignored since I didn't get any reason. Waiting for a thread to complete would eliminate the fire and forget character.
private BlockingCollection<ImportFileInfo> requestQueue = new BlockingCollection<ImportFileInfo>();
private bool isServiceEnabled;
private readonly int maxNumberOfThreads = 8;
private Semaphore semaphore = new Semaphore(numberOfThreads);
private readonly object syncLock = new object();
public void UploadAndImport(ImportFileInfo request)
{
// Start the request handler background loop
if (!this.isServiceEnabled)
{
this.requestQueue?.Dispose();
this.requestQueue = new BlockingCollection<ImportFileInfo>();
// Fire and forget (requirement 4)
Task.Run(() => HandleRequests());
this.isServiceEnabled = true;
}
// Cache multiple incoming client requests (requirement 1) (and enable throttling)
this.requestQueue.Add(request);
}
private void HandleRequests()
{
while (!this.requestQueue.IsCompleted)
{
// Wait while thread limit is exceeded (some throttling)
this.semaphore.WaitOne();
// Process the incoming requests in a dedicated thread (requirement 2) until the BlockingCollection is marked completed.
Task.Run(() => ProcessRequest());
}
// Reset the request handler after BlockingCollection was marked completed
this.isServiceEnabled = false;
this.requestQueue.Dispose();
}
private void ProcessRequest()
{
ImportFileInfo request = this.requestQueue.Take();
UploadFile(request);
// You updated your question saying the method "ImportFile()" requires synchronization.
// This a bottleneck and will significantly drop performance, when this method is long running.
lock (this.syncLock)
{
ImportFile(request);
}
this.semaphore.Release();
}
Remarks:
BlockingCollection is a IDisposable
TODO: You have to "close" the BlockingCollection by marking it completed:
"BlockingCollection.CompleteAdding()" or it will loop indeterminately waiting for further requests. Maybe you introduce a additional request methods for the client to cancel and/ or to update the process and to mark adding to the BlockingCollection as completed. Or a timer that waits an idle time before marking it as completed. Or make your request handler thread block or spin.
Replace Take() and Add(...) with TryTake(...) and TryAdd(...) if you want cancellation support
Code is not tested
Your "ImportFile()" method is a bottleneck in your multi threading environment. I suggest to make it thread safe. In case of I/O that requires synchronization, I would cache the data in a BlockingCollection and then write them to I/O one by one.
The problem is that your total bandwidth is very small-- only one job can run at a time-- and you want to handle parallel requests. That means that queue time could vary wildly. It may not be the best choice to implement your job queue in-memory, as it would make your system much more brittle, and more difficult to scale out when your business grows.
A traditional, scaleable way to architect this would be:
An HTTP service to accept requests, load balanced/redundant, with no session state.
A SQL Server database to persist the requests in a queue, returning a persistent unique job ID.
A Windows service to process the queue, one job at a time, and mark jobs as complete. The worker process for the service would probably be single-threaded.
This solution requires you to choose a web server. A common choice is IIS running ASP.NET. On that platform, each request is guaranteed to be handled in a single-threaded manner (i.e. you don't need to worry about race conditions too much), but due to a feature called thread agility the request might end with a different thread, but in the original synchronization context, which means you will probably never notice unless you are debugging and inspecting thread IDs.
Given the constraints context of our system, this is the implementation we ended up using:
static ImportFileInfo _importInProgressItem = null;
static readonly ConcurrentQueue<ImportFileInfo> ImportQueue =
new ConcurrentQueue<ImportFileInfo>();
public void UploadAndImport(ImportFileInfo request) {
UploadFile(request);
ImportFileSynchronized(request);
}
// Synchronize the file import,
// because the database allows a user to perform only one write at a time.
private void ImportFileSynchronized(ImportFileInfo request) {
ImportQueue.Enqueue(request);
do {
ImportQueue.TryPeek(out var next);
if (null != Interlocked.CompareExchange(ref _importInProgressItem, next, null)) {
// Queue processing is already under way in another thread.
return;
}
ImportFile(next);
ImportQueue.TryDequeue(out _);
Interlocked.Exchange(ref _importInProgressItem, null);
}
while (ImportQueue.Any());
}
public bool UploadAndImportIsComplete(Guid operationId) =>
ImportQueue.All(waiting => waiting.OperationId != operationId);
This solution works well for the loads we are expecting. That load involves a maximum of about 15-20 concurrent PDF file uploads. The batch of up to 15-20 files tends to arrive all at once and then to go quiet for several hours until the next batch arrives.
Criticism and feedback is most welcome.

Reduce CPU usage while processing large amount of data

I am writing a real time application which receives around 2000 messages per second which was pushed in a queue. I have written a background thread which process the messages in the queue.
private void ProcessSocketMessage()
{
while (!this.shouldStopProcessing)
{
while (this.messageQueue.Count > 0)
{
string message;
bool result = this.messageQueue.TryDequeue(out message);
if (result)
{
// Process the string and do some other stuff
// Like updating the received message in a datagrid
}
}
}
}
The problem with the above code is that it uses insane amount of processing power around 12% of CPU(2.40 GHz dual core processor).
I have 4 blocks similar to the one above which literally takes up 50 % of CPU computing power.
Is there anything which can be optimized in the above code?
Adding a Thread Sleep of 100 ms before second while loop end does seems to be increase the performance by 50%. But am I doing something wrong?
This functionality is already provided in the Dataflow library's ActionBlock class. An ActionBlock has an input buffer that receives messages and processes them by calling an action for each one. By default, only one message is processed at a time. It doesn't use busy waiting.
void MyActualProcessingMethod(string it)
{
// Process the string and do some other stuff
}
var myBlock = new ActionBlock<string>( someString =>MyActualProcessingMethod(someString));
//Simulate a lot of messages
for(int i=0;i<100000;i++)
{
myBlock.Post(someMessage);
}
When the messages finish and/or we don't want any more messages, we command it to complete, by refusing any new messages and processing anything left in the input buffer:
myBlock.Complete();
Before we finish, we need to actually await for the block to finish processing the leftovers:
await myBlock.Completion;
All Dataflow blocks can accept messages from multiple clients.
Blocks can be combined as well. The output of one block can feed another. The TransformBlock accepts a function that transforms an input into an output.
Typically each block uses tasks from the thread pool. By default one block processes only one message at a time. Different blocks run on different tasks or even different TaskSchedulers. This way, you can have one block do some heavy processing and push a result to another block that updates the UI.
string MyActualProcessingMethod(string it)
{
// Process the string and do some other stuff
// and send a progress message downstream
return SomeProgressMessage;
}
void UpdateTheUI(string msg)
{
statusBar1.Text = msg;
}
var myProcessingBlock = new TransformBlock<string,string>(msg =>MyActualProcessingMethod(msg));
The UI will be updated by another block that runs on the UI thread. This is expressed through the ExecutionDataflowBlockOptions :
var runOnUI=new ExecutionDataflowBlockOptions {
TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext()
};
var myUpdater = new ActionBlock<string>(msg => UpdateTheUI(msg),runOnUI);
//Pass progress messages from the processor to the updater
myProcessingBlock.LinkTo(myUpdater,new DataflowLinkOptions { PropagateCompletion = true });
The code that posts messages to the pipeline's first block doesn't change :
//Simulate a lot of messages
for(int i=0;i<100000;i++)
{
myProcessingBlock.Post(someMessage);
}
//We are finished, tell the block to process any leftover messages
myProcessingBlock.Complete();
In this case, as soon as the procesor completes it will notify the next block in the pipeline to complete. We need to wait for that final block to complete as well
//Wait for the block to finish
await myUpdater.Completion;
How about making the first block work in parallel? We can specify that up to eg 10 tasks will be used to process input messages through its execution options :
var dopOptions = new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 10};
var myProcessingBlock = new TransformBlock<string,string>(msg =>MyActualProcessingMethod(msg),dopOptions);
The processor will process up to 10 messages in parallel but the updater will still process them one by one, in the UI thread.
You're best bet is to use a profile to monitor the running application and determine for sure where the CPU is spending it's time.
However, it looks like you have the possibility for a busy-wait loop if this.messageQueue.Count is 0. At minimum, I would suggest adding a small pause if the queue is empty to allow a message to go onto the queue. Otherwise your CPU is just spending time checking the queue over and over and over.
If the time is spent dequeueing messages, you may want to consider handling multiple messages at once (if there are multiple messages available), assuming you're queue allows you to pop multiple messages off the queue in a single call.

C# queueing dependant tasks to be processed by a thread pool

I want to queue dependant tasks across several flows that need to be processed in order (in each flow). The flows can be processed in parallel.
To be specific, let's say I need two queues and I want the tasks in each queue to be processed in order. Here is sample pseudocode to illustrate the desired behavior:
Queue1_WorkItem wi1a=...;
enqueue wi1a;
... time passes ...
Queue1_WorkItem wi1b=...;
enqueue wi1b; // This must be processed after processing of item wi1a is complete
... time passes ...
Queue2_WorkItem wi2a=...;
enqueue wi2a; // This can be processed concurrently with the wi1a/wi1b
... time passes ...
Queue1_WorkItem wi1c=...;
enqueue wi1c; // This must be processed after processing of item wi1b is complete
Here is a diagram with arrows illustrating dependencies between work items:
The question is how do I do this using C# 4.0/.NET 4.0? Right now I have two worker threads, one per queue and I use a BlockingCollection<> for each queue. I would like to instead leverage the .NET thread pool and have worker threads process items concurrently (across flows), but serially within a flow. In other words I would like to be able to indicate that for example wi1b depends on completion of wi1a, without having to track completion and remember wi1a, when wi1b arrives. In other words, I just want to say, "I want to submit a work item for queue1, which is to be processed serially with other items I have already submitted for queue1, but possibly in parallel with work items submitted to other queues".
I hope this description made sense. If not please feel free to ask questions in the comments and I will update this question accordingly.
Thanks for reading.
Update:
To summarize "flawed" solutions so far, here are the solutions from the answers section that I cannot use and the reason(s) why I cannot use them:
TPL tasks require specifying the antecedent task for a ContinueWith(). I do not want to maintain knowledge of each queue's antecedent task when submitting a new task.
TDF ActionBlocks looked promising, but it would appear that items posted to an ActionBlock are processed in parallel. I need for the items for a particular queue to be processed serially.
Update 2:
RE: ActionBlocks
It would appear that setting the MaxDegreeOfParallelism option to one prevents parallel processing of work items submitted to a single ActionBlock. Therefore it seems that having an ActionBlock per queue solves my problem with the only disadvantage being that this requires the installation and deployment of the TDF library from Microsoft and I was hoping for a pure .NET 4.0 solution. So far, this is the candidate accepted answer, unless someone can figure out a way to do this with a pure .NET 4.0 solution that doesn't degenerate to a worker thread per queue (which I am already using).
I understand you have many queues and don't want to tie up threads. You could have an ActionBlock per queue. The ActionBlock automates most of what you need: It processes work items serially, and only starts a Task when work is pending. When no work is pending, no Task/Thread is blocked.
The best way is to use the Task Parallel Library (TPL) and Continuations. A continuation not only allows you to create a flow of tasks but also handles your exceptions. This is a great introduction to the TPL. But to give you some idea...
You can start a TPL task using
Task task = Task.Factory.StartNew(() =>
{
// Do some work here...
});
Now to start a second task when an antecedent task finishes (in error or successfully) you can use the ContinueWith method
Task task1 = Task.Factory.StartNew(() => Console.WriteLine("Antecedant Task"));
Task task2 = task1.ContinueWith(antTask => Console.WriteLine("Continuation..."));
So as soon as task1 completes, fails or is cancelled task2 'fires-up' and starts running. Note that if task1 had completed before reaching the second line of code task2 would be scheduled to execute immediately. The antTask argument passed to the second lambda is a reference to the antecedent task. See this link for more detailed examples...
You can also pass continuations results from the antecedent task
Task.Factory.StartNew<int>(() => 1)
.ContinueWith(antTask => antTask.Result * 4)
.ContinueWith(antTask => antTask.Result * 4)
.ContinueWith(antTask =>Console.WriteLine(antTask.Result * 4)); // Prints 64.
Note. Be sure to read up on exception handling in the first link provided as this can lead a newcomer to TPL astray.
One last thing to look at in particular for what you want is child tasks. Child tasks are those which are created as AttachedToParent. In this case the continuation will not run until all child tasks have completed
TaskCreationOptions atp = TaskCreationOptions.AttachedToParent;
Task.Factory.StartNew(() =>
{
Task.Factory.StartNew(() => { SomeMethod() }, atp);
Task.Factory.StartNew(() => { SomeOtherMethod() }, atp);
}).ContinueWith( cont => { Console.WriteLine("Finished!") });
I hope this helps.
Edit: Have you had a look at ConcurrentCollections in particular the BlockngCollection<T>. So in your case you might use something like
public class TaskQueue : IDisposable
{
BlockingCollection<Action> taskX = new BlockingCollection<Action>();
public TaskQueue(int taskCount)
{
// Create and start new Task for each consumer.
for (int i = 0; i < taskCount; i++)
Task.Factory.StartNew(Consumer);
}
public void Dispose() { taskX.CompleteAdding(); }
public void EnqueueTask (Action action) { taskX.Add(Action); }
void Consumer()
{
// This seq. that we are enumerating will BLOCK when no elements
// are avalible and will end when CompleteAdding is called.
foreach (Action action in taskX.GetConsumingEnumerable())
action(); // Perform your task.
}
}
A .NET 4.0 solution based on TPL is possible, while hiding away the fact that it needs to store the parent task somewhere. For example:
class QueuePool
{
private readonly Task[] _queues;
public QueuePool(int queueCount)
{ _queues = new Task[queueCount]; }
public void Enqueue(int queueIndex, Action action)
{
lock (_queues)
{
var parent = _queue[queueIndex];
if (parent == null)
_queues[queueIndex] = Task.Factory.StartNew(action);
else
_queues[queueIndex] = parent.ContinueWith(_ => action());
}
}
}
This is using a single lock for all queues, to illustrate the idea. In production code, however, I would use a lock per queue to reduce contention.
It looks like the design you already have is good and working. Your worker threads (one per queue) are long-running so if you want to use Task's instead, specify TaskCreationOptions.LongRunning so you get a dedicated worker thread.
But there isn't really a need to use the ThreadPool here. It doesn't offer many benefits for long-running work.

How to ensure order of async operation calls?

[This appears to be a loooong question but I have tried to make it as clear as possible. Please have patience and help me...]
I have written a test class which supports an Async operation. That operation does nothing but reports 4 numbers:
class AsyncDemoUsingAsyncOperations
{
AsyncOperation asyncOp;
bool isBusy;
void NotifyStarted () {
isBusy = true;
Started (this, new EventArgs ());
}
void NotifyStopped () {
isBusy = false;
Stopped (this, new EventArgs ());
}
public void Start () {
if (isBusy)
throw new InvalidOperationException ("Already working you moron...");
asyncOp = AsyncOperationManager.CreateOperation (null);
ThreadPool.QueueUserWorkItem (new WaitCallback (StartOperation));
}
public event EventHandler Started = delegate { };
public event EventHandler Stopped = delegate { };
public event EventHandler<NewNumberEventArgs> NewNumber = delegate { };
private void StartOperation (object state) {
asyncOp.Post (args => NotifyStarted (), null);
for (int i = 1; i < 5; i++)
asyncOp.Post (args => NewNumber (this, args as NewNumberEventArgs), new NewNumberEventArgs (i));
asyncOp.Post (args => NotifyStopped (), null);
}
}
class NewNumberEventArgs: EventArgs
{
public int Num { get; private set; }
public NewNumberEventArgs (int num) {
Num = num;
}
}
Then I wrote 2 test programs; one as console app and another as windows form app. Windows form app works as expected when I call Start repeatedly:
But console app has hard time ensuring the order:
Since I am working on class library, I have to ensure that my library works correctly in all app models (Haven't tested in ASP.NET app yet). So I have following questions:
I have tested enough times and it appears to be working but is it OK to assume above code will always work in windows form app?
Whats the reason it (order) doesn't work correctly in console app? How can I fix it?
Not much experienced with ASP.NET. Will the order work in ASP.NET app?
[EDIT: Test stubs can be seen here if that helps.]
Unless I am missing something then given the code above I believe there is no way of guaranteeing the order of execution. I have never used the AsyncOperation and AsyncOperationManager class but I looked in reflector and as could be assumed AsyncOperation.Post uses the thread pool to execute the given code asynchronously.
This means that in the example you have provided 4 tasks will be queued to the thread pool synchronously in very quick succession. The thread pool will then dequeue the tasks in FIFO order (first in first out) but it's entirely possible for one of later threads to be scheduled before an earlier one or one of the later threads to complete before an earlier thread has completed its work.
Therefore given what you have there is no way to control the order in the way you desire. There are ways to do this, a good place to look is this article on MSDN.
http://msdn.microsoft.com/en-us/magazine/dd419664.aspx
I use a Queue you can then Enqueue stuff and Dequeue stuff in the correct order. This solved this problem for me.
The documentation for AsyncOperation.Post states:
Console applications do not synchronize the execution of Post calls. This can cause ProgressChanged events to be raised out of order. If you wish to have serialized execution of Post calls, implement and install a System.Threading.SynchronizationContext class.
I think this is the exact behavior you’re seeing. Basically, if the code that wants to subscribe to notifications from your asynchronous event wants to receive the updates in order, it must ensure that there is a synchronization context installed and that your AsyncOperationManager.CreateOperation() call is run inside of that context. If the code consuming the asynchronous events doesn’t care about receiving them in the correct order, it simply needs to avoid installing a synchronization context which will result in the “default” context being used (which just queues calls directly to the threadpool which may execute them in any order it wants to).
In the GUI version of your application, if you call your API from a UI thread, you will automatically have a synchronization context. This context is wired up to use the UI’s message queueing system which guarantees that posted messages are processed in order and serially (i.e., not concurrently).
In a Console application, unless if you manually install your own synchronization context, you will be using the default, non-synchronizing threadpool version. I am not exactly sure, but I don’t think that .net makes installing a serializing synchronization context very easy to do. I just use Nito.AsyncEx.AsyncContext from the Nito.AsyncEx nuget package to do that for me. Basically, if you call Nito.AsyncEx.AsyncContext.Run(MyMethod), it will capture the current thread and run an event loop with MyMethod as the first “handler” in that event loop. If MyMethod calls something that creates an AsyncOperation, that operation increments an “ongoing operations” counter and that loop will continue until the operation is completed via AsyncOperation.PostOperationCompleted or AsyncOperation.OperationCompleted. Just like the synchronization context provided by a UI thread, AsyncContext will queue posts from AsyncOperation.Post() in the order it receives them and run them serially in its event loop.
Here is an example of how to use AsyncContext with your demo asynchronous operation:
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Starting SynchronizationContext");
Nito.AsyncEx.AsyncContext.Run(Run);
Console.WriteLine("SynchronizationContext finished");
}
// This method is run like it is a UI callback. I.e., it has a
// single-threaded event-loop-based synchronization context which
// processes asynchronous callbacks.
static Task Run()
{
var remainingTasks = new Queue<Action>();
Action startNextTask = () =>
{
if (remainingTasks.Any())
remainingTasks.Dequeue()();
};
foreach (var i in Enumerable.Range(0, 4))
{
remainingTasks.Enqueue(
() =>
{
var demoOperation = new AsyncDemoUsingAsyncOperations();
demoOperation.Started += (sender, e) => Console.WriteLine("Started");
demoOperation.NewNumber += (sender, e) => Console.WriteLine($"Received number {e.Num}");
demoOperation.Stopped += (sender, e) =>
{
// The AsyncDemoUsingAsyncOperation has a bug where it fails to call
// AsyncOperation.OperationCompleted(). Do that for it. If we don’t,
// the AsyncContext will never exit because there are outstanding unfinished
// asynchronous operations.
((AsyncOperation)typeof(AsyncDemoUsingAsyncOperations).GetField("asyncOp", BindingFlags.NonPublic|BindingFlags.Instance).GetValue(demoOperation)).OperationCompleted();
Console.WriteLine("Stopped");
// Start the next task.
startNextTask();
};
demoOperation.Start();
});
}
// Start the first one.
startNextTask();
// AsyncContext requires us to return a Task because that is its
// normal use case.
return Nito.AsyncEx.TaskConstants.Completed;
}
}
With output:
Starting SynchronizationContext
Started
Received number 1
Received number 2
Received number 3
Received number 4
Stopped
Started
Received number 1
Received number 2
Received number 3
Received number 4
Stopped
Started
Received number 1
Received number 2
Received number 3
Received number 4
Stopped
Started
Received number 1
Received number 2
Received number 3
Received number 4
Stopped
SynchronizationContext finished
Note that in my example code I work around a bug in AsyncDemoUsingAsyncOperations which you should probably fix: when your operation stops, you never call AsyncOperation.OperationCompleted or AsyncOperation.PostOperationCompleted. This causes AsyncContext.Run() to hang forever because it is waiting for the outstanding operations to complete. You should make sure that your asynchronous operations complete—even in error cases. Otherwise you might run into similar issues elsewhere.
Also, my demo code, to imitate the output you showed in the winforms and console example, waits for each operation to finish before starting the next one. That kind of defeats the point of asynchronous coding. You can probably tell that my code could be greatly simplified by starting all four tasks at once. Each individual task would receive its callbacks in the correct order, but they would all make progress concurrently.
Recommendation
Because of how AsyncOperation seems to work and how it is intended to be used, it is the responsibility of the caller of an asynchronous API that uses this pattern to decide if it wants to receive events in order or not. If you are going to use AsyncOperation, you should document that the asynchronous events will only be received in order by the caller if the caller has a synchronization context that enforces serialization and suggest that the caller call your API on either a UI thread or in something like AsyncContext.Run(). If you try to use synchronization primitives and whatnot inside of the delegate you call with AsyncOperation.Post(), you could end up putting threadpool threads in a sleeping state which is a bad thing when considering performance and is completely redundant/wasteful when the caller of your API has properly set up a synchronization context already. This also enables the caller to decide that, if it is fine with receiving things out of order, that it is willing to process events concurrently and out of order. That may even enable speedup depending on what you’re doing. Or you might even decide to put something like a sequence number in your NewNumberEventArgs in case the caller wants both concurrency and still needs to be able to assemble the events into order at some point.

Categories

Resources