C# Implementing Producer/Consumer Queue In a Web Service

I have a fairly vanilla web service (old school asmx). One of the methods kicks off some async processing that has no bearing on the result returned to the client. Hopefully, the little snippet below makes sense:
[System.Web.Services.WebMethod]
public List<Foo> SampleWebMethod(string id)
{
// sample db query
var foo = db.Query<Foo>("WHERE id=#0",id);
// kick off async stuff here - for example firing off emails
// don't wait to send result
DoAsyncStuffHere();
return foo;
}
My initial implementation for the DoAsyncStuffHere method made use of the ThreadPool.QueueUserWorkItem. So, it looks something like:
public void DoAsyncStuffHere()
{
ThreadPool.QueueUserWorkItem(delegate
{
// DO WORK HERE
});
}
This approach works fine under low load conditions. However, I need something that can handle a fairly high load. So, the producer/consumer pattern would seem to be the best way to go.
Where I am confused is how to constrain all work being done by the queue to a single thread across all instances of the web service. How would I best go about setting up a single queue to be accessed by any instance of the web service?

You can use a System.Collections.Concurrent.BlockingCollection<T> with a System.Collections.Concurrent.ConcurrentQueue<T> as the underlying collection.
As the name of the namespace implies, the collections are thread safe.
Start a consumer thread (or a few) to pull items from the collection, using the Take() method. When no items are available, the thread will block.
Your DoAsyncStuffHere method adds items to the BlockingCollection. These items could be unstarted System.Threading.Tasks.Task objects; the consumer thread(s) would in that case Start the tasks after taking them from the collection.
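A minimal sketch of that idea, assuming a hypothetical static helper named BackgroundWorkQueue (the name and the use of Action as the work item are illustrative, not part of the original answer). One background thread drains the collection, so all queued work runs on a single thread no matter how many web service instances add to it:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: one process-wide queue drained by one background thread.
public static class BackgroundWorkQueue
{
    // ConcurrentQueue<T> gives the BlockingCollection FIFO ordering.
    private static readonly BlockingCollection<Action> Items =
        new BlockingCollection<Action>(new ConcurrentQueue<Action>());

    // The static constructor runs once, so only one consumer thread is started.
    static BackgroundWorkQueue()
    {
        var consumer = new Thread(Consume) { IsBackground = true };
        consumer.Start();
    }

    // Called by producers (for example, a web method); returns immediately.
    public static void Enqueue(Action work)
    {
        Items.Add(work);
    }

    private static void Consume()
    {
        // GetConsumingEnumerable blocks while the collection is empty,
        // like Take(), and ends once CompleteAdding() has been called.
        foreach (Action work in Items.GetConsumingEnumerable())
        {
            try
            {
                work();
            }
            catch (Exception ex)
            {
                // A failing item should not kill the consumer thread.
                Console.Error.WriteLine(ex);
            }
        }
    }
}
```

DoAsyncStuffHere would then reduce to a single call such as BackgroundWorkQueue.Enqueue(() => SendEmails()), where SendEmails stands in for whatever work needs doing. Keep in mind that items still waiting in memory are lost if the application pool recycles.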

One easy way to do it would be to implement your queue as a database table.
The producers would be the request threads handled by each instance of the web service.
The consumer could be any kind of continuously running process (Windows Forms app, Windows service, database job, etc.) that monitors the queue and processes items one at a time.

You can't do this with ThreadPool - you could have a static constructor which launches a worker Thread; the DoAsyncStuffHere could insert its work item to a Queue of work that you want done, and the worker Thread can check if there are any items in the Queue to work on. If so, it does the work, otherwise it sleeps for a few millis.
The static constructor ensures that it's only called once, and only a single Thread should be launched (unless there's some bizarre .NET edge case that I'm unaware of).
Here's a layout for an example - you'd probably need to implement some locking on the queue and add a bit more sophistication to the worker thread, but I've used this pattern before with success. The WorkItem object can hold the state of the information that you want passed along to the worker thread.
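// Created by the field initializer, which runs before the static constructor body.
private static readonly Queue<WorkItem> WorkQueue = new Queue<WorkItem>();
// A static constructor takes no access modifier and runs only once.
static WebService()
{
var worker = new Thread(WorkerThread) { IsBackground = true };
worker.Start();
}
private static void WorkerThread()
{
while (true)
{
WorkItem item = null;
// Queue<T> is not thread safe, so guard every access with a lock.
lock (WorkQueue)
{
if (WorkQueue.Count > 0)
{
item = WorkQueue.Dequeue();
}
}
if (item != null)
{
item.DoWork();
}
else
{
Thread.Sleep(100);
}
}
}
[System.Web.Services.WebMethod]
public List<Foo> SampleWebMethod(string id)
{
var foo = db.Query<Foo>("WHERE id=#0", id);
lock (WorkQueue)
{
WorkQueue.Enqueue(new WorkItem());
}
return foo;
}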

Related

Receive concurrent asynchronous requests and process them one at a time

Background
We have a service operation that can receive concurrent asynchronous requests and must process those requests one at a time.
In the following example, the UploadAndImport(...) method receives concurrent requests on multiple threads, but its calls to the ImportFile(...) method must happen one at a time.
Layperson Description
Imagine a warehouse with many workers (multiple threads). People (clients) can send the warehouse many packages (requests) at the same time (concurrently). When a package comes in a worker takes responsibility for it from start to finish, and the person who dropped off the package can leave (fire-and-forget). The workers' job is to put each package down a small chute, and only one worker can put a package down a chute at a time, otherwise chaos ensues. If the person who dropped off the package checks in later (polling endpoint), the warehouse should be able to report on whether the package went down the chute or not.
Question
The question then is how to write a service operation that...
can receive concurrent client requests,
receives and processes those requests on multiple threads,
processes requests on the same thread that received the request,
processes requests one at a time,
is a one way fire-and-forget operation, and
has a separate polling endpoint that reports on request completion.
We've tried the following and are wondering two things:
Are there any race conditions that we have not considered?
Is there a more canonical way to code this scenario in C#.NET with a service oriented architecture (we happen to be using WCF)?
Example: What We Have Tried
This is the service code that we have tried. It works though it feels like somewhat of a hack or kludge.
static ImportFileInfo _inProgressRequest = null;
static readonly ConcurrentDictionary<Guid, ImportFileInfo> WaitingRequests =
new ConcurrentDictionary<Guid, ImportFileInfo>();
public void UploadAndImport(ImportFileInfo request)
{
// Receive the incoming request
WaitingRequests.TryAdd(request.OperationId, request);
while (null != Interlocked.CompareExchange(ref _inProgressRequest, request, null))
{
// Wait for any previous processing to complete
Thread.Sleep(500);
}
// Process the incoming request
ImportFile(request);
Interlocked.Exchange(ref _inProgressRequest, null);
WaitingRequests.TryRemove(request.OperationId, out _);
}
public bool UploadAndImportIsComplete(Guid operationId) =>
!WaitingRequests.ContainsKey(operationId);
This is example client code.
private static async Task UploadFile(FileInfo fileInfo, ImportFileInfo importFileInfo)
{
using (var proxy = new Proxy())
using (var stream = new FileStream(fileInfo.FullName, FileMode.Open, FileAccess.Read))
{
importFileInfo.FileByteStream = stream;
proxy.UploadAndImport(importFileInfo);
}
await Task.Run(() => Poller.Poll(timeoutSeconds: 90, intervalSeconds: 1, func: () =>
{
using (var proxy = new Proxy())
{
return proxy.UploadAndImportIsComplete(importFileInfo.OperationId);
}
}));
}
It's hard to write a minimum viable example of this in a Fiddle, but here is a start that gives a sense of the approach and that compiles.
As before, the above seems like a hack/kludge, and we are asking both about potential pitfalls in its approach and for alternative patterns that are more appropriate/canonical.
Simple solution using Producer-Consumer pattern to pipe requests in case of thread count restrictions.
You still have to implement a simple progress reporter or event. I suggest replacing the expensive polling approach with asynchronous communication, which Microsoft's SignalR library offers. It uses WebSockets where available to enable async behavior. The client and server can register their callbacks on a hub. Using RPC, the client can invoke server-side methods and vice versa. You would post progress to the client through the hub (client side). In my experience SignalR is very simple to use and very well documented. It has client libraries for several popular languages (e.g. Java).
Polling, in my understanding, is the total opposite of fire-and-forget. You can't forget, because you have to check something at an interval. Event-based communication, like SignalR, is fire-and-forget since you fire and will get a reminder (because you forgot). The "event side" will invoke your callback instead of you waiting to do it yourself!
Requirement 5 is ignored since I didn't get any reason. Waiting for a thread to complete would eliminate the fire and forget character.
private BlockingCollection<ImportFileInfo> requestQueue = new BlockingCollection<ImportFileInfo>();
private bool isServiceEnabled;
private const int MaxNumberOfThreads = 8;
// Semaphore needs both an initial and a maximum count
private readonly Semaphore semaphore = new Semaphore(MaxNumberOfThreads, MaxNumberOfThreads);
private readonly object syncLock = new object();
public void UploadAndImport(ImportFileInfo request)
{
// Start the request handler background loop
if (!this.isServiceEnabled)
{
this.requestQueue?.Dispose();
this.requestQueue = new BlockingCollection<ImportFileInfo>();
// Fire and forget (requirement 4)
Task.Run(() => HandleRequests());
this.isServiceEnabled = true;
}
// Cache multiple incoming client requests (requirement 1) (and enable throttling)
this.requestQueue.Add(request);
}
private void HandleRequests()
{
while (!this.requestQueue.IsCompleted)
{
// Wait while thread limit is exceeded (some throttling)
this.semaphore.WaitOne();
// Process the incoming requests in a dedicated thread (requirement 2) until the BlockingCollection is marked completed.
Task.Run(() => ProcessRequest());
}
// Reset the request handler after BlockingCollection was marked completed
this.isServiceEnabled = false;
this.requestQueue.Dispose();
}
private void ProcessRequest()
{
ImportFileInfo request = this.requestQueue.Take();
UploadFile(request);
// You updated your question saying the method "ImportFile()" requires synchronization.
// This a bottleneck and will significantly drop performance, when this method is long running.
lock (this.syncLock)
{
ImportFile(request);
}
this.semaphore.Release();
}
Remarks:
BlockingCollection is an IDisposable
TODO: You have to "close" the BlockingCollection by marking it completed:
"BlockingCollection.CompleteAdding()" or it will loop indefinitely waiting for further requests. You could introduce additional request methods for the client to cancel and/or update the process and to mark adding to the BlockingCollection as completed. Or use a timer that waits for an idle period before marking it as completed. Or make your request handler thread block or spin.
Replace Take() and Add(...) with TryTake(...) and TryAdd(...) if you want cancellation support
Code is not tested
Your "ImportFile()" method is a bottleneck in your multithreaded environment. I suggest making it thread safe. For I/O that requires synchronization, I would cache the data in a BlockingCollection and then write the items to I/O one by one.
The problem is that your total bandwidth is very small-- only one job can run at a time-- and you want to handle parallel requests. That means that queue time could vary wildly. It may not be the best choice to implement your job queue in-memory, as it would make your system much more brittle, and more difficult to scale out when your business grows.
A traditional, scaleable way to architect this would be:
An HTTP service to accept requests, load balanced/redundant, with no session state.
A SQL Server database to persist the requests in a queue, returning a persistent unique job ID.
A Windows service to process the queue, one job at a time, and mark jobs as complete. The worker process for the service would probably be single-threaded.
This solution requires you to choose a web server. A common choice is IIS running ASP.NET. On that platform, each request is guaranteed to be handled in a single-threaded manner (i.e. you don't need to worry about race conditions too much), but due to a feature called thread agility the request might end with a different thread, but in the original synchronization context, which means you will probably never notice unless you are debugging and inspecting thread IDs.
Given the constraints of our system, this is the implementation we ended up using:
static ImportFileInfo _importInProgressItem = null;
static readonly ConcurrentQueue<ImportFileInfo> ImportQueue =
new ConcurrentQueue<ImportFileInfo>();
public void UploadAndImport(ImportFileInfo request) {
UploadFile(request);
ImportFileSynchronized(request);
}
// Synchronize the file import,
// because the database allows a user to perform only one write at a time.
private void ImportFileSynchronized(ImportFileInfo request) {
ImportQueue.Enqueue(request);
do {
ImportQueue.TryPeek(out var next);
if (null != Interlocked.CompareExchange(ref _importInProgressItem, next, null)) {
// Queue processing is already under way in another thread.
return;
}
ImportFile(next);
ImportQueue.TryDequeue(out _);
Interlocked.Exchange(ref _importInProgressItem, null);
}
while (ImportQueue.Any());
}
public bool UploadAndImportIsComplete(Guid operationId) =>
ImportQueue.All(waiting => waiting.OperationId != operationId);
This solution works well for the loads we are expecting. That load involves a maximum of about 15-20 concurrent PDF file uploads. The batch of up to 15-20 files tends to arrive all at once and then to go quiet for several hours until the next batch arrives.
Criticism and feedback is most welcome.

Understanding fire and forget when using infinite loops

Can someone tell me what the best practice/proper way of doing this is?
I'm also using WPF, not a console or ASP.NET.
Using Listener to accept clients and spin off a new "thread" for each client that handles all the I/O and Exception catching for that client.
Method 1: Fire and forget, and just throw it into a variable to get rid of the warning.
public static async Task Start(CancellationToken token)
{
m_server = TcpListener.Create(33777);
m_server.Start();
running = true;
clientCount = 0;
// TODO: Add try... catch
while (!token.IsCancellationRequested)
{
var client = await m_server.AcceptTcpClientAsync().ConfigureAwait(false);
Client c = new Client(client);
var _ = HandleClientAsync(c);
}
}
Here's the Client Handler code:
public static async Task HandleClientAsync(Client c)
{
// TODO: add try...catch
while (c.connected)
{
string data = await c.reader.ReadLineAsync();
// Now we will parse the data and update variables accordingly
// Just Regex and some parsing that updates variables
ParseAndUpdate(data);
}
}
Method 2: The same thing... but with Task.Run()
var _ = Task.Run(() => HandleClientAsync(c));
Method 3: an intermediate non async function (doubt this is good. Should be Async all the way)
But this at least gets rid of the squiggly line without using the variable trick which kinda feels dirty.
while (!token.IsCancellationRequested)
{
var client = await m_server.AcceptTcpClientAsync().ConfigureAwait(false);
Client c = new Client(client);
NonAsync(c);
}
public static void NonAsync(Client c)
{
Task.Run(() => HandleClientAsync(c));
}
Method 4: Make HandleClientAsync an async void instead of async Task (really bad)
public static async Task HandleClientAsync(Client c)
// Would change to
public static async void HandleClientAsync(Client c)
Questions:
Is it any better to use Task.Run() When doing a fire and forget task?
Is it just accepted that you need to use the var _ = FireAndForget() trick to do fire and forget? I could just ignore the warning but something feels wrong about it.
If I wanted to update my UI from a Client, how would I do that? Would I just use a dispatcher?
Thanks guys
I've never been a fan of background workers which you expect to run for a long time, being run in a task. Tasks get scheduled to run on threads drawn from a pool. As you schedule these long running tasks, the thread pool gets smaller and smaller. Eventually all of the threads from the pool are busy running your tasks, and things get really slow and unmanageable.
My recommendation here? Use the Thread class and manage the threads yourself. That way, you keep your thread pool and the overhead for tasks out of the picture.
Addendum - Producer Consumer Model
Another interesting question to consider: Do you really need a thread for every client? Threads are reasonably costly to create and maintain in terms of memory overhead, and if your client interaction is such that the client threads spend the vast majority of their time waiting around on something to do, then perhaps a producer consumer model is more suited to your use case.
Example:
Client connects on listening thread, gets put in a client queue
Worker thread responsible for checking to see if the clients need anything comes along through that queue every so often and checks - does the client have a new message to service? If so, it services all messages the client has, then moves on
In this way, you limit the number of threads working to just the number needed to manage the message queue. You can even get fancy and add worker threads dynamically based on how long it's been since all the clients have been serviced.
If you insist
If you really like what you have going, I suggest refactoring what you're doing a bit so that rather than HandleClientAsync you do something more akin to CreateServiceForClient(c);
This could be a synchronous method that returns something like a ClientService. ClientService could then create the task that does what your HandleClientAsync does now, and store that task as a member. It could also provide methods like
ClientService.WaitUntilEnd()
and
ClientService.Disconnect() (which could set a cancellation token, also stored as a member variable)
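A rough sketch of what such a ClientService might look like. The constructor taking a Func<CancellationToken, Task> is an assumption made to keep the example self-contained; in the question's code the handler would wrap HandleClientAsync and would need to observe the token:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper that owns the client's task and its cancellation token.
public sealed class ClientService
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly Task _task;

    // handler is the per-client loop; it should stop when the token is cancelled.
    public ClientService(Func<CancellationToken, Task> handler)
    {
        _task = Task.Run(() => handler(_cts.Token));
    }

    // Requests that the client loop stop.
    public void Disconnect() => _cts.Cancel();

    // Lets callers await (or inspect) the client's task instead of discarding it.
    public Task WaitUntilEnd() => _task;
}
```

Because the task is stored rather than thrown away, exceptions from the client loop can be observed through WaitUntilEnd(), and there is no need for the var _ = ... discard.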

create multiple threads and communicate with them

I have a program that takes a long time to initialize, but its execution is rather fast.
It's becoming a bottleneck, so I want to start multiple instances of this program (like a pool), already initialized, and then just pass in the arguments needed for each execution, saving all the initialization time.
The problem is that I only found how to start new processes while passing arguments:
How to pass parameters to ThreadStart method in Thread?
but I would like to start the process normally and then be able to communicate with it, sending each thread the parameters required for its execution.
The best approach I found was to create multiple threads where I would initialize the program and then, using some communication mechanism (named pipes, for example, as it all runs on the same machine), pass those arguments and trigger the execution of the program (one of the triggers could break an infinite loop, for example).
I'm asking if anyone can advise a better solution than the one I came up with.
I suggest you don't mess with direct Thread usage, and use the TPL, something like this:
foreach (var data in YOUR_INITIALIZATION_LOGIC_METHOD_HERE)
{
Task.Run(() => yourDelegate(data) /* other params here */);
}
More about Task.Run on MSDN, Stephen Cleary blog
Process != Thread
A thread lives inside a process, while a process is an entire program or service in your OS.
If you want to speed up your app initialization you can still use threads, but nowadays we use them through the Task Parallel Library, following the Task-based Asynchronous Pattern.
To communicate between tasks (usually threads), you might need to implement some kind of state machine (a basic workflow) where you can detect when a task progresses and perform actions based on task state (running, failed, completed...).
Anyway, you don't need named pipes or anything like that to communicate between tasks/threads, as everything lives in the same parent process. You can use regular programming approaches instead: C# thread synchronization mechanisms and some kind of in-app messaging.
Some very basic idea...
.NET has a List<T> collection class. You should design a coordinator class where you might add some list which receives a message class (designed by you) like this:
public enum OperationType { DataInitialization, Authentication, Caching }
public class Message
{
public OperationType Operation { get; set; }
public Task Task { get; set; }
}
When you start all the parallel initialization tasks, add each one to a list in the coordinator class:
Coordinator.Messages.AddRange
(
new List<Message>
{
new Message
{
Operation = OperationType.DataInitialization,
Task = dataInitTask
},
..., // <--- more messages
}
);
Once you've added all messages with pending initialization tasks, somewhere in your code you can wait until initialization ends asynchronously this way:
// You do a projection of each message to get an IEnumerable<Task>
// to give it as argument of Task.WhenAll
await Task.WhenAll(Coordinator.Messages.Select(message => message.Task));
While this line awaits to finish all initialization, your UI (i.e. the main thread) can continue to work and show some kind of loading animation or who knows what (whatever).
Perhaps you can go a step further, and don't wait for all but wait for a group of tasks which allow your users to start using your app, while other non-critical tasks end...
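Waiting for only a group of tasks could be sketched like this. StartupCoordinator and WhenCriticalReady are hypothetical names, and the Message and OperationType types are repeated here only so the block compiles on its own:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public enum OperationType { DataInitialization, Authentication, Caching }

public class Message
{
    public OperationType Operation { get; set; }
    public Task Task { get; set; }
}

public static class StartupCoordinator
{
    // Completes when every message whose operation is listed in
    // 'critical' has finished; other tasks keep running in the background.
    public static Task WhenCriticalReady(
        IEnumerable<Message> messages, ISet<OperationType> critical)
    {
        return Task.WhenAll(
            messages.Where(m => critical.Contains(m.Operation))
                    .Select(m => m.Task));
    }
}
```

The UI could await WhenCriticalReady with, say, DataInitialization and Authentication marked as critical, then enable itself while Caching is still in progress.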

What is the most efficient method for assigning threads based on the following scenario?

I can have a maximum of 5 threads running simultaneously at any one time, which make use of 5 separate hardware devices to speed up some complex calculations and return the result. The API (which contains only one method) for each of these devices is not thread safe and can only run on a single thread at any point in time. Once a computation is completed, the same thread can be reused to start another computation on either the same or a different device, depending on availability. Each computation is standalone and does not depend on the results of any other computation. Hence, up to 5 threads may complete their execution in any order.
What is the most efficient C# (using .Net Framework 2.0) coding solution for keeping track of which hardware is free/available and assigning a thread to the appropriate hardware API for performing the computation? Note that other than the limitation of 5 concurrently running threads, I do not have any control over when or how the threads are fired.
Please correct me if I am wrong but a lock free solution is preferred as I believe it will result in increased efficiency and a more scalable solution.
Also note that this is not homework although it may sound like it...
.NET provides a thread pool that you can use. System.Threading.ThreadPool.QueueUserWorkItem() tells a thread in the pool to do some work for you.
Were I designing this, I'd not focus on mapping threads to your HW resources. Instead I'd expose a lockable object for each HW resource - this can simply be an array or queue of 5 Objects. Then for each bit of computation you have, call QueueUserWorkItem(). Inside the method you pass to QUWI, find the next available lockable object and lock it (aka, dequeue it). Use the HW resource, then re-enqueue the object, exit the QUWI method.
It won't matter how many times you call QUWI; there can be at most 5 locks held, each lock guards access to one instance of your special hardware device.
The doc page for Monitor.Enter() shows how to create a safe (blocking) Queue that can be accessed by multiple workers. In .NET 4.0, you would use the builtin BlockingCollection - it's the same thing.
That's basically what you want. Except don't create threads yourself with new Thread(...). Use the thread pool.
cite: Advantage of using Thread.Start vs QueueUserWorkItem
// assume the SafeQueue class from the cited doc page.
SafeQueue<SpecialHardware> q = new SafeQueue<SpecialHardware>();
// set up the queue with objects protecting the 5 magic stones
private void Setup()
{
for (int i=0; i< 5; i++)
{
q.Enqueue(GetInstanceOfSpecialHardware(i));
}
}
// something like this gets called many times, by QueueUserWorkItem()
public void DoWork(WorkDescription d)
{
d.DoPrepWork();
// gain access to one of the special hardware devices
SpecialHardware shw = q.Dequeue();
try
{
shw.DoTheMagicThing();
}
finally
{
// ensure no matter what happens the HW device is released
q.Enqueue(shw);
// at this point another worker can use it.
}
d.DoFollowupWork();
}
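The SafeQueue class itself is not shown above. A minimal stand-in, assuming only the Enqueue and Dequeue members that the snippet uses (the class on the cited documentation page may differ), can be built on Monitor.Wait and Monitor.Pulse, both of which are available in .NET 2.0:

```csharp
using System.Collections.Generic;
using System.Threading;

// Minimal blocking queue: Dequeue waits until an item is available.
public class SafeQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();

    public void Enqueue(T item)
    {
        lock (_items)
        {
            _items.Enqueue(item);
            // Wake one thread that is waiting in Dequeue, if any.
            Monitor.Pulse(_items);
        }
    }

    public T Dequeue()
    {
        lock (_items)
        {
            // Re-check in a loop: another consumer may have taken
            // the item between the pulse and this thread waking up.
            while (_items.Count == 0)
            {
                Monitor.Wait(_items);
            }
            return _items.Dequeue();
        }
    }
}
```

With five devices in the queue, a sixth caller of Dequeue simply blocks until one of the devices is enqueued again in the finally block.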
A lock free solution is only beneficial if the computation time is very small.
I would create a facade for each hardware thread where jobs are enqueued and a callback is invoked each time a job finishes.
Something like:
public class Job
{
public string JobInfo {get;set;}
public Action<Job> Callback {get;set;}
}
public class MyHardwareService
{
Queue<Job> _jobs = new Queue<Job>();
Thread _hardwareThread;
ManualResetEvent _event = new ManualResetEvent(false);
public MyHardwareService()
{
_hardwareThread = new Thread(WorkerFunc);
_hardwareThread.IsBackground = true;
_hardwareThread.Start();
}
public void Enqueue(Job job)
{
// enqueue and signal under the same lock so a signal is never lost
lock (_jobs)
{
_jobs.Enqueue(job);
_event.Set();
}
}
public void WorkerFunc()
{
while(true)
{
_event.WaitOne();
Job currentJob;
lock (_jobs)
{
currentJob = _jobs.Dequeue();
// reset under the lock so a concurrent Enqueue cannot be missed
if (_jobs.Count == 0)
_event.Reset();
}
//invoke hardware here.
//trigger callback in a Thread Pool thread to be able
// to continue with the next job ASAP
ThreadPool.QueueUserWorkItem(delegate { currentJob.Callback(currentJob); });
}
}
}
Sounds like you need a thread pool with 5 threads where each one relinquishes the HW once it's done and adds it back to some queue. Would that work? If so, .Net makes thread pools very easy.
Sounds a lot like the Sleeping barber problem. I believe the standard solution to that is to use semaphores

.NET Web Service & BackgroundWorker threads

I'm trying to do some async stuff in a web service method. Let's say I have the following API call: http://www.example.com/api.asmx
and the method is called GetProducts().
In this GetProducts method, I do some stuff (e.g. get data from the database), then just before I return the result, I want to do some async stuff (e.g. send me an email).
So this is what I did.
[WebMethod(Description = "Bal blah blah.")]
public IList<Product> GetProducts()
{
// Blah blah blah ..
// Get data from DB .. hi DB!
// var myData = .......
// Moar clbuttic blahs :) (yes, google for clbuttic if you don't know what that is)
// Ok .. now send me an email for no particular reason, but to prove that async stuff works.
var myObject = new TrackingCode();
myObject.SendDataAsync();
// Ok, now return the result.
return myData;
}
}
public class TrackingCode
{
public void SendDataAsync()
{
var backgroundWorker = new BackgroundWorker();
backgroundWorker.DoWork += BackgroundWorker_DoWork;
backgroundWorker.RunWorkerAsync();
//System.Threading.Thread.Sleep(1000 * 20);
}
private void BackgroundWorker_DoWork(object sender, DoWorkEventArgs e)
{
SendEmail();
}
}
Now, when I run this code the email is never sent. If I uncomment the Thread.Sleep, then the email is sent.
So, why is the background worker thread torn down? Is it dependent on the parent thread? Is this the wrong way to do background or forked threading in ASP.NET web apps?
BackgroundWorker is useful when you need to synchronize back to (for example) a UI* thread, eg for affinity reasons. In this case, it would seem that simply using ThreadPool would be more than adequate (and much simpler). If you have high volumes, then a producer/consumer queue may allow better throttling (so you don't drown in threads) - but I suspect ThreadPool will be fine here...
public void SendDataAsync()
{
ThreadPool.QueueUserWorkItem(delegate
{
SendEmail();
});
}
Also - I'm not quite sure what you want to achieve by sleeping? This will just tie up a thread (not using CPU, but doing no good either). Care to elaborate? It looks like you are pausing your actual web page (i.e. the Sleep happens on the web-page thread, not the e-mail thread). What are you trying to do here?
*=actually, it will use whatever sync-context is in place
Re producer/consumer; basically - it is just worth keeping some kind of throttle. At the simplest level, a Semaphore could be used (along with the regular ThreadPool) to limit things to a known amount of work (to avoid saturating the thread pool); but a producer/consumer queue would probably be more efficient and managed.
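A sketch of the semaphore-based throttle, under a few assumptions: the class name WorkThrottle, the TryQueue method, and the choice to reject work when every slot is busy (rather than block the web request) are all illustrative:

```csharp
using System;
using System.Threading;

// Hypothetical throttle: at most maxConcurrent items run on the ThreadPool at once.
public sealed class WorkThrottle
{
    private readonly Semaphore _slots;

    public WorkThrottle(int maxConcurrent)
    {
        _slots = new Semaphore(maxConcurrent, maxConcurrent);
    }

    // Returns false immediately when all slots are taken, so the caller
    // can decide whether to drop, retry, or persist the work.
    public bool TryQueue(Action work)
    {
        if (!_slots.WaitOne(0))
        {
            return false;
        }
        ThreadPool.QueueUserWorkItem(delegate
        {
            try { work(); }
            catch (Exception ex) { Console.Error.WriteLine(ex); }
            finally { _slots.Release(); } // free the slot even if work() throws
        });
        return true;
    }
}
```

Taking the slot before queueing means excess work never reaches the pool at all. Waiting for the semaphore inside the work item would instead tie up pool threads, which is the situation the throttle is meant to avoid.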
Jon Skeet has such a queue here (CustomThreadPool). I could probably write some notes about it if you wanted.
That said: if you are calling off to an external web-site, it is quite likely that you will have a lot of waits on network IO / completion ports; as such, you can have a slightly higher number of threads... obviously (by contrast) if the work was CPU-bound, there is no point having more threads than you have CPU cores.
It may be torn down because, after 20 seconds, that BackgroundWorker instance may be garbage collected, since it has no references (it has gone out of scope).
