Ordering of Asynchronous method calls - c#

I am working on a class library that logs audit details of a web application to several types of data sources (file, XML, database) based on policies defined in the web configuration file.
My Audit log method has a signature similar to this:
public static void LogInfo(User user, Module module, List<Object> lst);
Web application uses this method to log important pieces of details like warnings, error and even exception details.
Since a single workflow makes more than 700 calls to these methods, I thought of making them asynchronous. I used a simple method from the ThreadPool class called QueueUserWorkItem:
ThreadPool.QueueUserWorkItem(o => LogInfo(user, module, lst));
but this does not preserve the order in which the work items were queued. All my information was logged, but the ordering was completely messed up: in my text file, the logs were not in the order in which the calls were made.
Is there a way I can control the ordering of the threads being called using QueueUserWorkItem?

I don't think you can specify ordering when using QueueUserWorkItem.
To run the logging in parallel (on some background thread), you could use ConcurrentQueue<T>. This is a thread-safe collection that can be accessed from multiple threads. You could create one work item (or a thread) that reads elements from the collection and writes them to a file. Your main application would add items to the collection. The fact that you're adding items to the collection from a single thread should guarantee that they will be read in the right order.
To keep things simple, you can store Action values in the queue:
ConcurrentQueue<Action> logOperations = new ConcurrentQueue<Action>();
// To add logging operation from main thread:
logOperations.Enqueue(() => LogInfo(user, module, lst));
The background task can just take Actions from the queue and run them:
// Start this when you create the `logOperations` collection
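ThreadPool.QueueUserWorkItem(o => {
    Action op;
    // Repeatedly take log operations & run them; when the queue is
    // empty, wait briefly instead of letting the work item exit
    while (true) {
        if (logOperations.TryDequeue(out op)) op();
        else Thread.Sleep(10);
    }
});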
If you need to stop the background processor (that writes data to the log), you can create a CancellationTokenSource and end the while loop when the token is cancelled (by the main thread). This can be checked using the IsCancellationRequested property (see MSDN).
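A minimal sketch of the cancellation idea described above. The class name OrderedLogQueue, the 10 ms idle delay, and the choice to drain the queue after cancellation are all assumptions made for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs queued actions in FIFO order on one background task.
public sealed class OrderedLogQueue : IDisposable
{
    private readonly ConcurrentQueue<Action> operations = new ConcurrentQueue<Action>();
    private readonly CancellationTokenSource cts = new CancellationTokenSource();
    private readonly Task worker;

    public OrderedLogQueue()
    {
        worker = Task.Factory.StartNew(Run, TaskCreationOptions.LongRunning);
    }

    // Called by the main thread; returns immediately.
    public void Enqueue(Action operation)
    {
        operations.Enqueue(operation);
    }

    private void Run()
    {
        Action op;
        // Keep polling until cancellation is requested.
        while (!cts.IsCancellationRequested)
        {
            if (operations.TryDequeue(out op)) op();
            else Thread.Sleep(10); // idle: nothing to log right now
        }
        // Drain whatever was queued before the stop request.
        while (operations.TryDequeue(out op)) op();
    }

    // Signals the worker to stop and waits for it to finish draining.
    public void Dispose()
    {
        cts.Cancel();
        worker.Wait();
        cts.Dispose();
    }
}
```

With something like this, the main thread would call Enqueue(() => LogInfo(user, module, lst)) and dispose the queue at shutdown. Ordering is only guaranteed relative to the order in which a single thread enqueues.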

One way of solving this would be to put your data in a queue and have a single task pick items from that queue and write them in order. If you are using .NET 4.0, you could use ConcurrentQueue, which is thread-safe; otherwise a simple Queue with proper locks would work as well.
The thread consuming the queue could then periodically check for elements in the queue and log each one it finds. This way the lengthy operation (logging) runs on its own thread, whereas the main thread simply adds items.
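For the "simple Queue with proper locks" variant, a rough sketch might look like the following. LockedQueue is a hypothetical name, and the consumer is assumed to sleep and retry whenever TryDequeue returns false.

```csharp
using System.Collections.Generic;

// Hypothetical lock-based queue for runtimes without ConcurrentQueue<T>.
public sealed class LockedQueue<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly object gate = new object();

    public void Enqueue(T item)
    {
        lock (gate) { items.Enqueue(item); }
    }

    // Returns false when the queue is empty so the consumer can sleep and retry.
    public bool TryDequeue(out T item)
    {
        lock (gate)
        {
            if (items.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = items.Dequeue();
            return true;
        }
    }
}
```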

Related

Returning status from a worker class

I have a service class and a worker class. The worker class does all the processing.
class WorkerClass
{
public void ProcessWork(<params to the method>)
{
// Get the tasks from the DB.
// Call a 3rd party web service to process each of the tasks.
}
}
In my service class, I instantiate the worker class and call the method. The question is, how do I get the number of tasks processed in the service class?
I have thought of 3 options:
Expose an event from the worker class. Hook up an event handler in the service class.
Modify the signature of ProcessWork method so that it accepts a delegate:
public void ProcessWork(object obj1, Action<int, int> actionProgressTracker)
Expose a property from the worker class and get the property in the service class. Refresh the property every 30 seconds.
What would be a clean way of getting the status?
The first two options are really functionally identical. Both can work just fine for what you need to do. The second has an implication that the delegate is required, whereas the first implies that it is not. An event might also imply that it is used beyond the scope of just this one method.
As for the third option, it doesn't give the caller the opportunity to execute code when the number updates, it just gives them the opportunity to access the information.
So if the caller of this type is going to need to do something with this information (every time the value changes), then you should be using something comparable to one of the first two options so that the worker can "push" information to the caller.
If the caller wants to "pull" the information from the worker whenever it wants the information, then go with the third option.
Note that there is also a Progress<T> class that you can use, with a corresponding IProgress<T> interface, that's comparable to your first two options, but is specifically tailored for a worker updating a UI with progress.
Both push and pull methods can actually be sensible for updating a UI with the progress of a background task. If progress occurs infrequently it may make sense to update the UI every time progress changes, so the UI will want to be "notified" when those updates happen. If the updates are very frequent, then the UI may want to instead have a timer and pull the current status every so often, to avoid taxing the UI with more updates than are needed or than it can handle.
Of course, if you're pushing information and not just something like a percent complete, then it may be important to not lose any of that information, in which case your 3rd approach isn't an option, as multiple updates may happen in between fetches.
And of course if you're writing a sufficiently generalized worker, you may want to expose both a push and pull mechanism, to let the caller choose the appropriate one.
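As a rough illustration of exposing both mechanisms, here is a sketch under stated assumptions: the Worker class, the ProcessedCount property, and the int taskCount parameter are hypothetical stand-ins for the real ProcessWork signature, and the actual task processing is left as a placeholder.

```csharp
using System;
using System.Threading;

// Hypothetical worker exposing both a push (IProgress<int>) and a pull (property) mechanism.
public class Worker
{
    private int processed;

    // Pull: the caller reads this whenever it wants the current count.
    public int ProcessedCount
    {
        get { return Volatile.Read(ref processed); }
    }

    // Push: the worker reports each update through the optional progress callback.
    public void ProcessWork(int taskCount, IProgress<int> progress)
    {
        for (int i = 0; i < taskCount; i++)
        {
            // ... process one task here (placeholder for the real work) ...
            int done = Interlocked.Increment(ref processed);
            if (progress != null) progress.Report(done);
        }
    }
}
```

A caller that wants notifications could pass new Progress<int>(n => ...). Progress<T> invokes its handler on the SynchronizationContext captured when it was constructed, or on the thread pool if there was none, so the handler may run after ProcessWork has already moved on.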

C# pub/sub service - how to fire events on background threads?

I've developed some code that receives a series of values from a hardware device, every 50ms in the form of name/value pairs. I want to develop a pub/sub service whereby subscribers can be notified when the value of a particular item changes. The Subscribe method might look something like this:-
public void Subscribe(string itemName, Action<string, long> callback)
The code that reads the hardware values will check if a value has changed since last time. If so, it will iterate through any subscribers for that item, calling their delegates. As it stands, the delegates will be called on the same thread which isn't ideal - I need to keep the polling as fast as possible. What's the best approach for calling the callback delegates on separate threads? Should the subscribers pass in (say) a task/thread, or should the publisher be responsible for spinning these up?
Note that I need to pass a couple of parameters to the delegate (the item name and its value), so this might affect the approach taken. I know you can pass a single "state" object to tasks but it feels a bit unintuitive requiring the subscribers to implement an Action<object> callback delegate (whose argument must then be cast to some other type containing the name and value).
Also, I'm assuming that creating a new task/thread each time a delegate is called will hurt performance, so some kind of "pool" might be required?
I would maintain the same structure that you now have and put the responsibility of prompt action onto the callbacks, i.e. the callbacks should not block or perform complex, lengthy actions directly.
If a particular callback needs to perform any lengthy action, it should queue off the Action data to a thread of its own and then return 'immediately', e.g. it might BeginInvoke/PostMessage the data to a GUI thread, queue it to a thread that inserts into a DB table or queue it to a logger (or indeed, any combo chained together). These lengthy/blocking actions can then proceed in parallel while the device interface continues to poll.
This way, you keep the working structure you have and do not have to inflict any inter-thread comms onto callbacks that do not need it. The device interface remains encapsulated, just firing callbacks.
EDIT:
'creating a new task/thread each time a delegate is called will hurt performance' - yes, and also it would be difficult to maintain state. Often, such threads are written as while(true) loops with some signaling call at the top, e.g. a blocking queue pop(), and so only need creating once, at startup, and never need terminating.
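A sketch of a subscriber that follows this advice, assuming the Subscribe(string, Action<string, long>) signature from the question. QueuedSubscriber, OnValueChanged, and Shutdown are hypothetical names; the worker thread is created once and blocks on the queue when there is nothing to do.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical subscriber: the callback only enqueues; a dedicated thread does the slow work.
public sealed class QueuedSubscriber
{
    private readonly BlockingCollection<KeyValuePair<string, long>> pending =
        new BlockingCollection<KeyValuePair<string, long>>();
    private readonly Thread worker;

    public QueuedSubscriber(Action<string, long> slowHandler)
    {
        worker = new Thread(() =>
        {
            // Blocks while the queue is empty; ends after CompleteAdding().
            foreach (var update in pending.GetConsumingEnumerable())
                slowHandler(update.Key, update.Value);
        });
        worker.IsBackground = true;
        worker.Start();
    }

    // Pass this method to Subscribe(itemName, callback); it returns almost immediately.
    public void OnValueChanged(string itemName, long value)
    {
        pending.Add(new KeyValuePair<string, long>(itemName, value));
    }

    // Stops accepting updates and waits for the queued ones to be handled.
    public void Shutdown()
    {
        pending.CompleteAdding();
        worker.Join();
    }
}
```

The publisher stays unchanged: it just invokes the delegate on its polling thread, and only subscribers that need slow processing pay for the extra queue and thread.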

Multiple producers, single consumer

I have to develop a multithreaded application in which multiple threads each generate custom event logs that need to be saved in a queue (not Microsoft MSMQ).
Another thread will read log data from the queue, process it, and save the log information to a file. Basically, we are implementing the multiple-producer, single-consumer paradigm.
Can anybody provide suggestions on how to implement this in C++ or C#?
Thanks,
This kind of thing is very easy to do using the BlockingCollection<T> defined in System.Collections.Concurrent.
Basically, you create your queue so that all threads can access it:
BlockingCollection<LogRecord> LogQueue = new BlockingCollection<LogRecord>();
Each producer adds items to the queue:
while (!Shutdown)
{
LogRecord rec = CreateLogRecord(); // however that's done
LogQueue.Add(rec);
}
And the consumer does something similar:
while (!Shutdown)
{
LogRecord rec = LogQueue.Take();
// process the record
}
By default, BlockingCollection uses a ConcurrentQueue<T> as the backing store. The ConcurrentQueue takes care of thread synchronization, and the BlockingCollection does a non-busy wait when trying to take an item. That is, if the consumer calls Take when there are no items in the queue, it does a non-busy wait (no sleeping/spinning) until an item is available.
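One detail the loops above leave open is shutdown: a consumer blocked in Take will not notice a Shutdown flag. A possible sketch, assuming string records instead of LogRecord and a hypothetical ProducerConsumerDemo.Run helper, uses CompleteAdding together with GetConsumingEnumerable so the consumer ends once the queue is drained:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ProducerConsumerDemo
{
    // Runs `producers` producer tasks that each add `perProducer` records,
    // and one consumer; returns the records in the order the consumer saw them.
    public static List<string> Run(int producers, int perProducer)
    {
        var queue = new BlockingCollection<string>();
        var consumed = new List<string>();

        var consumer = Task.Factory.StartNew(() =>
        {
            // Ends once CompleteAdding() has been called and the queue is empty.
            foreach (string record in queue.GetConsumingEnumerable())
                consumed.Add(record);
        });

        var producerTasks = new Task[producers];
        for (int p = 0; p < producers; p++)
        {
            int id = p; // copy for the closure
            producerTasks[p] = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < perProducer; i++)
                    queue.Add("producer " + id + " record " + i);
            });
        }

        Task.WaitAll(producerTasks);
        queue.CompleteAdding(); // no more items: lets the consumer loop finish
        consumer.Wait();
        return consumed;
    }
}
```

Records from one producer arrive in the order that producer added them; the interleaving between different producers is not deterministic.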
You can use a synchronized queue (if you have .NET 3.5 or older code) or even better the new ConcurrentQueue<T>!
What you are planning is a classic producer-consumer queue with a thread consuming the items on the queue to do some work. This can be wrapped into a higher-level construct called an "actor" or "active object".
Basically this wraps the queue and the thread that consumes the items into a single class; the other threads call asynchronous methods on this class, which put messages on the queue to be performed by the actor's thread. In your case the class could have a single method, writeData, which stores the data in the queue and signals the condition variable to notify the actor thread that there is something in the queue. The actor thread checks whether there is any data in the queue and, if not, waits on the condition variable.
Here is a good article on the concept:
http://www.drdobbs.com/go-parallel/article/showArticle.jhtml;jsessionid=UTEXJOTLP0YDNQE1GHPSKH4ATMY32JVN?articleID=225700095
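A rough C# sketch of such an active object, using Monitor.Wait and Monitor.Pulse as the condition variable. The LogActor name, the Stop method, and the sink delegate are assumptions for illustration and are not taken from the linked article.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical active object: owns a queue and the single thread that drains it.
public sealed class LogActor
{
    private readonly Queue<string> messages = new Queue<string>();
    private readonly object gate = new object();
    private readonly Action<string> sink;
    private readonly Thread thread;
    private bool stopping;

    public LogActor(Action<string> sink)
    {
        this.sink = sink;
        thread = new Thread(Loop);
        thread.IsBackground = true;
        thread.Start();
    }

    // Asynchronous from the caller's point of view: enqueue, signal, return.
    public void WriteData(string message)
    {
        lock (gate)
        {
            messages.Enqueue(message);
            Monitor.Pulse(gate);
        }
    }

    // Asks the actor thread to finish the queued messages and exit.
    public void Stop()
    {
        lock (gate)
        {
            stopping = true;
            Monitor.Pulse(gate);
        }
        thread.Join();
    }

    private void Loop()
    {
        while (true)
        {
            string next;
            lock (gate)
            {
                // Wait on the condition until there is data or a stop request.
                while (messages.Count == 0 && !stopping) Monitor.Wait(gate);
                if (messages.Count == 0) return; // stopping and fully drained
                next = messages.Dequeue();
            }
            sink(next); // do the slow write outside the lock
        }
    }
}
```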

Callbacks (Asynchronous Method Calls) within a Loop

Part of a C# application I'm writing requires collecting data from a service provider's database for each account associated with a user. When the user logs into the app, a call is made to start updating the accounts from the service provider's database. Since lots of operations are performed on the third party's end, the process of getting their information could take a while, so I don't want to wait for each account just to start the process of updating. My question is: are there any issues (maybe threading issues) with calling an asynchronous method inside a loop?
The only loop-specific issue is that if you use anonymous methods that refer to loop variables, each time around the loop an instance of the anonymous method object will be created but they will all refer to the same loop variable, so they will see it change its value as the loop executes. So make a copy of the loop variable inside the loop.
foreach (var thing in collection)
{
    var copy = thing;
    Action a = () =>
    {
        // refer to copy, not thing
    };
}
2017-04-25: By the way, this issue was solved by C# 5.0. foreach automatically performs the above transformation.
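The C# 5.0 change applies to foreach only; a plain for loop still shares one loop variable across iterations, so the copy is still needed there. A small sketch (CaptureDemo is a hypothetical name) that shows the difference:

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    // Builds one delegate per iteration of a `for` loop, which shares a single
    // loop variable across iterations in every C# version.
    public static List<int> Run(bool copyVariable)
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            if (copyVariable)
            {
                int copy = i;            // fresh variable per iteration
                actions.Add(() => copy);
            }
            else
            {
                actions.Add(() => i);    // every delegate captures the same i
            }
        }

        // Invoke the delegates only after the loop has finished.
        var results = new List<int>();
        foreach (var action in actions) results.Add(action());
        return results;
    }
}
```

Run(true) yields 0, 1, 2, while Run(false) yields 3, 3, 3, because all three delegates read the shared variable after the loop has ended.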
The loop is no problem, but starting (too) many threads might be. See if your requirements allow using the ThreadPool.

Implementing multithreading in C# (code review)

Greetings.
I'm trying to implement some multithreaded code in an application. The purpose of this code is to validate items that the database gives it. Validation can take quite a while (a few hundred ms to a few seconds), so this process needs to be forked off into its own thread for each item.
The database may give it 20 or 30 items a second in the beginning, but that begins to decline rapidly, eventually reaching about 65K items over 24 hours, at which point the application exits.
I'd like it if anyone more knowledgeable could take a peek at my code and see if there's any obvious problems. No one I work with knows multithreading, so I'm really just on my own, on this one.
Here's the code. It's kinda long but should be pretty clear. Let me know if you have any feedback or advice. Thanks!
public class ItemValidationService
{
/// <summary>
/// The object to lock on in this class, for multithreading purposes.
/// </summary>
private static object locker = new object();
/// <summary>Items that have been validated.</summary>
private HashSet<int> validatedItems;
/// <summary>Items that are currently being validated.</summary>
private HashSet<int> validatingItems;
/// <summary>Remove an item from the index if its links are bad.</summary>
/// <param name="id">The ID of the item.</param>
public void ValidateItem(int id)
{
lock (locker)
{
if
(
!this.validatedItems.Contains(id) &&
!this.validatingItems.Contains(id)
){
ThreadPool.QueueUserWorkItem(sender =>
{
this.Validate(id);
});
}
}
} // method
private void Validate(int itemId)
{
lock (locker)
{
this.validatingItems.Add(itemId);
}
// *********************************************
// Time-consuming routine to validate an item...
// *********************************************
lock (locker)
{
this.validatingItems.Remove(itemId);
this.validatedItems.Add(itemId);
}
} // method
} // class
The thread pool is a convenient choice if you have light weight sporadic processing that isn't time sensitive. However, I recall reading on MSDN that it's not appropriate for large scale processing of this nature.
I used it for something quite similar to this and regret it. I took a worker-thread approach in subsequent apps and am much happier with the level of control I have.
My favorite pattern in the worker-thread model is to create a master thread which holds a queue of task items, then fork a bunch of workers that pop items off that queue to process. I use a blocking queue so that when there are no items to process, the workers just block until something is pushed onto the queue. In this model, the master thread produces work items from some source (db, etc.) and the worker threads consume them.
I second the idea of using a blocking queue and worker threads. Here is a blocking queue implementation that I've used in the past with good results:
https://www.codeproject.com/Articles/8018/Bounded-Blocking-Queue-One-Lock
What's involved in your validation logic? If it's mainly CPU bound then I would create no more than 1 worker thread per processor/core on the box. This will tell you the number of processors:
Environment.ProcessorCount
If your validation involves I/O such as File Access or database access then you could use a few more threads than the number of processors.
Be careful, QueueUserWorkItem might fail
There is a possible logic error in the code posted with the question, depending on where the item id in ValidateItem(int id) comes from. Why? Because although you correctly lock your validatingItems and validatedItems sets before queuing a work item, you do not add the item to the validatingItems set until the new thread spins up. That means there could be a time gap where another thread calls ValidateItem(id) with the same id (unless this is running on a single main thread).
I would add item to the validatingItems queue just before queuing the item, inside the lock.
Edit: also QueueUserWorkItem() returns a bool so you should use the return value to make sure the item was queued and THEN add it to the validatingItems queue.
ThreadPool may not be optimal for jamming so much at once into it. You may want to research the upper limits of its capabilities and/or roll your own.
Also, there is a race condition that exists in your code, if you expect no duplicate validations. The call to
this.validatingItems.Add(itemId);
needs to happen in the main thread (ValidateItem), not in the thread pool thread (Validate method). This call should occur a line before the queueing of the work item to the pool.
A worse bug is found by not checking the return of QueueUserWorkItem. Queueing can fail, and why it doesn't throw an exception is a mystery to us all. If it returns false, you need to remove the item that was added to the validatingItems set, and handle the error (throw an exception, probably).
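Putting the two suggestions together, a revised version of the question's class might look like the sketch below. The bool return value, the IsValidated helper, and the InvalidOperationException are assumptions added for illustration; the time-consuming validation itself is left as a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class ItemValidationService
{
    private readonly object locker = new object();
    private readonly HashSet<int> validatedItems = new HashSet<int>();
    private readonly HashSet<int> validatingItems = new HashSet<int>();

    // Returns true if validation was queued, false if the id was already claimed.
    public bool ValidateItem(int id)
    {
        lock (locker)
        {
            if (validatedItems.Contains(id)) return false;
            // Claim the id before queuing; Add returns false if already present.
            if (!validatingItems.Add(id)) return false;
        }

        bool queued = ThreadPool.QueueUserWorkItem(state => Validate(id));
        if (!queued)
        {
            // Undo the claim so a later call can retry this id.
            lock (locker) { validatingItems.Remove(id); }
            throw new InvalidOperationException("Could not queue item " + id);
        }
        return true;
    }

    // Hypothetical helper so callers can check whether an item has finished.
    public bool IsValidated(int id)
    {
        lock (locker) { return validatedItems.Contains(id); }
    }

    private void Validate(int itemId)
    {
        // ... time-consuming validation goes here (placeholder) ...
        lock (locker)
        {
            validatingItems.Remove(itemId);
            validatedItems.Add(itemId);
        }
    }
}
```

Because the id is claimed inside the same lock that performs the duplicate check, a second ValidateItem call with the same id can no longer slip in before the pool thread starts.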
I would be concerned about performance here. You indicated that the database may give it 20-30 items per second and an item could take up to a few seconds to be validated. That could be quite a large number of threads -- using your metrics, worst case 60-90 threads! I think you need to reconsider the design here. Michael mentioned a nice pattern. The use of the queue really helps keep things under control and organized. A semaphore could also be employed to control number of threads created -- i.e. you could have a maximum number of threads allowed, but under smaller loads, you wouldn't necessarily have to create the maximum number if fewer ended up getting the job done -- i.e. your own pool size could be dynamic with a cap.
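One way the semaphore idea could be sketched is shown below. ThrottleDemo and its parameters are hypothetical, and the peak-concurrency tracking exists only to make the cap observable; a real service would drop it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Runs `itemCount` work items with at most `maxConcurrent` in flight;
    // returns the highest concurrency actually observed.
    public static int Run(int itemCount, int maxConcurrent, Action<int> work)
    {
        var slots = new SemaphoreSlim(maxConcurrent);
        var tasks = new Task[itemCount];
        int running = 0, peak = 0;

        for (int i = 0; i < itemCount; i++)
        {
            int id = i;   // copy for the closure
            slots.Wait(); // blocks the producer while all slots are busy
            tasks[i] = Task.Factory.StartNew(() =>
            {
                int now = Interlocked.Increment(ref running);
                // Raise the recorded peak if this is a new maximum.
                int seen = Volatile.Read(ref peak);
                while (now > seen)
                {
                    int prior = Interlocked.CompareExchange(ref peak, now, seen);
                    if (prior == seen) break;
                    seen = prior;
                }
                try { work(id); }
                finally
                {
                    Interlocked.Decrement(ref running);
                    slots.Release(); // free the slot for the next item
                }
            });
        }

        Task.WaitAll(tasks);
        return peak;
    }
}
```

Under light load fewer than maxConcurrent items run at once, so the effective pool size adjusts itself below the cap.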
When using the thread-pool, I also find it more difficult to monitor the execution of threads from the pool as they perform the work. So, unless it's fire and forget, I am in favor of more controlled execution. I know you mentioned that your app exits after the 65K items are all completed. How are you monitoring your threads to determine if they have completed their work -- i.e. all queued workers are done? Are you monitoring the status of all items in the HashSets? I think by queuing your items up and having your own worker threads consume off that queue, you can gain more control. Albeit, this can come at the cost of more overhead in terms of signaling between threads to indicate when all items have been queued, allowing them to exit.
You could also try using the CCR - Concurrency and Coordination Runtime. It's buried inside Microsoft Robotics Studio, but provides an excellent API for doing this sort of thing.
You'd just need to create a "Port" (essentially a queue), hook up a receiver (method that gets called when something is posted to it), and then post work items to it. The CCR handles the queue and the worker thread to run it on.
Here's a video on Channel9 about the CCR.
It's very high-performance and is even being used for non-Robotics stuff (Myspace.com uses it behind the scenes for their content-delivery network).
I would recommend looking into MSDN: Task Parallel Library - DataFlow. You can find examples of implementing Producer-Consumer there; in your case, the database producing items to validate would be the producer and the validation routine would be the consumer.
Also recommend using ConcurrentDictionary<TKey, TValue> as a "Concurrent" hash set where you just populate the keys with no values :). You can potentially make your code lock-free.
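A minimal sketch of that idea, assuming a hypothetical ClaimedItems wrapper and a throwaway byte as the unused value. TryAdd is atomic, so it combines the "already seen?" check and the insert without an explicit lock.

```csharp
using System.Collections.Concurrent;

// Hypothetical lock-free "claim" set built on ConcurrentDictionary keys.
public class ClaimedItems
{
    private readonly ConcurrentDictionary<int, byte> items =
        new ConcurrentDictionary<int, byte>();

    // True only for the first caller that claims this id; atomic across threads.
    public bool TryClaim(int id)
    {
        return items.TryAdd(id, 0);
    }

    public bool Contains(int id)
    {
        return items.ContainsKey(id);
    }
}
```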
