I've developed some code that receives a series of values from a hardware device every 50 ms, in the form of name/value pairs. I want to develop a pub/sub service whereby subscribers can be notified when the value of a particular item changes. The Subscribe method might look something like this:
public void Subscribe(string itemName, Action<string, long> callback)
The code that reads the hardware values will check if a value has changed since last time. If so, it will iterate through any subscribers for that item, calling their delegates. As it stands, the delegates will be called on the same thread which isn't ideal - I need to keep the polling as fast as possible. What's the best approach for calling the callback delegates on separate threads? Should the subscribers pass in (say) a task/thread, or should the publisher be responsible for spinning these up?
Note that I need to pass a couple of parameters to the delegate (the item name and its value), so this might affect the approach taken. I know you can pass a single "state" object to tasks but it feels a bit unintuitive requiring the subscribers to implement an Action callback delegate (which must then be cast to some other type containing the name and value).
Also, I'm assuming that creating a new task/thread each time a delegate is called will hurt performance, so some kind of "pool" might be required?
I would keep the structure you have now and put the responsibility for prompt action on the callbacks, i.e. the callbacks should not block or perform complex, lengthy actions directly.
If a particular callback needs to perform a lengthy action, it should queue the Action data off to a thread of its own and then return 'immediately', e.g. it might BeginInvoke/PostMessage the data to a GUI thread, queue it to a thread that inserts into a DB table, or queue it to a logger (or indeed any combination chained together). These lengthy/blocking actions can then proceed in parallel while the device interface continues to poll.
This way, you keep the working structure you have and do not have to inflict any inter-thread comms onto callbacks that do not need it. The device interface remains encapsulated, just firing callbacks.
EDIT:
'creating a new task/thread each time a delegate is called will hurt performance' - yes, and it would also be difficult to maintain state. Often, such threads are written as while(true) loops with a blocking call at the top, e.g. a blocking queue pop(), so they only need creating once, at startup, and never need terminating.
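As a minimal sketch of that idea (not the only way to do it), a subscriber could wrap its slow handler in a small class that owns one long-lived thread and a BlockingCollection<T>. The class name QueuedSubscriber and its members are made up for illustration; only the Action<string, long> callback shape comes from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical subscriber helper: the callback only enqueues,
// a single dedicated thread does the slow work.
public sealed class QueuedSubscriber : IDisposable
{
    private readonly BlockingCollection<(string Name, long Value)> _queue =
        new BlockingCollection<(string Name, long Value)>();
    private readonly Thread _worker;
    private readonly Action<string, long> _slowHandler;

    public QueuedSubscriber(Action<string, long> slowHandler)
    {
        _slowHandler = slowHandler;
        _worker = new Thread(Run) { IsBackground = true };
        _worker.Start();                       // created once, at startup
    }

    // Pass this method as the callback to Subscribe; it returns immediately.
    public void OnValueChanged(string name, long value) => _queue.Add((name, value));

    private void Run()
    {
        // Blocks while the queue is empty; ends after CompleteAdding once the queue drains.
        foreach (var (name, value) in _queue.GetConsumingEnumerable())
            _slowHandler(name, value);
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Join();
        _queue.Dispose();
    }
}
```

The publisher stays unchanged: it still just calls the delegate on the polling thread, and the only cost there is one queue Add.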
Related
I have a service class and a worker class. The worker class does all the processing.
class WorkerClass
{
    public void ProcessWork(<params to the method>)
    {
        // Get the tasks from the DB.
        // Call a 3rd party web service to process each of the tasks.
    }
}
In my service class, I instantiate the worker class and call the method. The question is, how do I get the number of tasks processed in the service class?
I have thought of 3 options:
Expose an event from the worker class. Hook up an event handler in the service class.
Modify the signature of ProcessWork method so that it accepts a delegate:
public void ProcessWork(object obj1, Action<int, int> actionProgressTracker)
Expose a property from the worker class and get the property in the service class. Refresh the property every 30 seconds.
What would be a clean way of getting the status?
The first two options are really functionally identical. Both can work just fine for what you need to do. The second has an implication that the delegate is required, whereas the first implies that it is not. An event might also imply that it is used beyond the scope of just this one method.
As for the third option, it doesn't give the caller the opportunity to execute code when the number updates, it just gives them the opportunity to access the information.
So if the caller of this type is going to need to do something with this information (every time the value changes), then you should be using something comparable to one of the first two options so that the worker can "push" information to the caller.
If the caller wants to "pull" the information from the worker whenever it wants the information, then go with the third option.
Note that there is also a Progress<T> class that you can use, with a corresponding IProgress<T> interface, that's comparable to your first two options, but is specifically tailored for a worker updating a UI with progress.
Both push and pull methods can actually be sensible for updating a UI with the progress of a background task. If progress occurs infrequently, it may make sense to update the UI every time progress changes, so the UI will want to be "notified" when those updates happen. If the updates are very frequent, then the UI may instead want to have a timer and pull the current status every so often, to avoid taxing the UI with more updates than are needed or than it can handle.
Of course, if you're pushing information and not just something like a percent complete, then it may be important to not lose any of that information, in which case your 3rd approach isn't an option, as multiple updates may happen in between fetches.
And of course if you're writing a sufficiently generalized worker, you may want to expose both a push and pull mechanism, to let the caller choose the appropriate one.
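A rough sketch of a worker that offers both mechanisms might look like this. The taskCount parameter and the ProcessedCount property are assumptions made for the example (the question elides the real parameters); IProgress<T> is the standard .NET interface.

```csharp
using System;
using System.Threading;

public class WorkerClass
{
    private int _processed;

    // Pull: the caller can read the current count whenever it likes.
    public int ProcessedCount => Volatile.Read(ref _processed);

    // Push: if a progress object is supplied, it is notified after every task.
    // taskCount is a stand-in for "get the tasks from the DB".
    public void ProcessWork(int taskCount, IProgress<int> progress = null)
    {
        for (int i = 0; i < taskCount; i++)
        {
            // ... call the web service for one task here ...
            int done = Interlocked.Increment(ref _processed);
            progress?.Report(done);
        }
    }
}
```

A caller that wants notifications passes new Progress<int>(n => ...), which invokes the handler on the SynchronizationContext captured when it was constructed; a caller that only wants to poll passes nothing and reads ProcessedCount from a timer.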
Got a quick question on event ordering in C#/.NET.
Let's say you have a while loop that reads a socket interface (TCP). The interface is definitely taking care of the ordering (it's TCP). Let's say your packet interface is written so that each "packet" you get in the stream, you will forward it to the next "layer" or the next object via an event callback.
So here is the pseudocode:
while (true) {
    readsocket();
    if (data received == complete packet)
        raiseEvent(packet);
}
My questions are:
Are the events generated in order? (i.e. preserve ordering)
I am assuming #1 is correct, so that means it will block the while loop until the event finishes processing?
You never know how the event is implemented. It's possible that the events will all be executed synchronously, in order, and based on some meaningful value. It's also possible that they'll be executed synchronously in some arbitrary and inconsistent ordering. It's also possible that they won't even be executed synchronously, and that the various event handlers will be executed in new threads (or thread pool threads). It's entirely up to the implementation of the event to determine all of that.
It's very, very rare to see different event handlers executed in parallel, and almost all events that you come across will be backed by a single multicast delegate, meaning the handlers are fired in the order in which they were added. But you have no way of actually knowing whether that's the case (barring decompiling the code); nothing in the public API indicates how the event is implemented.
Regardless of all of this, from a conceptual perspective, it would be best to not rely on any ordering of event handler invocations, and it's generally best to program as if the various event handlers could be run concurrently because at a conceptual level, that is what an event represents even if the implementation details are more restrictive.
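For the common case of a plain C# event backed by a multicast delegate, the behaviour can be seen with a small experiment. PacketReader and OrderingDemo are hypothetical names used only for this sketch.

```csharp
using System;
using System.Collections.Generic;

public class PacketReader
{
    // A field-like event: backed by a single multicast delegate.
    public event Action<int> PacketReceived;

    public void Raise(int packetId) => PacketReceived?.Invoke(packetId);
}

public static class OrderingDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var reader = new PacketReader();
        reader.PacketReceived += id => log.Add("A" + id);
        reader.PacketReceived += id => log.Add("B" + id);

        for (int id = 1; id <= 2; id++)
        {
            reader.Raise(id);        // does not return until both handlers have run
            log.Add("loop" + id);
        }
        return log;                  // A1, B1, loop1, A2, B2, loop2
    }
}
```

So with this kind of event the loop is blocked while the handlers run, and packets are delivered in order. An event with custom add/remove accessors or a publisher that dispatches to other threads could behave differently.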
I have a Camera class that produces very large images at a high FPS, which require processing by an ImageProcessor class. I also have a WPF Control, my View, that displays this information. I need each of these components to run on its own thread so it doesn't lock up the processing.
Method 1) Camera has an Action<Image> ImageCreated that ImageProcessor subscribes to. ImageProcessor has an Action<Image, Foo> ImageCreated that contains an altered Image and Foo results for the View to show.
Method 2) Camera has a threadsafe (using locks and monitors) ProducerConsumer to which it produces Images, and ImageProcessor waits and Consumes. Same story for the View.
Method 2 is nice because I can create and manage my own threads.
Method 1 is nice because I can have multiple ImageProcessors subscribed to the Camera class. But I'm not sure whose thread is doing the heavyweight work, or if Action is wasting time creating threads. Again, these images come in many times per second.
I'm trying to get the images to my View as quickly as possible, without tying up processing or causing the View to lock up.
Thoughts?
Unless you do it yourself, using Method 1) does not introduce any multithreading. Invoking an action (unless you call BeginInvoke) does so synchronously, just like any normal method call.
I would advocate Method 2). There is no need to tie it to one single consumer. If you use this queue as a single point of contact between X cameras and Y processors, you've decoupled the cameras from the processors and could modify the value of X and Y independently.
EDIT
At the risk of being accused of blog spam here, I remembered that I wrote a component that's similar (if not an exact match) for what you're looking for awhile ago. See if this helps:
ProcessQueue
The gist of it is that you provide the queue with a delegate that can process a single item--in your case, Image--in the constructor, then call Start. As items are added to the queue using Enqueue, they're automatically dispatched to an appropriate thread and processed.
For example, if you wanted to have the image move Camera->Processor->Writer (and have a variable number of each), then I would do something like this:
ProcessQueue<Foo> processorQueue = new ProcessQueue<Foo>(f => WriteFoo(f));
ProcessQueue<Image> cameraQueue = new ProcessQueue<Image>(i => processorQueue.Enqueue(ProcessImage(i)));
You could vary the number of threads in cameraQueue (which controls the image processing) and processorQueue (which controls writing to disk) by using SetThreadCount.
Once you've done that, you would just call cameraQueue.Enqueue(image) whenever a camera captured an image.
Method one will not work - the Action<T> will be executed on the thread that invoked it. Also, you should probably use events instead of plain delegates in scenarios like this.
Method two is the way to go, but if possible you should use the new thread-safe collections of .NET 4.0 instead of doing the synchronization yourself - we all know how hard it is to get even the simplest multi-threaded code correct.
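One of those .NET 4.0 collections is BlockingCollection<T>. A minimal sketch of Method 2 built on it could look like the following; the Pipeline helper and its parameter names are invented for the example, and int stands in for Image in the usage.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class Pipeline
{
    // Starts consumerCount long-running tasks that each drain `input`.
    // The returned task completes once CompleteAdding has been called
    // on the queue and every remaining item has been processed.
    public static Task StartConsumers<T>(
        BlockingCollection<T> input, int consumerCount, Action<T> work)
    {
        var tasks = new Task[consumerCount];
        for (int i = 0; i < consumerCount; i++)
        {
            tasks[i] = Task.Factory.StartNew(() =>
            {
                // Blocks while the queue is empty, no hand-written locks needed.
                foreach (T item in input.GetConsumingEnumerable())
                    work(item);
            }, TaskCreationOptions.LongRunning);
        }
        return Task.WhenAll(tasks);
    }
}
```

The camera side would create the queue with a bounded capacity, e.g. new BlockingCollection<Image>(8), and call Add for each frame; with a bound, Add blocks when the processors fall behind, which keeps very large images from piling up in memory. Whether blocking the camera or dropping frames is the right policy depends on the application.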
My problem is this:
I have two threads, my UI thread and a worker thread. My worker thread is running in a separate class that gets instantiated by the form, which passes itself as an ISynchronizeInvoke to the worker class, which then uses Invoke on that interface to raise its events, which provide status updates to the UI for display. This works wonderfully.
I noticed that my background thread seemed to be running slowly though, so I changed the call to Invoke to BeginInvoke, thinking that "I'm just providing progress updates, it doesn't need to be exactly synchronous, no harm done" except that now I'm getting oddities with the progress update. My progress bar updates, but the label's text doesn't, and if I change to another window and try to change back, it acts like the UI thread is locked up, so I'm wondering if perhaps my progress calls (which happen very often) are overloading the UI thread so much that it never processes messages. Is this possible at all, or is there something else at work here?
You're definitely overloading the UI thread.
In your first sample, you were (behind the scenes) sending a message to the UI thread, waiting for it to be processed (that's the purpose of Invoke, which ultimately relies on SendMessage), and then sending another one. In the meantime, other messages were probably enqueued (WM_PAINT messages, for example) and processed.
In your second sample, by using BeginInvoke (which ultimately relies on PostMessage), you enqueued a massive number of messages in the message queue, which the message pump must handle sequentially. And of course, while it's handling those thousands of messages, it cannot handle the OS messages (WM_PAINT, etc.), which makes your UI look "frozen".
You're probably providing too many status updates; try to lower the feedback level.
If you want to understand better how messages work in windows, this is the place to start.
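One simple way to lower the feedback level, sketched here under the assumption that progress is expressed as "done out of total" with total greater than zero, is to forward an update only when the whole-number percentage changes. PercentThrottle is a hypothetical helper, not a framework type.

```csharp
using System;

// Hypothetical helper: forwards a progress value only when the
// whole-number percentage actually changes.
public sealed class PercentThrottle
{
    private readonly Action<int> _report;
    private int _lastPercent = -1;

    public PercentThrottle(Action<int> report) => _report = report;

    // Call from the worker on every iteration (single worker thread assumed).
    // At most 101 calls (0 through 100) ever reach _report.
    public void Update(long done, long total)
    {
        int percent = (int)(done * 100 / total);
        if (percent == _lastPercent) return;
        _lastPercent = percent;
        _report(percent);
    }
}
```

The delegate passed to the constructor is where the BeginInvoke call would go, so the message queue receives at most about a hundred posts per job instead of one per iteration.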
A few thoughts;
try batching your updates; for example, there is no point updating for every iteration in a loop; depending on the speed, perhaps every 50 / 500. In the case of lists, you would buffer in a local list variable, take the list over via Invoke / BeginInvoke, and process the buffer on the UI thread
variable capture; if you are using BeginInvoke and anonymous methods, you could have problems... I'll add an example below
making the UI update efficient - especially if you are processing a list; some controls (especially list-based controls) have a pair of methods like BeginUpdate / EndUpdate, that stop the UI redrawing when you are making lots of updates; instead, it waits until the End* is called
capture problem... imagine (worker):
List<string> stuff = new List<string>();
for(int i = 0 ; i < 50000 ; i++) {
    stuff.Add(i.ToString());
    if((i % 100) == 0) {
        // update UI
        BeginInvoke((MethodInvoker) delegate {
            foreach(string s in stuff) {
                listBox.Items.Add(s);
            }
        });
    }
}
Did you notice that at some point both threads are talking to stuff? The UI thread can be iterating it while the worker thread (which has kept running past BeginInvoke) keeps adding. This can cause issues. Not usually performance issues (unless you are catching the exceptions and taking a long time to log them), but definitely issues. Options here would include:
using Invoke to run the update synchronously
create a new buffer per update, so that the two threads never have the same list instance (you'd need to look very carefully at the variable scoping to make sure, though)
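The second option might be sketched like this. To keep the example free of WinForms, post stands in for BeginInvoke and addToUi for the code that fills the list box; both names, and the BatchingWorker class, are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class BatchingWorker
{
    // `post` stands in for BeginInvoke; `addToUi` for the list-box update.
    public static void Run(
        int count, int batchSize, Action<Action> post, Action<List<string>> addToUi)
    {
        var buffer = new List<string>();
        for (int i = 0; i < count; i++)
        {
            buffer.Add(i.ToString());
            if (buffer.Count == batchSize)
            {
                List<string> batch = buffer;   // hand this instance to the UI thread...
                buffer = new List<string>();   // ...and never touch it again on the worker
                post(() => addToUi(batch));
            }
        }
        if (buffer.Count > 0)
        {
            List<string> last = buffer;        // flush the final partial batch
            post(() => addToUi(last));
        }
    }
}
```

The important detail is that the lambda captures batch, which is declared inside the if block and so is a fresh variable each time, rather than buffer, which the worker keeps reassigning and appending to.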
Greetings.
I'm trying to implement some multithreaded code in an application. The purpose of this code is to validate items that the database gives it. Validation can take quite a while (a few hundred ms to a few seconds), so this process needs to be forked off into its own thread for each item.
The database may give it 20 or 30 items a second in the beginning, but that begins to decline rapidly, eventually reaching about 65K items over 24 hours, at which point the application exits.
I'd like it if anyone more knowledgeable could take a peek at my code and see if there's any obvious problems. No one I work with knows multithreading, so I'm really just on my own, on this one.
Here's the code. It's kinda long but should be pretty clear. Let me know if you have any feedback or advice. Thanks!
public class ItemValidationService
{
    /// <summary>
    /// The object to lock on in this class, for multithreading purposes.
    /// </summary>
    private static object locker = new object();

    /// <summary>Items that have been validated.</summary>
    private HashSet<int> validatedItems;

    /// <summary>Items that are currently being validated.</summary>
    private HashSet<int> validatingItems;

    /// <summary>Remove an item from the index if its links are bad.</summary>
    /// <param name="id">The ID of the item.</param>
    public void ValidateItem(int id)
    {
        lock (locker)
        {
            if
            (
                !this.validatedItems.Contains(id) &&
                !this.validatingItems.Contains(id)
            ){
                ThreadPool.QueueUserWorkItem(sender =>
                {
                    this.Validate(id);
                });
            }
        }
    } // method

    private void Validate(int itemId)
    {
        lock (locker)
        {
            this.validatingItems.Add(itemId);
        }

        // *********************************************
        // Time-consuming routine to validate an item...
        // *********************************************

        lock (locker)
        {
            this.validatingItems.Remove(itemId);
            this.validatedItems.Add(itemId);
        }
    } // method
} // class
The thread pool is a convenient choice if you have light weight sporadic processing that isn't time sensitive. However, I recall reading on MSDN that it's not appropriate for large scale processing of this nature.
I used it for something quite similar to this and regret it. I took a worker-thread approach in subsequent apps and am much happier with the level of control I have.
My favorite pattern in the worker-thread model is to create a master thread which holds a queue of task items, then fork a bunch of workers that pop items off that queue to process. I use a blocking queue so that when there are no items to process, the workers just block until something is pushed onto the queue. In this model, the master thread produces work items from some source (db, etc.) and the worker threads consume them.
I second the idea of using a blocking queue and worker threads. Here is a blocking queue implementation that I've used in the past with good results:
https://www.codeproject.com/Articles/8018/Bounded-Blocking-Queue-One-Lock
What's involved in your validation logic? If it's mainly CPU bound, then I would create no more than 1 worker thread per processor/core on the box. This will tell you the number of processors:
Environment.ProcessorCount
If your validation involves I/O such as File Access or database access then you could use a few more threads than the number of processors.
Be careful, QueueUserWorkItem might fail
There is a possible logic error in the code posted with the question, depending on where the item id in ValidateItem(int id) comes from. Why? Because although you correctly lock your validatingItems and validatedItems sets before queuing a work item, you do not add the item to the validatingItems set until the new thread spins up. That means there could be a time gap in which another thread calls ValidateItem(id) with the same id (unless this is running on a single main thread).
I would add the item to the validatingItems set just before queuing the item, inside the lock.
Edit: also, QueueUserWorkItem() returns a bool, so you should use the return value to make sure the item was queued and THEN add it to the validatingItems set.
ThreadPool may not be optimal for jamming so much at once into it. You may want to research the upper limits of its capabilities and/or roll your own.
Also, there is a race condition that exists in your code, if you expect no duplicate validations. The call to
this.validatingItems.Add(itemId);
needs to happen in the main thread (ValidateItem), not in the thread pool thread (Validate method). This call should occur a line before the queueing of the work item to the pool.
A worse bug is not checking the return value of QueueUserWorkItem. Queueing can fail, and why it doesn't throw an exception is a mystery to us all. If it returns false, you need to remove the item that was added to the validatingItems set and handle the error (probably by throwing an exception).
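Putting those two fixes together, a reworked sketch of the class from the question could look like this. The validate delegate passed to the constructor is a stand-in for the time-consuming routine, and returning a bool from ValidateItem is an addition made here so callers (and tests) can see whether the item was a duplicate.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class ItemValidationService
{
    private readonly object locker = new object();
    private readonly HashSet<int> validatedItems = new HashSet<int>();
    private readonly HashSet<int> validatingItems = new HashSet<int>();
    private readonly Action<int> validate;   // stand-in for the slow validation routine

    public ItemValidationService(Action<int> validate) { this.validate = validate; }

    // Returns true if this call queued the item, false if it was a duplicate.
    public bool ValidateItem(int id)
    {
        lock (locker)
        {
            // Mark the item as in flight BEFORE queueing, so a second caller
            // with the same id cannot slip through the gap.
            if (validatedItems.Contains(id) || !validatingItems.Add(id))
                return false;
        }

        if (!ThreadPool.QueueUserWorkItem(_ => Validate(id)))
        {
            // Queueing failed: undo the bookkeeping and surface the error.
            lock (locker) { validatingItems.Remove(id); }
            throw new InvalidOperationException("Could not queue item " + id);
        }
        return true;
    }

    private void Validate(int id)
    {
        validate(id);
        lock (locker)
        {
            validatingItems.Remove(id);
            validatedItems.Add(id);
        }
    }
}
```

This only closes the duplicate-validation gap; it does nothing about the number of pool threads in use, which the other answers address.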
I would be concerned about performance here. You indicated that the database may give it 20-30 items per second and an item could take up to a few seconds to be validated. That could be quite a large number of threads -- using your metrics, worst case 60-90 threads! I think you need to reconsider the design here. Michael mentioned a nice pattern. The use of the queue really helps keep things under control and organized. A semaphore could also be employed to control number of threads created -- i.e. you could have a maximum number of threads allowed, but under smaller loads, you wouldn't necessarily have to create the maximum number if fewer ended up getting the job done -- i.e. your own pool size could be dynamic with a cap.
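The semaphore idea could be sketched roughly as follows, using SemaphoreSlim as the gate. ThrottledValidator and its parameters are hypothetical; the validate delegate again stands in for the real routine.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledValidator
{
    // Runs validate(id) for every id, with at most maxConcurrent running at once.
    // Returns when all validations have finished.
    public static void ValidateAll(
        IEnumerable<int> ids, int maxConcurrent, Action<int> validate)
    {
        var tasks = new List<Task>();
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            foreach (int id in ids)
            {
                gate.Wait();                     // producer blocks once the cap is reached
                tasks.Add(Task.Run(() =>
                {
                    try { validate(id); }
                    finally { gate.Release(); }  // free a slot for the next item
                }));
            }
            Task.WaitAll(tasks.ToArray());       // also gives a clear "all done" point
        }
    }
}
```

Under light load fewer than maxConcurrent validations run, so the effective pool size is dynamic with a cap. Waiting on the tasks at the end also answers the "how do I know everything has finished" question without inspecting the HashSets.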
When using the thread pool, I also find it more difficult to monitor the pool threads while they perform the work. So, unless it's fire and forget, I am in favor of more controlled execution. I know you mentioned that your app exits after the 65K items are all completed. How are you monitoring your threads to determine whether they have completed their work, i.e. that all queued workers are done? Are you monitoring the status of all items in the HashSets? I think that by queuing your items up and having your own worker threads consume off that queue, you can gain more control. Admittedly, this can come at the cost of more overhead in terms of signaling between threads to indicate when all items have been queued, allowing them to exit.
You could also try using the CCR - Concurrency and Coordination Runtime. It's buried inside Microsoft Robotics Studio, but provides an excellent API for doing this sort of thing.
You'd just need to create a "Port" (essentially a queue), hook up a receiver (method that gets called when something is posted to it), and then post work items to it. The CCR handles the queue and the worker thread to run it on.
Here's a video on Channel9 about the CCR.
It's very high-performance and is even being used for non-Robotics stuff (Myspace.com uses it behind the scenes for their content-delivery network).
I would recommend looking into MSDN: Task Parallel Library - DataFlow. You can find examples of implementing Producer-Consumer there; in your case, the database would be the producer of items to validate and the validation routine would be the consumer.
Also recommend using ConcurrentDictionary<TKey, TValue> as a "Concurrent" hash set where you just populate the keys with no values :). You can potentially make your code lock-free.
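A small sketch of the "concurrent hash set" trick: ConcurrentDictionary.TryAdd returns true for exactly one caller per key, so it can replace the check-then-add inside a lock. SeenItems and TryClaim are made-up names, and note that this version collapses the question's two sets (validating and validated) into a single "already claimed" set, which may or may not suit the real application.

```csharp
using System.Collections.Concurrent;

public class SeenItems
{
    // The byte value is a throwaway; only the keys matter.
    private readonly ConcurrentDictionary<int, byte> seen =
        new ConcurrentDictionary<int, byte>();

    // True for exactly one caller per id, even when several threads race.
    public bool TryClaim(int id) => seen.TryAdd(id, 0);
}
```

A caller would queue the validation only when TryClaim(id) returns true.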