I have a Camera class that produces very large images at a high FPS that require processing by an ImageProcessor class. I also have a WPF Control, my View, that displays this information. I need each of these components to run on its own thread so it doesn't lock up the processing.
Method 1) Camera has an Action<Image> ImageCreated that ImageProcessor subscribes to. ImageProcessor has an Action<Image, Foo> ImageCreated that contains an altered Image and Foo results for the View to show.
Method 2) Camera has a threadsafe (using locks and monitors) ProducerConsumer to which it produces Images, and ImageProcessor waits and Consumes. Same story for the View.
Method 2 is nice because I can create and manage my own threads.
Method 1 is nice because I can have multiple ImageProcessors subscribed to the Camera class. But I'm not sure whose thread is doing the heavyweight work, or if Action is wasting time creating threads. Again, these images come in many times per second.
I'm trying to get the images to my View as quickly as possible, without tying up processing or causing the View to lock up.
Thoughts?
Unless you do it yourself, using Method 1) does not introduce any multithreading. Invoking an action (unless you call BeginInvoke) does so synchronously, just like any normal method call.
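A quick way to convince yourself of this is to compare thread IDs. The following is only a minimal sketch: ActionThreadDemo and the byte[] standing in for Image are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Threading;

public static class ActionThreadDemo
{
    // Returns true when the handler ran on the same thread that invoked it.
    public static bool HandlerRunsOnCallerThread()
    {
        int callerThreadId = Thread.CurrentThread.ManagedThreadId;
        int handlerThreadId = -1;

        // Stand-in for Camera.ImageCreated; the byte[] plays the role of the Image.
        Action<byte[]> imageCreated = image =>
            handlerThreadId = Thread.CurrentThread.ManagedThreadId;

        imageCreated(new byte[16]); // plain synchronous call, no new thread

        return handlerThreadId == callerThreadId;
    }
}
```

So with Method 1 as described, the camera's thread would end up doing the image processing unless a subscriber hands the work off itself.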
I would advocate Method 2). There is no need to tie it to one single consumer. If you use this queue as a single point of contact between X cameras and Y processors, you've decoupled the cameras from the processors and could modify the value of X and Y independently.
EDIT
At the risk of being accused of blog spam here, I remembered that I wrote a component a while ago that's similar to (if not an exact match for) what you're looking for. See if this helps:
ProcessQueue
The gist of it is that you provide the queue with a delegate that can process a single item--in your case, Image--in the constructor, then call Start. As items are added to the queue using Enqueue, they're automatically dispatched to an appropriate thread and processed.
For example, if you wanted to have the image move Camera->Processor->Writer (and have a variable number of each), then I would do something like this:
ProcessQueue<Foo> processorQueue = new ProcessQueue<Foo>(f => WriteFoo(f));
ProcessQueue<Image> cameraQueue = new ProcessQueue<Image>(i => processorQueue.Enqueue(ProcessImage(i)));
You could vary the number of threads in cameraQueue (which controls the image processing) and processorQueue (which controls writing to disk) by using SetThreadCount.
Once you've done that, you would just call cameraQueue.Enqueue(image) whenever a camera captured an image.
Method one will not work - the Action<T> will be executed on the thread that invoked it. You should also probably use events instead of plain delegates in scenarios like this.
Method two is the way to go, but if possible you should use the new thread-safe collections of .NET 4.0 instead of doing the synchronization yourself - we all know how hard it is to get even the simplest multi-threaded code correct.
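As a rough sketch of what that could look like with BlockingCollection<T> from System.Collections.Concurrent: FramePipeline, the byte[] frames, and the capacity parameter below are assumptions for illustration, not the asker's types.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FramePipeline
{
    // Produces frameCount dummy frames and consumes them on a separate task.
    // Returns the frame sizes seen by the consumer, in arrival order.
    public static List<int> Run(int frameCount, int capacity)
    {
        var processed = new List<int>();

        // Bounded, so a slow consumer applies back-pressure to the producer.
        using (var queue = new BlockingCollection<byte[]>(capacity))
        {
            Task consumer = Task.Run(() =>
            {
                // Blocks while the queue is empty; ends after CompleteAdding.
                foreach (byte[] frame in queue.GetConsumingEnumerable())
                    processed.Add(frame.Length);
            });

            for (int i = 1; i <= frameCount; i++)
                queue.Add(new byte[i]); // blocks while the queue is full

            queue.CompleteAdding();
            consumer.Wait();
        }

        return processed;
    }
}
```

Whether the producer should block when the queue is full or drop frames instead depends on the application; the bounded capacity here is just one option.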
Related
I'm working on a C# app with a time-consuming sequential workflow that must be performed asynchronously. It starts when the user presses a button and the app receives a few images captured from a camera within just a few milliseconds. The workflow then:
1. Saves the images to disk.
2. Aligns them.
3. Generates 3D data from them.
4. Groups them into a larger, collective object (called a "Scan").
5. Adds optional analysis data to this scan and executes it.
6. Finally, saves the scan itself to an XML file alongside the images.
Some of these steps are optional and configurable.
Since the processing can take so long, there will often be a queue of "scans" awaiting processing. So I need to present to the user a visual representation of the queue of captured scans and their current processing state (e.g. "Saving", "Analyzing", "Finished", etc.).
I've looked into using TPL DataFlow for this. But while the mesh is simple to create, I'm not getting just how I might monitor the status of what is going on so that I can update a user interface. Do I try to link custom action blocks that post back messages to the UI for that? Something else?
Is TPL Dataflow even the right tool for this job?
Reporting Overall Progress
When you consider that a TPL DataFlow graph has a beginning and end block and that you know how many items you posted into the graph, all you need do is track how many messages have reached the final block and compare it to the source count of messages that were posted into the head. This will allow you to report progress.
Now this works trivially if the blocks are 1:1 - that is, for any message in there is a single message out. If there is a one:many block, you will need to change your progress reporting accordingly.
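For the 1:1 case, the counting could be sketched roughly like this. ProgressDemo, the doubling transform, and the reportProgress callback are placeholders I made up; on .NET Framework the System.Threading.Tasks.Dataflow NuGet package is needed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

public static class ProgressDemo
{
    // Posts `total` items through a 1:1 pipeline and returns how many reached the end.
    public static int Run(int total, Action<int, int> reportProgress)
    {
        int completed = 0;

        var head = new TransformBlock<int, int>(n => n * 2); // placeholder work
        var tail = new ActionBlock<int>(_ =>
        {
            // Count each message that reaches the final block.
            int done = Interlocked.Increment(ref completed);
            reportProgress(done, total);
        });

        head.LinkTo(tail, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < total; i++)
            head.Post(i);

        head.Complete();
        tail.Completion.Wait();
        return completed;
    }
}
```

Note that reportProgress runs on whatever thread the final block uses, so a real UI would still need to marshal the update to its own thread.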
Reporting Job Stage Progress
If you wish to present progress of a job as it travels throughout the graph, you will need to pass job details to each block, not just the data needed for the actual block. A job is a single task that must span all the steps 1-6 listed in your question.
So, for example, step 2 may require image data in order to perform alignment, but it does not care about filenames, how many steps there are in the job, or anything else job-related. With only the block input to go on, there is insufficient detail to know the state of the current job, and it is difficult to look up the original job. You could refer to some external dictionary, but graphs are best designed when they are isolated and deal only with data passed into each block.
So a simple example would be to change this minimal code from:
var alignmentBlock = new TransformBlock<Image, Image>(n => { ... });
...to:
var alignmentBlock = new TransformBlock<Job, Job>(job =>
{
    job.Stage = Stages.Aligning;
    // perform alignment here
    job.Aligned = ImageAligner.Align (job.Image, ...);
    // report progress
    job.Stage = Stages.AlignmentComplete;
    // hand the job on to the next block
    return job;
});
...and repeat the process for the other blocks.
The stage property could fire a PropertyChanged notification or use any other form of notification pattern suitable for your UI.
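One possible shape for such a property, as a sketch only: the Stages.Queued value and the exact layout of Job are my assumptions, since the real class would also carry the image data.

```csharp
using System.ComponentModel;

// Aligning and AlignmentComplete come from the example above; Queued is assumed.
public enum Stages { Queued, Aligning, AlignmentComplete }

public class Job : INotifyPropertyChanged
{
    private Stages stage = Stages.Queued;

    public event PropertyChangedEventHandler PropertyChanged;

    public Stages Stage
    {
        get { return stage; }
        set
        {
            if (stage == value) return; // no notification when nothing changed
            stage = value;
            // Raised on whichever thread runs the block; marshal to the
            // UI thread here if your UI framework requires it.
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Stage)));
        }
    }
}
```

A view model could then bind a status column to Stage for each job in the queue.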
Notes
Now you will notice that I introduce a Job class that is passed as the only argument to each block. Job contains input data for the block as well as being a container for block output.
Now this will work, but the purist in me feels that it would be better to keep job metadata separate from what is TPL block input and output, otherwise there is potential for state to be corrupted by multiple threads.
To get around this you may want to consider using Tuple<> and passing that into the block.
e.g.
var alignmentBlock = new TransformBlock<Tuple<Job, UnalignedImages>,
Tuple<Job, AlignedImages>>(n => { ... });
What I'm trying to accomplish is this: I have an action block with MaxDegreeOfParallelism = 4. I want to create one local instance of a session object I have for each parallel path, so I want a total of 4 session objects. If this were threads I would create something like:
ThreadLocal<Session> sessionPerThread = new ThreadLocal<Session>(() => new Session());
I know blocks are not threads so I'm looking for something similar but for blocks. Any way to create this?
This block is in a service and runs for months on end. During that time period tons of threads are used for each concurrent slot of the block so thread local storage is not appropriate. I need something tied to the logical block slot. Also this block never completes, it runs the entire lifetime of the service.
Note: The above suggested answer is not valid for what I am asking. I'm specifically asking for something different than thread local and the above answer is using thread local. This is a different question entirely.
As it sounds like you already know, Dataflow blocks provide absolutely no guarantee of correlation between blocks, execution, and threads. Even with max parallelism set to 4, all 4 tasks could be executing on the same thread. Or an individual task may execute on many threads.
Given that you ultimately want to reuse n instances of an expensive service for your n degrees of parallelism, let's take dataflow completely out of the picture for a minute, since it doesn't help (or directly hinder) you from any general solution to this problem. It's actually quite simple. You can use a ConcurrentStack<T>, where T is the type of your service that is expensive to instantiate. You have code that appears at the top of the method (or delegate) that represents one of your parallel units of work:
// T needs a public parameterless constructor (where T : new()) for new T() to compile.
private readonly ConcurrentStack<T> reusableServices = new ConcurrentStack<T>();

private void DoWork() {
    T service;
    if (!this.reusableServices.TryPop(out service)) {
        service = new T(); // expensive construction
    }

    // Use your shared service.
    //// Code here.

    // Put the service back when we're done with it so someone else can use it.
    this.reusableServices.Push(service);
}
Now in this way, you can quickly see that you create at most as many instances of your expensive service as you have concurrent executions of DoWork(). You don't even have to hard-code the degree of parallelism you expect. And it's orthogonal to how you actually schedule that parallelism (so thread pool, Dataflow, PLINQ, etc. doesn't matter).
So you can just use DoWork() as your Dataflow block's delegate and you're set to go.
Of course, there's nothing magical about ConcurrentStack<T> here, except that the locks around push and pop are built into the type so you don't have to do it yourself.
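If you'd rather not repeat the pop/use/push pattern in every delegate, it could be wrapped up along these lines. ServicePool and its factory delegate are hypothetical names for this sketch; whether an instance should go back into the pool after an exception is a decision for your service, and here it is simply returned.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class ServicePool<T>
{
    private readonly ConcurrentStack<T> idle = new ConcurrentStack<T>();
    private readonly Func<T> factory;

    public ServicePool(Func<T> factory)
    {
        this.factory = factory;
    }

    // Runs `work` with a pooled instance, creating one only if none is idle.
    public void Use(Action<T> work)
    {
        T service;
        if (!idle.TryPop(out service))
        {
            service = factory(); // expensive construction
        }

        try
        {
            work(service);
        }
        finally
        {
            idle.Push(service); // return it even if work throws
        }
    }
}
```

The block's delegate would then be something like m => pool.Use(s => s.Handle(m)), where Handle is whatever your session object exposes.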
I've developed some code that receives a series of values from a hardware device, every 50ms in the form of name/value pairs. I want to develop a pub/sub service whereby subscribers can be notified when the value of a particular item changes. The Subscribe method might look something like this:-
public void Subscribe(string itemName, Action<string, long> callback)
The code that reads the hardware values will check if a value has changed since last time. If so, it will iterate through any subscribers for that item, calling their delegates. As it stands, the delegates will be called on the same thread which isn't ideal - I need to keep the polling as fast as possible. What's the best approach for calling the callback delegates on separate threads? Should the subscribers pass in (say) a task/thread, or should the publisher be responsible for spinning these up?
Note that I need to pass a couple of parameters to the delegate (the item name and its value), so this might affect the approach taken. I know you can pass a single "state" object to tasks but it feels a bit unintuitive requiring the subscribers to implement an Action callback delegate (which must then be cast to some other type containing the name and value).
Also, I'm assuming that creating a new task/thread each time a delegate is called will hurt performance, so some kind of "pool" might be required?
I would maintain the same structure that you now have and put the responsibility of prompt action onto the callbacks, i.e. the callbacks should not block or perform complex, lengthy actions directly.
If a particular callback needs to perform any lengthy action, it should queue the Action data off to a thread of its own and then return 'immediately', e.g. it might BeginInvoke/PostMessage the data to a GUI thread, queue it to a thread that inserts into a DB table, or queue it to a logger (or indeed any combination chained together). These lengthy/blocking actions can then proceed in parallel while the device interface continues to poll.
This way, you keep the working structure you have and do not have to inflict any inter-thread comms onto callbacks that do not need it. The device interface remains encapsulated, just firing callbacks.
EDIT:
'creating a new task/thread each time a delegate is called will hurt performance' - yes, and it would also be difficult to maintain state. Often, such threads are written as while(true) loops with some signaling call at the top, e.g. a blocking queue pop(), and so only need creating once, at startup, and never need terminating.
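A subscriber that needs to do slow work could be sketched like this. QueuedSubscriber and slowHandler are names I made up; the only thing taken from the question is the Action<string, long> callback shape. The worker thread is created once and loops on a blocking queue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class QueuedSubscriber : IDisposable
{
    private readonly BlockingCollection<Tuple<string, long>> queue =
        new BlockingCollection<Tuple<string, long>>();
    private readonly Thread worker;

    public QueuedSubscriber(Action<string, long> slowHandler)
    {
        // Created once; loops until CompleteAdding is called in Dispose.
        worker = new Thread(() =>
        {
            foreach (var item in queue.GetConsumingEnumerable())
                slowHandler(item.Item1, item.Item2);
        });
        worker.IsBackground = true;
        worker.Start();
    }

    // Pass this method to Subscribe; it only enqueues, so the poller is not held up.
    public void OnValueChanged(string itemName, long value)
    {
        queue.Add(Tuple.Create(itemName, value));
    }

    public void Dispose()
    {
        queue.CompleteAdding();
        worker.Join(); // drain the remaining items before returning
        queue.Dispose();
    }
}
```

The publisher stays unchanged: it still just calls the delegate, and subscribers with cheap handlers never pay for a queue they don't need.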
Say I'm writing a piece of software that simulates a user performing certain actions on a system. I'm measuring the amount of time it takes for such an action to complete using a stopwatch.
Most of the time this is pretty straightforward: the click of a button is simulated, and some service call is associated with this button. The time it takes for this service call to complete is measured.
Now comes the crux, some actions have more than one service call associated with them. Since they're all still part of the same logical action, I'm 'grouping' these using the signalling mechanism offered by C#, like so (pseudo):
var syncResultList = new List<WaitHandle>();
var syncResultOne = service.BeginGetStuff();
var syncResultTwo = service.BeginDoOtherStuff();
syncResultList.Add(syncResultOne.AsyncWaitHandle);
syncResultList.Add(syncResultTwo.AsyncWaitHandle);
WaitHandle.WaitAll(syncResultList.ToArray());
var retValOne = service.EndGetStuff(syncResultOne);
var retValTwo = service.EndDoOtherStuff(syncResultTwo);
So, GetStuff and DoOtherStuff constitute one logical piece of work for that particular action. And, of course, I can easily measure the amount of time it takes for this conjunction of methods to complete, by just placing a stopwatch around them. But I need a more fine-grained approach for my statistics. I'm really interested in the amount of time it takes for each of the methods to complete, without losing the 'grouped' semantics provided by WaitHandle.WaitAll.
What I've done to overcome this was to write a wrapper class (or rather a code generation file) that implements a timing mechanism using a callback. Since I'm not that interested in the actual result (save exceptions, which are part of the statistic), I just let it return some statistic. But this turned out to be a performance drain somehow.
So, basically, I'm looking for an alternative to this approach. Maybe it's much simpler than I'm thinking right now, but I can't seem to figure it out by myself at the moment.
This looks like a prime candidate for Tasks (assuming you're using C# 4).
You can create Tasks from your APM methods using Task.Factory.FromAsync (documented on MSDN).
You can then use all the rich TPL goodness like individual continuations.
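A sketch of what that might look like for timing each call individually: TimedApm and its Time method are my own illustrative names, and the begin/end delegates stand in for your service's Begin/End pairs.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class TimedApm
{
    // Wraps a Begin/End pair in a Task whose result is the elapsed time.
    // If the End method throws, the returned task is faulted instead.
    public static Task<TimeSpan> Time<T>(
        Func<AsyncCallback, object, IAsyncResult> begin,
        Func<IAsyncResult, T> end)
    {
        Stopwatch watch = Stopwatch.StartNew();
        return Task<T>.Factory.FromAsync(begin, end, null)
            .ContinueWith(t =>
            {
                watch.Stop();
                t.Wait(); // rethrows if the call failed
                return watch.Elapsed;
            }, TaskContinuationOptions.ExecuteSynchronously);
    }
}
```

Calling Time once per service call and then Task.WaitAll on the returned tasks should keep the grouped semantics while giving you one measurement per call.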
If your needs are simple enough, a simple way would be to just record each service call individually, then calculate the logical action based off the individual service calls.
I.e., if logical action A is made of parallel service calls B and C, where B took 2 seconds and C took 1 second, then A takes 2 seconds.
A = Max(B, C)
My problem is this:
I have two threads, my UI thread and a worker thread. My worker thread is running in a separate class that gets instantiated by the form, which passes itself as an ISynchronizeInvoke to the worker class, which then uses Invoke on that interface to call its events, which provide status updates to the UI for display. This works wonderfully.
I noticed that my background thread seemed to be running slowly though, so I changed the call to Invoke to BeginInvoke, thinking that "I'm just providing progress updates, it doesn't need to be exactly synchronous, no harm done" except that now I'm getting oddities with the progress update. My progress bar updates, but the label's text doesn't, and if I change to another window and try to change back, it acts like the UI thread is locked up, so I'm wondering if perhaps my progress calls (which happen very often) are overloading the UI thread so much that it never processes messages. Is this possible at all, or is there something else at work here?
You're definitely overloading the UI thread.
In your first sample, you were (behind the scenes) sending a message to the UI thread, waiting for it to be processed (that's the purpose of Invoke, which ultimately relies on SendMessage), and then sending another one. In the meantime, other messages were probably enqueued (WM_PAINT messages, for example) and processed.
In your second sample, by using BeginInvoke (which ultimately relies on PostMessage), you enqueued a huge number of messages in the message queue, which the message pump must handle sequentially. And of course, while it's handling those thousands of messages, it cannot handle the OS messages (WM_PAINT, etc.), which makes your UI look "frozen".
You're probably providing too many status updates; try to lower the feedback level.
If you want to understand better how messages work in windows, this is the place to start.
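One way to lower the feedback level is to post only when the displayed value would actually change. This is a sketch under my own assumptions: ProgressThrottle and postToUi are hypothetical names, postToUi stands in for the BeginInvoke call, and Report is expected to be called from a single worker thread with total greater than zero.

```csharp
using System;

public sealed class ProgressThrottle
{
    private readonly Action<int> postToUi;
    private int lastPercent = -1;

    // postToUi stands in for a call such as BeginInvoke on the form.
    public ProgressThrottle(Action<int> postToUi)
    {
        this.postToUi = postToUi;
    }

    // Call from the worker as often as you like; the UI only hears about
    // whole-percent changes, so at most 101 messages are posted per run.
    public void Report(long done, long total)
    {
        int percent = (int)(done * 100 / total);
        if (percent == lastPercent) return;
        lastPercent = percent;
        postToUi(percent);
    }
}
```

A time-based limit (say, no more than a few posts per second) would work just as well; the point is that the message queue sees a bounded number of updates.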
A few thoughts;
try batching your updates; for example, there is no point updating for every iteration in a loop; depending on the speed, perhaps every 50 / 500. In the case of lists, you would buffer in a local list variable, take the list over via Invoke / BeginInvoke, and process the buffer on the UI thread
variable capture; if you are using BeginInvoke and anonymous methods, you could have problems... I'll add an example below
making the UI update efficient - especially if you are processing a list; some controls (especially list-based controls) have a pair of methods like BeginEdit / EndEdit, that stop the UI redrawing when you are making lots of updates; instead, it waits until the End* is called
capture problem... imagine (worker):
List<string> stuff = new List<string>();
for(int i = 0 ; i < 50000 ; i++) {
    stuff.Add(i.ToString());
    if((i % 100) == 0) {
        // update UI
        BeginInvoke((MethodInvoker) delegate {
            foreach(string s in stuff) {
                listBox.Items.Add(s);
            }
        });
    }
}
Did you notice that at some point both threads are talking to stuff? The UI thread can be iterating it while the worker thread (which has kept running past BeginInvoke) keeps adding. This can cause issues. Not usually performance issues (unless you are catching the exceptions and taking a long time to log them), but definitely issues. Options here would include:
using Invoke to run the update synchronously
creating a new buffer per update, so that the two threads never share the same list instance (you'd need to look very carefully at the variable scoping to make sure, though)
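The second option could look roughly like this. To keep the sketch independent of WinForms, postToUi stands in for BeginInvoke and addToList for listBox.Items.Add; both names, and BatchedUpdates itself, are mine.

```csharp
using System;
using System.Collections.Generic;

public static class BatchedUpdates
{
    // postToUi stands in for BeginInvoke; addToList stands in for listBox.Items.Add.
    public static void Run(int count, int batchSize,
                           Action<Action> postToUi, Action<string> addToList)
    {
        var buffer = new List<string>();
        for (int i = 0; i < count; i++)
        {
            buffer.Add(i.ToString());
            if (buffer.Count == batchSize || i == count - 1)
            {
                // Hand this list over and start a fresh one, so the worker
                // never touches a list the UI thread may be iterating.
                List<string> batch = buffer;
                buffer = new List<string>();
                postToUi(() =>
                {
                    foreach (string s in batch) addToList(s);
                });
            }
        }
    }
}
```

Because batch is declared inside the if block, each delegate captures its own list, and each item is handed to the UI exactly once.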