I'm working on a C# app with a time-consuming sequential workflow that must be performed asynchronously. It starts when the user presses a button and the app receives a few images captured from a camera within just a few milliseconds. The workflow then:
1. Saves the images to disk.
2. Aligns them.
3. Generates 3D data from them.
4. Groups them into a larger, collective object (called a "Scan").
5. Adds optional analysis data to the scan and executes it.
6. Saves the scan itself to an XML file alongside the images.
Some of these steps are optional and configurable.
Since the processing can take so long, there will often be a queue of "scans" awaiting processing. So I need to present the user with a visual representation of the queue of captured scans and their current processing state (e.g. "Saving", "Analyzing", "Finished", etc.).
I've looked into using TPL Dataflow for this. But while the mesh is simple to create, I don't see how I might monitor the status of what is going on so that I can update a user interface. Should I link in custom action blocks that post messages back to the UI? Something else?
Is TPL Dataflow even the right tool for this job?
Reporting Overall Progress
A TPL Dataflow graph has a head block and a tail block, and you know how many items you posted into the head. So to report overall progress, all you need to do is track how many messages have reached the final block and compare that count to the number of messages posted into the head.
This works trivially if the blocks are 1:1 - that is, for every message in there is a single message out. If there is a one-to-many block, you will need to adjust your progress reporting accordingly.
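As a minimal sketch of that counting (not part of the Dataflow API itself; Job, scans, and the console report are placeholders, and Progress&lt;T&gt; marshals the callback back to the context it was created on, e.g. the UI thread):

using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

int postedCount = scans.Count;      // how many items were posted into the head
int completedCount = 0;
IProgress<double> progress = new Progress<double>(p => Console.WriteLine("{0:P0}", p));

var tailBlock = new ActionBlock<Job>(job =>
{
    // ... final step of the pipeline ...
    int done = Interlocked.Increment(ref completedCount);
    progress.Report((double)done / postedCount);
});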
Reporting Job Stage Progress
If you wish to present the progress of a job as it travels through the graph, you will need to pass job details to each block, not just the data needed for that particular block. A job is a single task that must span all of steps 1-6 listed in your question.
For example, step 2 may require image data in order to perform alignment, but it does not care about filenames, how many steps there are in the job, or anything else job-related. The block input alone carries too little detail to know the current job's state, and it makes it difficult to look up the original job. You could refer to some external dictionary, but graphs are best designed when they are isolated and deal only with the data passed into each block.
So a simple example would be to change this minimal code from:
var alignmentBlock = new TransformBlock<Image, Image>(n => { ... });
...to:
var alignmentBlock = new TransformBlock<Job, Job>(job =>
{
    job.Stage = Stages.Aligning;

    // perform alignment here
    job.Aligned = ImageAligner.Align(job.Image, ...);

    // report progress
    job.Stage = Stages.AlignmentComplete;
    return job;
});
...and repeat the process for the other blocks.
The Stage property setter could fire a PropertyChanged notification or use any other notification pattern suitable for your UI.
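As an illustrative sketch (the stage names and the INotifyPropertyChanged choice are assumptions to suit a WPF-style UI, not anything Dataflow prescribes):

using System.ComponentModel;

public enum Stages { Queued, Saving, Aligning, AlignmentComplete, Analyzing, Finished }

public class Job : INotifyPropertyChanged
{
    private Stages stage;

    public event PropertyChangedEventHandler PropertyChanged;

    public Stages Stage
    {
        get { return stage; }
        set
        {
            if (stage == value) return;
            stage = value;
            // In a real app, marshal this to the UI thread (e.g. via the Dispatcher).
            var handler = PropertyChanged;
            if (handler != null) handler(this, new PropertyChangedEventArgs("Stage"));
        }
    }
}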
Notes
You will notice that I introduced a Job class that is passed as the only argument to each block. Job contains the input data for the block as well as being a container for the block's output.
Now this will work, but the purist in me feels it would be better to keep the job metadata separate from the TPL block input and output; otherwise there is potential for state corruption from multiple threads.
To get around this you may want to consider using Tuple<> and passing that into the block.
e.g.
var alignmentBlock = new TransformBlock<Tuple<Job, UnalignedImages>,
Tuple<Job, AlignedImages>>(n => { ... });
Related
I am building a component that downloads information from given URLs and parses it into my business classes.
This has to happen in two stages. The pages that are being downloaded contain URLs to a set of further pages which are downloaded in a second stage.
I want all of this to be as parallel as possible and am trying to reduce the overall complexity by using the TPL Dataflow framework.
This is my (simplified) setup:
I post URLs to the buffer block which moves them to the download block.
In the download block the HTML is downloaded.
The download block has a conditional link to both parse blocks, so the html of Page Type A is moved to "Parse Page A", which is a TransformManyBlock.
Parse Page A generates a set of URLs to pages of type B.
Those are posted to the Download Block again.
Finally the conditional link posts the HTML of Page type B to the last block.
I am reusing the Download Block because I want to limit the maximum number of connections to the server by setting MaxDegreeOfParallelism.
The setup would be a lot easier if I simply could use two separate download blocks, but then I would be unable to limit the number of connections this way and still have as many parallel connections as possible.
Now my problem with this setup:
How can I propagate the completion correctly? I call Complete() on the Buffer Block when I am done posting all URLs. But I cannot propagate this to the Download Block directly, because it might still be needed for the URLs produced by the "Parse Page A" block, even after the Buffer Block has posted all its URLs to it.
Nor can I couple the Download Block's completion to the completion of both the Buffer Block and the Parse Page A block, because then Parse Page A will never become complete: it is itself waiting on the Download Block.
I also thought about calling Complete() on "Parse Page A" when the Buffer Block is done, but then there might still be data in the Download Block which would get rejected by "Parse Page A".
Is there a way out of this circular dilemma?
Or am I on the wrong track completely and should do it in some other fashion?
You logically have a linear pipeline, so I think that's how you should model it in code too. This means having a separate download block for each type of page. This way, completion will work fine, but you'll have to deal with connection limiting separately.
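For concreteness, here is a sketch of that linear shape with completion propagated link to link. DownloadPage, ParsePageA and ParsePageB are stand-ins for your own logic, and ParsePageA is assumed to return the page-B URLs:

using System;
using System.Threading.Tasks.Dataflow;

var downloadA = new TransformBlock<Uri, string>(uri => DownloadPage(uri));
var parsePageA = new TransformManyBlock<string, Uri>(html => ParsePageA(html));
var downloadB = new TransformBlock<Uri, string>(uri => DownloadPage(uri));
var parsePageB = new ActionBlock<string>(html => ParsePageB(html));

var propagate = new DataflowLinkOptions { PropagateCompletion = true };
downloadA.LinkTo(parsePageA, propagate);
parsePageA.LinkTo(downloadB, propagate);
downloadB.LinkTo(parsePageB, propagate);

// With no cycle in the graph, completing the head flows straight to the tail.
downloadA.Post(new Uri("http://myserver.com/pageA"));
downloadA.Complete();
parsePageB.Completion.Wait();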
There are two ways I can see to solve the connection limiting:
If you're always connecting to the same server, you can limit the number of connections to it by using ServicePoints. You can either set that limit globally at the start of the program:
ServicePointManager.DefaultConnectionLimit = limit;
or just for the one server:
ServicePointManager.FindServicePoint(new Uri("http://myserver.com"))
.ConnectionLimit = limit;
If using ServicePoints won't work for you (because you don't have just one server, because it affects the whole application, …), you can limit the requests manually using something like SemaphoreSlim. The semaphore would be set to your desired limit and it would be shared between the two download blocks.
MaxDegreeOfParallelism for each block would be set to the same limit (a higher value won't add anything; a lower value could be inefficient), and their code could look like this:
await semaphore.WaitAsync();    // acquire a slot before entering the try,
try                             // so a failed wait never triggers a Release
{
    // perform the download
}
finally
{
    semaphore.Release();
}
If you do need this kind of limiting often, you could create a helper class that encapsulates this logic. Its usage could look like this:
var factory = new SharedLimitBlockFactory<Input, Output>(
limit, input => Download(input));
var downloadBlock1 = factory.CreateBlock();
var downloadBlock2 = factory.CreateBlock();
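SharedLimitBlockFactory isn't an existing framework class; one possible sketch is a factory whose blocks all share a single SemaphoreSlim, so their combined in-flight work never exceeds the limit:

using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

public class SharedLimitBlockFactory<TInput, TOutput>
{
    private readonly SemaphoreSlim semaphore;
    private readonly Func<TInput, TOutput> transform;
    private readonly int limit;

    public SharedLimitBlockFactory(int limit, Func<TInput, TOutput> transform)
    {
        this.semaphore = new SemaphoreSlim(limit);
        this.transform = transform;
        this.limit = limit;
    }

    public TransformBlock<TInput, TOutput> CreateBlock()
    {
        return new TransformBlock<TInput, TOutput>(
            async input =>
            {
                await semaphore.WaitAsync();
                try
                {
                    return transform(input);
                }
                finally
                {
                    semaphore.Release();
                }
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = limit });
    }
}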
Good morning,
At the startup of the application I am writing, I need to read about 1,600,000 entries from a file into a Dictionary<Tuple<String, String>, Int32>. It takes about 4-5 seconds to build the whole structure using a BinaryReader (using a StreamReader takes about the same time). I profiled the code and found that the function doing the most work in this process is BinaryReader.ReadString(). Although this process needs to run only once and at startup, I would like to make it as quick as possible. Is there any way I can avoid BinaryReader.ReadString() and make this process faster?
Thank you very much.
Are you sure that you absolutely have to do this before continuing?
I would examine the possibility of hiving off the task to a separate thread which sets a flag when finished. Then your startup code simply kicks off that thread and continues on its merry way, pausing only when both:
the flag is not yet set; and
no more work can be done without the data.
Often, the illusion of speed is good enough, as anyone who has coded up a splash screen will tell you.
Another possibility, if you control the data, is to store it in a more binary form so you can just blat it all in with one hit (i.e., no interpretation of the data, just read in the whole thing). That, of course, makes it harder to edit the data from outside your application but you haven't stated that as a requirement.
If it is a requirement or you don't control the data, I'd still look into my first suggestion above.
If you think that reading the file line by line is the bottleneck, and depending on its size, you can try to read it all at once:
// read the entire file at once
string entireFile = System.IO.File.ReadAllText(path);
If this doesn't help, you can try to add a separate thread with a semaphore, which would start reading in the background immediately when the program is started, but block the requesting thread at the moment you try to access the data.
This is called a Future, and you have an implementation in Jon Skeet's miscutil library.
You call it like this at the app startup:
// following line invokes "DoTheActualWork" method on a background thread.
// DoTheActualWork returns an instance of MyData when it's done
Future<MyData> calculation = new Future<MyData>(() => DoTheActualWork(path));
And then, some time later, you can access the value in the main thread:
// following line blocks the calling thread until
// the background thread completes
MyData result = calculation.Value;
If you look at the Future's Value property, you can see that it blocks on the AsyncWaitHandle if the thread is still running:
public TResult Value
{
    get
    {
        if (!IsCompleted)
        {
            _asyncResult.AsyncWaitHandle.WaitOne();
            _lock.WaitOne();
        }
        return _value;
    }
}
If strings are repeated across tuples, you could reorganize your file to list all of the distinct strings at the start, and put references to those strings (integers) in the body of the file. Your main Dictionary does not have to change, but you would need a temporary dictionary during startup mapping each reference (key) to its distinct string (value).
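A sketch of reading such a reorganized file, assuming it was written in exactly this layout (a counted string table followed by integer triples; path is a placeholder):

using System;
using System.Collections.Generic;
using System.IO;

var dict = new Dictionary<Tuple<string, string>, int>();
using (var reader = new BinaryReader(File.OpenRead(path)))
{
    // String table: every distinct string is read exactly once.
    int stringCount = reader.ReadInt32();
    var strings = new string[stringCount];
    for (int i = 0; i < stringCount; i++)
        strings[i] = reader.ReadString();

    // Body: each entry is two string references plus the value.
    int entryCount = reader.ReadInt32();
    for (int i = 0; i < entryCount; i++)
    {
        int first = reader.ReadInt32();
        int second = reader.ReadInt32();
        int value = reader.ReadInt32();
        dict[Tuple.Create(strings[first], strings[second])] = value;
    }
}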
I have a Camera class that produces very large images at a high FPS, which require processing by an ImageProcessor class. I also have a WPF control, my View, that displays this information. Each of these components needs to run on its own thread so it doesn't lock up the processing.
Method 1) Camera has an Action<Image> ImageCreated that ImageProcessor subscribes to. ImageProcessor has an Action<Image, Foo> ImageCreated that contains an altered Image and Foo results for the View to show.
Method 2) Camera has a threadsafe (using locks and monitors) ProducerConsumer to which it produces Images, and ImageProcessor waits and Consumes. Same story for the View.
Method 2 is nice because I can create and manage my own threads.
Method 1 is nice because I can have multiple ImageProcessors subscribed to the Camera class. But I'm not sure whose thread does the heavyweight work, or whether invoking the Action wastes time creating threads. Again, these images come in many times per second.
I'm trying to get the images to my View as quickly as possible, without tying up processing or causing the View to lock up.
Thoughts?
Unless you do it yourself, using Method 1) does not introduce any multithreading. Invoking an action (unless you call BeginInvoke) runs synchronously, just like any normal method call.
I would advocate Method 2). There is no need to tie it to one single consumer. If you use this queue as a single point of contact between X cameras and Y processors, you've decoupled the cameras from the processors and could modify the values of X and Y independently.
EDIT
At the risk of being accused of blog spam here, I remembered that a while ago I wrote a component that's similar (if not an exact match) to what you're looking for. See if this helps:
ProcessQueue
The gist of it is that you provide the queue's constructor with a delegate that can process a single item (in your case, Image), then call Start. As items are added to the queue using Enqueue, they're automatically dispatched to an appropriate thread and processed.
For example, if you wanted to have the image move Camera->Processor->Writer (and have a variable number of each), then I would do something like this:
ProcessQueue<Foo> processorQueue = new ProcessQueue<Foo>(f => WriteFoo(f));
ProcessQueue<Image> cameraQueue = new ProcessQueue<Image>(i => processorQueue.Enqueue(ProcessImage(i)));
You could vary the number of threads in cameraQueue (which controls the image processing) and processorQueue (which controls writing to disk) by using SetThreadCount.
Once you've done that, you would just call cameraQueue.Enqueue(image) whenever a camera captured an image.
Method one will not work: the Action<T> will execute on the thread that invoked it. Although you should probably use events instead of plain delegates in scenarios like this.
Method two is the way to go, but if possible you should use the new thread-safe collections of .NET 4.0 instead of doing the synchronization yourself; we all know how hard it is to get even the simplest multithreaded code correct.
I'm working on an image processing application where I have two threads on top of my main thread:
1 - CameraThread that captures images from the webcam and writes them into a buffer
2 - ImageProcessingThread that takes the latest image from that buffer for filtering.
The reason this is multithreaded is that speed is critical: I need CameraThread to keep grabbing pictures and making the latest capture ready for ImageProcessingThread to pick up while it's still processing the previous image.
My problem is finding a fast and thread-safe way to access that common buffer. I've figured that, ideally, it should be a triple buffer (image[3]) so that if ImageProcessingThread is slow, CameraThread can keep writing to the other two images, and vice versa.
What sort of locking mechanism would be the most appropriate for this to be thread-safe?
I looked at the lock statement, but it seems like it would make one thread block, waiting for the other to finish, and that would defeat the point of triple buffering.
Thanks in advance for any idea or advice.
J.
This could be a textbook example of the Producer-Consumer Pattern.
If you're going to be working in .NET 4, you can use the IProducerConsumerCollection<T> and associated concrete classes to provide your functionality.
If not, have a read of this article for more information on the pattern, and this question for guidance in writing your own thread-safe implementation of a blocking First-In First-Out structure.
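For instance, a minimal .NET 4 sketch using BlockingCollection&lt;T&gt;, which sits on top of IProducerConsumerCollection&lt;T&gt;; Image, CaptureImages and ProcessImage are placeholders for your own types:

using System.Collections.Concurrent;
using System.Threading;

var buffer = new BlockingCollection<Image>(boundedCapacity: 3);

// Producer: the camera thread.
new Thread(() =>
{
    foreach (var image in CaptureImages())
        buffer.Add(image);                  // blocks while the buffer is full
    buffer.CompleteAdding();
}).Start();

// Consumer: the processing thread.
new Thread(() =>
{
    foreach (var image in buffer.GetConsumingEnumerable())
        ProcessImage(image);                // blocks waiting for the next image
}).Start();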
Personally, I think you might want to look at a different approach for this. Rather than writing to a centralized "buffer" whose access you have to manage, could you switch to an approach that uses events? Once the camera thread has received an image, it could raise an event that passes the image data off to the process that actually handles the image processing.
An alternative would be to use a Queue, which is a FIFO (First In, First Out) data structure. It is not thread-safe, so you would have to lock it, but the time you hold the lock to put an item in the queue would be very short. There are also thread-safe queue classes out there that you could use.
With your approach there are a number of issues to contend with: blocking while accessing the array, what happens when you run out of available array slots, and so on.
Given the amount of processing needed for a picture, I don't think a simple locking scheme would be your bottleneck. Measure before you start wasting time on the wrong problem.
Be very careful with 'lock-free' solutions; they are always more complicated than they look.
And you need a Queue, not an array.
If you can use .NET 4, I would use the ConcurrentQueue.
You will have to run some performance metrics, but take a look at lock free queues.
See this question and its associated answers, for example.
In your particular application, though, your processor is only really interested in the most recent image. In effect, this means you only really want to maintain a queue of two items (the new item and the previous item) so that there is no contention between reading and writing. You could, for example, have your producer remove old entries from the queue once a new one is written.
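In its simplest form, that "latest item wins" hand-off is just a locked slot the producer overwrites; a sketch:

public class LatestFrameSlot<T> where T : class
{
    private readonly object gate = new object();
    private T latest;

    // Producer: overwrite whatever is there; older frames are simply dropped.
    public void Put(T frame)
    {
        lock (gate) { latest = frame; }     // the lock guards only a reference swap
    }

    // Consumer: take the newest frame, or null if nothing new has arrived.
    public T TakeLatest()
    {
        lock (gate)
        {
            T frame = latest;
            latest = null;
            return frame;
        }
    }
}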
Edit: having said all this, I think there is a lot of merit in what is said in Mitchel Sellers's answer.
I would look at using a ReaderWriterLockSlim, which allows fast reads and upgradable locks for writes.
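A sketch of what that could look like around a shared frame (byte[] as the frame type is just an assumption):

using System.Threading;

public class FrameBuffer
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private byte[] latestFrame;

    public void Write(byte[] frame)
    {
        rwLock.EnterWriteLock();            // exclusive: writers wait for readers
        try { latestFrame = frame; }
        finally { rwLock.ExitWriteLock(); }
    }

    public byte[] Read()
    {
        rwLock.EnterReadLock();             // shared: many readers at once
        try { return latestFrame; }
        finally { rwLock.ExitReadLock(); }
    }
}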
This isn't a direct answer to your question, but it may be better to rethink your concurrency model. Locks are a terrible way to synchronize anything: too low-level, error-prone, etc. Try to rethink your problem in terms of message-passing concurrency:
The idea here is that each thread is its own tightly contained message loop, and each thread has a "mailbox" for sending and receiving messages -- we're going to use the term MailboxThread to distinguish these types of objects from plain jane threads.
So instead of having two threads accessing the same buffer, you instead have two MailboxThreads sending and receiving messages between one another (pseudocode):
let filter =
    while true
        let image = getNextMsg() // blocks until the next message is received
        process image

let camera(filterMailbox) =
    while true
        let image = takePicture()
        filterMailbox.SendMsg(image) // sends a message asynchronously

let filterMailbox = Mailbox.Start(filter)
let cameraMailbox = Mailbox.Start(camera(filterMailbox))
Now your processing threads don't know or care about any buffers at all. They just wait for messages and process them whenever they're available. If you send too many messages for the filterMailbox to handle, those messages get enqueued to be processed later.
The hard part here is actually implementing your MailboxThread object. Although it requires some creativity to get right, it's wholly possible to implement these types of objects so that they only hold a thread open while processing a message, and release the executing thread back to the thread pool when there are no messages left to handle (this implementation allows you to terminate your application without dangling threads).
The advantage here is that threads send and receive messages without worrying about locking or synchronization. Behind the scenes, you need to lock your message queue between enqueuing and dequeuing a message, but that implementation detail is completely transparent to your client-side code.
Just an idea.
Since we're talking about only two threads, we can make some assumptions.
Let's use your triple buffer idea. Assuming there is only one writer thread and one reader thread, we can toss a "flag" back and forth in the form of an integer. Both threads will spin continuously but keep updating their buffers.
WARNING: This will only work for 1 reader thread
Pseudo Code
Shared Variables:

int Status = 0; // 0 = ready to write; 1 = ready to read
// (in real code Status should be volatile, or the flag may be cached per thread)
Buffer1 = New bytes[]
Buffer2 = New bytes[]
Buffer3 = New bytes[]
BufferTmp = null

thread1
{
    while (true)
    {
        WriteData(Buffer1);
        if (Status == 0)
        {
            // publish the freshly written data by swapping Buffer1 into Buffer2
            BufferTmp = Buffer1;
            Buffer1 = Buffer2;
            Buffer2 = BufferTmp;
            Status = 1;
        }
    }
}

thread2
{
    while (true)
    {
        ReadData(Buffer3);
        if (Status == 1)
        {
            // take the published data by swapping Buffer2 into Buffer3
            BufferTmp = Buffer2;
            Buffer2 = Buffer3;
            Buffer3 = BufferTmp;
            Status = 0;
        }
    }
}
Just remember, your WriteData method shouldn't create new byte objects but should update the existing one; creating new objects is expensive.
Also, you may want a Thread.Sleep(1) in an ELSE branch to accompany the IF statements; otherwise, on a single-core CPU, a spinning thread will increase the latency before the other thread gets scheduled. E.g. the write thread may spin 2-3 times before the read thread gets scheduled, because the scheduler sees the write thread doing "work".
Greetings.
I'm trying to implement some multithreaded code in an application. The purpose of this code is to validate items that the database gives it. Validation can take quite a while (a few hundred ms to a few seconds), so this process needs to be forked off into its own thread for each item.
The database may give it 20 or 30 items a second in the beginning, but that begins to decline rapidly, eventually reaching about 65K items over 24 hours, at which point the application exits.
I'd like it if anyone more knowledgeable could take a peek at my code and see if there are any obvious problems. No one I work with knows multithreading, so I'm really on my own on this one.
Here's the code. It's kinda long but should be pretty clear. Let me know if you have any feedback or advice. Thanks!
public class ItemValidationService
{
    /// <summary>
    /// The object to lock on in this class, for multithreading purposes.
    /// </summary>
    private static object locker = new object();

    /// <summary>Items that have been validated.</summary>
    private HashSet<int> validatedItems;

    /// <summary>Items that are currently being validated.</summary>
    private HashSet<int> validatingItems;

    /// <summary>Remove an item from the index if its links are bad.</summary>
    /// <param name="id">The ID of the item.</param>
    public void ValidateItem(int id)
    {
        lock (locker)
        {
            if
            (
                !this.validatedItems.Contains(id) &&
                !this.validatingItems.Contains(id)
            )
            {
                ThreadPool.QueueUserWorkItem(sender =>
                {
                    this.Validate(id);
                });
            }
        }
    } // method

    private void Validate(int itemId)
    {
        lock (locker)
        {
            this.validatingItems.Add(itemId);
        }

        // *********************************************
        // Time-consuming routine to validate an item...
        // *********************************************

        lock (locker)
        {
            this.validatingItems.Remove(itemId);
            this.validatedItems.Add(itemId);
        }
    } // method
} // class
The thread pool is a convenient choice if you have lightweight, sporadic processing that isn't time-sensitive. However, I recall reading on MSDN that it's not appropriate for large-scale processing of this nature.
I used it for something quite similar to this and regret it. I took a worker-thread approach in subsequent apps and am much happier with the level of control I have.
My favorite pattern in the worker-thread model is to create a master thread that holds a queue of task items, then fork a bunch of workers that pop items off that queue to process. I use a blocking queue, so when there are no items to process the workers simply block until something is pushed onto the queue. In this model, the master thread produces work items from some source (db, etc.) and the worker threads consume them.
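A sketch of that arrangement, using .NET 4's BlockingCollection as the blocking queue; workerCount, ValidateItem and ReadIdsFromDatabase are placeholders:

using System.Collections.Concurrent;
using System.Threading;

var workQueue = new BlockingCollection<int>();

// Fork the workers; each blocks until an item is available.
for (int i = 0; i < workerCount; i++)
{
    var worker = new Thread(() =>
    {
        foreach (int id in workQueue.GetConsumingEnumerable())
            ValidateItem(id);               // the long-running validation
    });
    worker.IsBackground = true;
    worker.Start();
}

// The master thread produces work items from the database.
foreach (int id in ReadIdsFromDatabase())
    workQueue.Add(id);
workQueue.CompleteAdding();                 // lets the workers drain and exit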
I second the idea of using a blocking queue and worker threads. Here is a blocking queue implementation that I've used in the past with good results:
https://www.codeproject.com/Articles/8018/Bounded-Blocking-Queue-One-Lock
What's involved in your validation logic? If it's mainly CPU-bound, then I would create no more than one worker thread per processor/core on the box. This will tell you the number of processors:
Environment.ProcessorCount
If your validation involves I/O such as File Access or database access then you could use a few more threads than the number of processors.
Be careful: QueueUserWorkItem might fail.
There is a possible logic error in the code posted with the question, depending on where the item id in ValidateItem(int id) comes from. Why? Because although you correctly lock your validatingItems and validatedItems collections before queueing a work item, you do not add the item to the validatingItems collection until the new thread spins up. That means there could be a window in which another thread calls ValidateItem(id) with the same id (unless this is running on a single main thread).
I would add the item to the validatingItems collection just before queueing the work item, inside the lock.
Edit: also, QueueUserWorkItem() returns a bool, so you should use the return value to make sure the item was queued, and only then add it to the validatingItems collection.
ThreadPool may not be optimal for jamming so much at once into it. You may want to research the upper limits of its capabilities and/or roll your own.
Also, there is a race condition in your code if you expect no duplicate validations. The call to
this.validatingItems.Add(itemId);
needs to happen on the main thread (in ValidateItem), not in the thread pool thread (the Validate method). This call should occur one line before the work item is queued to the pool.
A worse bug is introduced by not checking the return value of QueueUserWorkItem. Queueing can fail, and why it doesn't throw an exception is a mystery to us all. If it returns false, you need to remove the item that was added to the validatingItems list and handle the error (probably by throwing an exception).
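Putting both fixes together, the queueing side might look like this sketch (throwing on failure is just one way to handle it):

public void ValidateItem(int id)
{
    lock (locker)
    {
        if (this.validatedItems.Contains(id) || this.validatingItems.Contains(id))
            return;

        this.validatingItems.Add(id);       // claim the id before any thread starts

        if (!ThreadPool.QueueUserWorkItem(state => this.Validate(id)))
        {
            this.validatingItems.Remove(id);    // roll the claim back
            throw new InvalidOperationException("Failed to queue work item.");
        }
    }
}

Validate would then drop its first lock and Add, keeping only the final block that moves the id from validatingItems to validatedItems.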
I would be concerned about performance here. You indicated that the database may give it 20-30 items per second and that an item could take up to a few seconds to be validated. Using your metrics, that could be quite a large number of threads: worst case, 60-90! I think you need to reconsider the design here. Michael mentioned a nice pattern. The use of the queue really helps keep things under control and organized. A semaphore could also be employed to control the number of threads created, i.e. you could have a maximum number of threads allowed, but under smaller loads you wouldn't necessarily have to create the maximum number if fewer get the job done; your own pool size could be dynamic with a cap.
When using the thread pool, I also find it more difficult to monitor the execution of threads from the pool as they perform the work. So, unless it's fire-and-forget, I am in favor of more controlled execution. I know you mentioned that your app exits after all 65K items are completed. How are you monitoring your threads to determine whether they have completed their work, i.e. that all queued workers are done? Are you monitoring the status of all items in the HashSets? I think that by queuing your items up and having your own worker threads consume off that queue, you gain more control. Albeit, this can come at the cost of more overhead in terms of signaling between threads to indicate when all items have been queued, allowing them to exit.
You could also try using the CCR - Concurrency and Coordination Runtime. It's buried inside Microsoft Robotics Studio, but provides an excellent API for doing this sort of thing.
You'd just need to create a "Port" (essentially a queue), hook up a receiver (method that gets called when something is posted to it), and then post work items to it. The CCR handles the queue and the worker thread to run it on.
Here's a video on Channel9 about the CCR.
It's very high-performance and is even being used for non-Robotics stuff (Myspace.com uses it behind the scenes for their content-delivery network).
I would recommend looking into MSDN: Task Parallel Library - Dataflow. You can find examples of implementing Producer-Consumer there; in your case, the database produces items to validate and the validation routine becomes the consumer.
I also recommend using ConcurrentDictionary<TKey, TValue> as a "concurrent hash set", where you just populate the keys with no values :). You can potentially make your code lock-free.
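A sketch combining the two suggestions; Validate and the degree of parallelism are placeholders:

using System.Collections.Concurrent;
using System.Threading.Tasks.Dataflow;

var seenItems = new ConcurrentDictionary<int, byte>();      // values unused: a "set"

var validator = new ActionBlock<int>(
    id => Validate(id),                                     // the long-running validation
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });

// TryAdd is atomic, so duplicates are filtered without an explicit lock.
void PostForValidation(int id)
{
    if (seenItems.TryAdd(id, 0))
        validator.Post(id);
}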