Way to know if TPL Dataflow Block busy?

Way to know if TPL Dataflow Block busy? - c#

TPL Dataflow block has .InputCount and .OutputCount properties. But it can perform execution over item right now, and there is no property like .Busy [Boolean]. So is there a way to know if block is now operating and one of item still there?
UPDATE:
Let me explain my issue. Here on pic is my current Dataflow network scheme.
BufferBlock holds URLs to load, number of TransformBlocks load pages through proxy servers and ActionBlock at the end performs work with loaded pages. TransformBlocks has predefined .BoundedCapacity, so BufferBlock waits for any of TransformBlocks becomes free and then post item into it.
Initially I post all URLs to Buffer Block. Also if one of TransformBlocks throw exception during loading HTML, it returns it's URL back to BufferBlock. So my goal is somehow wait until all of my URLs was guarantee loaded and parsed. For now I'm waiting like this:
Do While _BufferBlock.Count > 0 Or _
GetLoadBlocksTotalInputOutputCount(_TransformBlocks) > 0 Or _
_ActionBlock.InputCount > 0
Await Task.Delay(1000)
Loop
Then I call TransformBlock.Complete on all of them. But in this case, there still can be last URLs loading it TransformBlocks. If last URL was not successfully loaded, it becomes 'lost', because none of TransformBlocks wouldn't take it back. That's why I want to know if TransformBlocks are still operating. Sorry my bad English.

Even if you could find out whether a block is processing an item, it wouldn't really help you achieve your goal. That's because you would need to check the state of all the blocks at exactly the same moment, and there is no way to do that.
What I think you need is to somehow manually track how many items have been fully processed and compare that with the total number of items to process.
You should know the number of items to process from the start (it's you who sends them to the buffer block). To track the number of items that have been fully processed, you can add a counter to your parsing action block (don't forget to make the counter thread-safe, since your action block is parallel).
Then, if the counter reaches the total number of items to process, you know that all work is done.

Related

Pipelines buffer preserving until processing is complete

I am researching the possibility of using pipelines for processing binary messages coming from network.
The binary messages i will be processing come with an payload and it is desirable to keep the payload in its binary form.
The idea is to read out the whole message and create a slice of message and its payload, once the message is completely read it will be passed to a channel chain for processing, the processing will not be instant and might take some time or be executed later and the goal is not to have the pipe reader wait until the processing is complete, then once the message processing is complete i would need to release the processed buffer region to the pipe writer.
Now of course i could just create a new byte array and copy the data coming from pipe writer but that would beat the purpose of no-copy? So as i understand i would need some buffer synchronization between the pipeline and the channel?
I observed the available apis (AdvanceTo) of pipe reader where its possible to tell the pipe reader what was consumed and what was examined but cant get around how this could be synced outside of the pipe reading method.
So the question would be whether there are some techniques or examples on how this can be achieved.

The buffer obtained from TryRead/ReadAsync is only valid until you call AdvanceTo, with the expectation that as soon as you've done that: anything you reported as consumed is available to be recycled for use elsewhere (which could be parallel/concurrent readers). Strictly speaking: even the bits you haven't reported as consumed: you still shouldn't treat as valid once you've called AdvanceTo (although in reality, it is likely that they'll still be the same segments - just: that isn't the concern of the caller; to the caller, it is only valid between the read and the advance).
This means that you explicitly can't do:
while (...)
{
var result = await pipe.ReadAsync();
if (TryIdentifyFrameBoundary(out var frame)) {
BeginProcessingInBackground(frame); // <==== THIS IS A PROBLEM!
reader.AdvanceTo(frame.End, frame.End);
}
else if { // take nothing
reader.AdvanceTo(buffer.Start, buffer.End);
if (result.IsCompleted) break; // that's all folks
}
}
because the "in background" bit, when it fires, could now be reading someone else's data (due to it being reused already).
So: either you need to process the frame contents as part of the read loop, or you're going to have to make a copy of the data, most likely by using:
c#
var len = checked ((int)buffer.Length);
var oversized = ArrayPool<byte>.Shared.Rent(len);
buffer.CopyTo(oversized);
and pass oversized to your background processing, remembering to only look at the first len bytes of it. You could pass this as a ReadOnlyMemory<byte>, but you need to consider that you're also going to want to return it to the array-pool afterwards (probably in a finally block), and passing it as a memory makes it a little more awkward (but not impossible, thanks to MemoryMarshal.TryGetArray).
Note: in early versions of the pipelines API, there was an element of reference-counting, which did allow you to preserve buffers, but it had a few problems:
it complicated the API hugely
it led to leaked buffers
it was ambiguous and confusing what "preserved" meant; is the count until it gets reused? or released completely?
so that feature was dropped.

Complex flow with Observables

I have a sequence of Images (IObservable<ImageSource>) that goes through this "pipeline".
Each image is recognized using OCR
If the results have valid values, the are uploaded to a service that can register a set of results at a given time (not concurrently).
If the results have any invalid value, they are presented to the user in order to fix them. After they are fixed, the process continues.
During the process, the UI should stay responsive.
The problem is that I don't know how to handle the case when the user has to interact. I just cannot do this
subscription = images
.Do(source => source.Freeze())
.Select(image => OcrService.Recognize(image))
.Subscribe(ocrResults => Upload(ocrResults));
...because when ocrResults have to be fixed by the user, the flow should be kept on hold until the valid values are accepted (ie. the user could execute a Command clicking a Button)
How do I say: if the results are NOT valid, wait until the user fixes them?

This seems to be a mix of UX, WPF and Rx all wrapped up in one problem. Trying to solve it with only Rx is probably going to send you in to a tail spin. I am sure you could solve it with just Rx, and no more thought about it, but would you want to? Would it be testable, loosely coupled and easy to maintain?
In my understanding of the problem you have to following steps
User Uploads/Selects some images
The system performs OCR on each image
If the OCR tool deems the image source to be valid, the result of the processing is uploaded
If the OCR tool deems the image source to be invalid, the user "fixes" the result and the result is uploaded
But this may be better described as
User Uploads/Selects some images
The system performs OCR on each image
The result of the OCR is placed in a validation queue
While the result is invalid, a user is required to manually update it to a valid state.
The valid result is uploaded
So this to me seem that you need a task/queue based UI so that a User can see invalid OCR results that they need to work on. This also then tells me that if a person is involved, that it should probably be outside of the Rx query.
Step 1 - Perform ORC
subscription = images
.Subscribe(image=>
{
//image.Freeze() --probably should be done by the source sequence
var result = _ocrService.Recognize(image);
_validator.Enqueue(result);
});
Step 2 - Validate Result
//In the Enqueue method, if queue is empty, ProcessHead();
//Else add to queue.
//When Head item is updated, ProcessHead();
//ProcessHead method checks if the head item is valid, and if it is uploads it and remove from queue. Once removed from queue, if(!IsEmpty) {ProcessHead();}
//Display Head of the Queue (and probably queue depth) to user so they can interact with it.
Step 3 - Upload result
Upload(ocrResults)
So here Rx is just a tool in our arsenal, not the one hammer that needs to solve all problems. I have found that with most "Rx" problems that grow in size, that Rx just acts as the entry and exit points for various Queue structures. This allows us to make the queuing in our system explicit instead of implicit (i.e. hidden inside of Rx operators).

I'm assuming your UploadAsync method returns a Task to allow you to wait for it to finished? If so, there are overloads of SelectMany that handle tasks.
images.Select(originalImage => ImageOperations.Resize(originalImage))
.SelectMany(resizedImg => imageUploader.UploadAsync(resizedImg))
.Subscribe();

Assuming you've got an async method which implements the "user fix process":
/* show the image to the user, which fixes it, returns true if fixed, false if should be skipped */
async Task UserFixesTheOcrResults(ocrResults);
Then your observable becomes:
subscription = images
.Do(source => source.Freeze())
.Select(image => OcrService.Recognize(image))
.Select(ocrResults=> {
if (ocrResults.IsValid)
return Observable.Return(ocrResults);
else
return UserFixesTheOcrResults(ocrResults).ToObservable().Select(_ => ocrResults)
})
.Concat()
.Subscribe(ocrResults => Upload(ocrResults));

How to apply Dynamic wait in Ranorex?

I wants to apply dynamic wait in ranorex.
To open a webpage I used static wait like this :-
Host.Local.OpenBrowser("http://www.ranorex.com/Documentation/Ranorex/html/M_Ranorex_WebDocument_Navigate_2.htm",
"firefox.exe");
Delay.Seconds(15);
Please provide me a proper solution in details. Waiting for your humble reply.

The easiest way is use the wait for document loaded method. This allows you to set a timeout that is the maximum to wait, but will continue when the element completes it's load. Here is the documentation on it,
http://www.ranorex.com/Documentation/Ranorex/html/M_Ranorex_WebDocument_WaitForDocumentLoaded_1.htm

First of all, you should be more detailed about your issues. Atm you actually don't state any issue and don't even specify the reason for the timeout.
I don't actually see why you would need a timeout there. The next element to be interacted with in your tests will have it's own search timeouts. In my experience I haven't had a need or a reason to have a delay for the browser opening.
If you truelly need a dynamic delay there, here's what you actually should validate.
1) Either select an element that always exists on the webpage when you open the browser or
2) Select the next element to be interacted with and build the delay ontop of either of these 2
Let's say that we have a Input field that we need to add text to after the page has opened. The best idea would be do wait for that element to exists and then continue with the test case.
So, we wait for the element to exist (add the element to the repository):
repo.DomPart.InputElementInfo.WaitForExists(30000);
And then we can continue with the test functionality:
repo.DomPart.InputElement.InnerText = "Test";
What waitForExists does is it waits for 30 seconds (30000 ms) for the element to exists. It it possible to catch an exception from this and add error handleing if the element is not found.
The dynamic functionality has to be added by you. In ranorex at one point you will always run into a timeout. It might be a specified delay, it might be the timeout for a repo element, etc. The "dynamic" functionality is mostly yours to do.
If this is not the answer you were looking for, please speicify the reason for the delay and i'll try to answer your specific issue more accurately.

Determininistic random numbers in parallel code

I have a question regarding thread ordering for the TPL.
Indeed, it is very important for me that my Parallel.For loop to be executed in the order of the loop. What I mean is that given 4 threads, i would like the first thread to execute every 4k loop, 2nd thread every 4k+1 etc with (k between 0 and, NbSim/4).
1st thread -> 1st loop, 2nd thread -> 2nd loop , 3rd thread -> 3rd loop
4th thread -> 4th loop , 1th thread -> 5th loop etc ...
I have seen the OrderedPartition directive but I am not quite sure of the way I should apply it to a FOR loop and not to a Parallel.FOREACH loop.
Many Thanks for your help.
Follwing the previous remkarks, I am completing the description :
Actually, after some consideration, I believe that my problem is not about ordering.
Indeed, I am working on a Monte-Carlo engine, in which for each iteration I am generating a set of random numbers (always the same (seed =0)) and then apply some business logic to them. Thus everything should be deterministic and when running the algorithm twice I should get the exact same results. But unfortunately this is not the case, and I am strugeling to understand why. Any idea, on how to solve that kind of problems (without printing out every variable I have)?
Edit Number 2:
Thank you all for your suggestions
First, here is the way my code is ordered :
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 4; //or 1
ParallelLoopResult res = Parallel.For<LocalDataStruct>(1,NbSim, options,
() => new LocalDataStruct(//params of the constructor of LocalData),
(iSim, loopState, localDataStruct) => {
//logic
return localDataStruct;
}, localDataStruct => {
lock(syncObject) {
//critical section for outputting the parameters
});
When setting the degreeofParallelism to 1,everything works fine, however when setting the degree of Parallelism to 4 I am getting results that are false and non deterministic (when running the code twice I get different results). It is probably due to mutable objects that is what I am checking now, but the source code is quite extensive, so it takes time. Do you think that there is a good strategy to check the code other than review it (priniting out all variables is impossible in this case (> 1000)? Also when setting the Nb of Simulation to 4 for 4 threads everything is working fine as well, mostly due to luck I believe ( that s why I metionned my first idea regarding ordering).

You can enforce ordering in PLINQ but it comes at a cost. It gives ordered results but does not enforce ordering of execution.
You really cannot do this with TPL without essentially serializing your algorithm. The TPL works on a Task model. It allows you to schedule tasks which are executed by the scheduler with no guarantee as to the order in which the Tasks are executed. Typically parallel implementations take the PLINQ approach and guarantee ordering of results not ordering of execution.
Why is ordered execution important?
So. For a Monte-Carlo engine you would need to make sure that each index in your array received the same random numbers. This does not mean that you need to order your threads, just make the random numbers are ordered across the work done by each thread. So if each loop of your ParallelForEach was passed not only the array of elements to do work on but also it's own instance of a random number generator (with a different fixed seed per thread) then you will still get deterministic results.
I'm assuming that you are familiar with the challenges related to parallelizing Monte-Carlo and generating good random number sequences. If not here's something to get you started;Pseudo-random Number Generation for
Parallel Monte Carlo—A Splitting Approach, Fast, High-Quality, Parallel Random-Number Generators: Comparing Implementations.
Some suggestions
I would start off by ensuring that you can get deterministic results in the sequential case by replacing the ParallelForEach with a ForEach and see if this runs correctly. You could also try comparing the output of a sequential and a parallel run, add some diagnostic output and pipe it to a text file. Then use a diff tool to compare the results.
If this is OK then it is something to do with your parallel implementation, which as is pointed out below is usually related to mutable state. Some things to consider:
Is your random number generator threadsafe? Random is a poor random number generator at best and as far as I know is not designed for parallel execution. It is certainly not suitable for M-C calculations, parallel or otherwise.
Does your code have other state shared between threads, if so what is it? This state will be mutated in a non-deterministic manner and effect your results.
Are you merging results from different threads in parallel. The non-associativity of parallel floating point operations will also cause you issues here, see How can floating point calculations be made deterministic?. Even if the thread results are deterministic if you are combining them in a non deterministic way you will still have issues.

Assuming all threads share the same random number generator, then although you are generating the same sequence every time, which thread gets which elements of this sequence is non-deterministic. Hence you could arrive at different results.
That's if the random number generator is thread-safe; if it isn't, then it's not even guaranteed to generate the same sequence when called from multiple threads.
Apart from that it is difficult to theorize what could be causing non-determinism to arise; basically any global mutable state is suspicious. Each Task should be working with its own data.

If, rather than using a random number generator, you set up an array [0...N-1] of pre-determined values, say [0, 1/N, 2/N, ...], and do a Parallel.ForEach on that, does it still give nondeterministic results? If so, the RNG isn't the issue.

Displaying images with a high frame rate

here's the problem: I have a custom hardware device and I have to grab images from it in C#/WPF and display them in a window, all with 120+ FPS.
The problem is that there is no event to indicate the images are ready, but I have to constantly poll the device and check whether there are any new images and then download them.
There are apparently a handful of ways to do it, but I haven't been able to find the right one yet.
Here's what I tried:
A simple timer (or DispatcherTimer) - works great for slower frame rates but I can't get it past let's say, 60 FPS.
An single threaded infinite loop - quite fast but I have to put the DoEvents/it's WPF equivalent in the loop in order for window to be redrawn; this has some other unwanted (strange) consequences such as key press events from some controls not being fired etc..
Doing polling/downloading in another thread and displaying in UI thread, something like this:
new Thread(() =>
{
while (StillCapturing)
{
if (Camera.CheckForAndDownloadImage(CameraInstance))
{
this.Dispatcher.Invoke((Action)this.DisplayImage);
}
}
}).Start();
Well, this works relatively well, but puts quite a load on a CPU and of course completely kills the machine if it doesn't have more than one CPU/core, which is unacceptable. Also, I there is a large number of thread contentions this way.
The question is obvious - are there any better alternatives, or is one of these the way to go in this case?
Update:
I somehow forgot to mention that (well, forgot to think about it while writing this question), but of course I don't need all frames to be displayed, however I still need to capture all of them so they can be saved to a hard drive.
Update2:
I found out that the DispatcherTimer method is slow not because it can't process everything fast enough, but because the DispatcherTimer waits for the next vertical sync before firing the tick event; which is actually good in my case, because in the tick event I can save all pending images to a memory buffer (used for saving images to disk) and display just the last one.
As for the old computers being completely "killed" by capturing, it appears that WPF falls back to software rendering which is very slow. There's probably nothing I can do about.
Thanks for all the answers.

I think you're trying for too simplistic of an approach. Here's what I would do.
a) put a Thread.Sleep(5) in your polling loop, that should allow you to get close to 120fps while still keeping CPU times low.
b) Only update the display with every 5th frame or so. That will cut down on the amount of processing as I'm not sure that WPF is made to handle much more than 60fps.
c) Use ThreadPool to spawn a subtask for each frame that will then go and save it to the disk (in a seperate file per frame), that way you won't be as limited by disk performance. Extra frames will just pile up in memory.
Personally I would implement them in that order. Chances are a or b will fix your problems.

You could do the following (all psuedocode):
1. Have a worker thread running dealing with the capture process:
List<Image> _captures = new List<Image>();
new Thread(() =>
{
while (StillCapturing)
{
if (Camera.CheckForAndDownloadImage(CameraInstance))
{
lock(_locker){_captures.Add(DisplayImage);
}
}
}).Start();
Have the dispatcher timer thread take latest captured image (obviously it will have missed some captures since last tick) and display. Therefore, UI thread is throttled and doing as little as possible, it isn't doing all the "capturing", this is done by worker threads. sorry I can't get this bit to format (but you get the idea):
void OnTimerTick(can't remember params)
{
Image imageToDisplay;
lock(_locker){imageToDisplay = _captures[k.Count - 1];
DisplayFunction(imageToDisplay);
}
it might be that the list is a queue and another thread is used to bleed the queue and write to disk or whatever.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.