I'm using Rx to process events in groups of at most X items, or after no new events have been sent for Y ms. For this purpose I use the code suggested in this answer here.
This worked fine until I noticed that when I wait for all events to be processed before continuing, some events stay in the queue unprocessed.
I found out that this happens when a new event is sent while the queue is still busy processing the previous events. If an event is sent after the processing has finished, all events in the queue are processed, even the events that were stuck previously.
Test code and output
To reproduce the behavior I wrote some sample code that adds numbers to the subject and then waits until the number is processed or a timeout is hit (10 s).
If the number is processed, another number is added.
This second number is sent while the previous number is still being processed, which leads to the described behavior.
The behavior can also be reproduced without any artificial delay; it just won't happen consistently, because the timing has to be just right.
private DispatcherScheduler _schedulerProviderDispatcher = new DispatcherScheduler(Application.Current.Dispatcher);
private Subject<int> _numberEvents = new Subject<int>();
private readonly ConcurrentDictionary<int, object> _numbersInProgress = new ConcurrentDictionary<int, object>();
private IDisposable _disposable;
public async Task SendNumbersAsync()
{
    _disposable = _numberEvents.Synchronize()
        .BufferUntilInactive(TimeSpan.FromMilliseconds(2000), _schedulerProviderDispatcher, 100)
        .Subscribe(numbers =>
        {
            if (numbers.Count == 0)
                return;

            var values = string.Empty;
            // Handle numbers in queue and remove them from the dictionary
            for (var i = 0; i < numbers.Count; ++i)
            {
                var number = numbers[i];
                if (_numbersInProgress.TryRemove(number, out _) == false)
                    Trace.WriteLine($"Failed to remove number '{number}'");
                values += $"{number}, ";
                Trace.WriteLine($"Handled Number: {number}, Count: {i + 1}/{numbers.Count}");
            }
            Trace.WriteLine($"Handled '{numbers.Count}' numbers. Values: '{values}'");

            // delay the execution by 1000ms to simulate a slow processing of the numbers
            // and create a timeframe where new numbers will be sent and received, but the subject is still busy with the previous processing
            var task = Task.Delay(1000);
            Task.WaitAll(task);
            Trace.WriteLine($"Finished handling '{numbers.Count}' numbers. Values: '{values}'");
        });

    // push numbers to the subject
    var number = 0;
    while (_numbersInProgress.Count == 0)
    {
        ++number;
        Trace.WriteLine($"Sending number: {number}");
        _numberEvents.OnNext(number);
        Trace.WriteLine($"Finished sending number: {number}");

        // add the number to the progress dictionary
        if (_numbersInProgress.TryAdd(number, 0) == false)
            Trace.WriteLine($"Failed to add number '{number}'");

        var waitCount = 0;
        var timedOut = false;
        // wait for the numbers to be processed
        // if we waited 100 times (10 s), stop
        while (_numbersInProgress.Count != 0)
        {
            Trace.WriteLine($"Waiting ({waitCount})");
            await Task.Delay(100);
            ++waitCount;
            if (waitCount > 100)
            {
                timedOut = true;
                break;
            }
        }

        if (timedOut)
            Trace.WriteLine($"Timeout waiting: {number}");
        else
            Trace.WriteLine($"Finished waiting: {number}");
    }

    // if we still have unprocessed numbers,
    // send another number to the subject to trigger processing of both numbers
    if (_numbersInProgress.Count != 0)
    {
        // Failed
        Trace.WriteLine($"Failed waiting: {number}");
        Trace.WriteLine($"Sending number: {9999}");
        _numberEvents.OnNext(9999);
        Trace.WriteLine($"Finished sending number: {9999}");
    }
}
And the extension method used to group the numbers:
public static IObservable<IList<T>> BufferUntilInactive<T>(this IObservable<T> stream, TimeSpan delay, IScheduler scheduler = null, Int32? maxCount = null)
{
    var s = scheduler ?? Scheduler.Default;
    var publish = stream.Publish(p =>
    {
        var closes = p.Throttle(delay, s);
        if (maxCount != null)
        {
            var overflows = p.Where((x, index) => index + 1 >= maxCount);
            closes = closes.Amb(overflows);
        }
        return p.Window(() => closes).SelectMany(window => window.ToList());
    });
    return publish;
}
When executing this method, the trace outputs the following sequence:
the second number is sent
number 2 is sent before the processing of the previously sent number 1 has finished
the second number times out
the loop hits the timeout condition after 100 wait cycles and then sends a new number, which triggers the processing of both numbers in the queue
As you can see, I've already tried to solve this problem by encapsulating the throttle and the grouping of the items (BufferUntilInactive) in a Publish, but without success.
Am I missing anything in the BufferUntilInactive method, or somewhere else?
Related
I am using TPL Dataflow (TDF) for my application, and it has worked great so far. Unfortunately, I stumbled upon a specific problem that apparently cannot be handled directly with the existing Dataflow mechanisms:
I have N producers (in this case BufferBlocks) which are all linked to a single (the same) ActionBlock. This block always processes one item at a time and also only has capacity for one item.
On the link from the producers to the ActionBlock I also want to add a filter, but the special case here is that the filter condition can change independently of the processed item, and the item must not be discarded!
So basically I want to process all items, but the order / time at which an item is processed can change.
Unfortunately I learned that once an item has been "declined" (i.e. the filter condition evaluated to false) and the item is not passed to another block (e.g. NullTarget), the target block does not retry the same item (and does not re-evaluate the filter).
public class ConsumeTest
{
    private readonly BufferBlock<int> m_bufferBlock1;
    private readonly BufferBlock<int> m_bufferBlock2;
    private readonly ActionBlock<int> m_actionBlock;

    public ConsumeTest()
    {
        m_bufferBlock1 = new BufferBlock<int>();
        m_bufferBlock2 = new BufferBlock<int>();
        var options = new ExecutionDataflowBlockOptions() { BoundedCapacity = 1, MaxDegreeOfParallelism = 1 };
        m_actionBlock = new ActionBlock<int>((item) => BlockAction(item), options);

        var start = DateTime.Now;
        var elapsed = TimeSpan.FromMinutes(1);
        m_bufferBlock1.LinkTo(m_actionBlock, x => IsTimeElapsed(start, elapsed));
        m_bufferBlock2.LinkTo(m_actionBlock);

        FillBuffers();
    }

    private void BlockAction(int item)
    {
        Console.WriteLine(item);
        Thread.Sleep(2000);
    }

    private void FillBuffers()
    {
        for (int i = 0; i < 1000; i++)
        {
            if (i % 2 == 0)
            {
                m_bufferBlock1.Post(i);
            }
            else
            {
                m_bufferBlock2.Post(i);
            }
        }
    }

    private bool IsTimeElapsed(DateTime start, TimeSpan elapsed)
    {
        Console.WriteLine("checking time elapsed");
        return DateTime.Now > (start + elapsed);
    }

    public async Task Start()
    {
        await m_actionBlock.Completion;
    }
}
The code sets up a test pipeline and fills the two buffers with even and odd numbers. Both BufferBlocks are connected to one single ActionBlock that only prints the "processed" number and waits 2 seconds.
The filter condition between m_bufferBlock1 and m_actionBlock checks (for testing purposes) whether 1 minute has elapsed since we started the whole thing.
If we run this, it generates the following output:
1
checking time elapsed
3
5
7
9
11
13
15
17
19
As we can see, the ActionBlock takes the first element from the BufferBlock without the filter, then tries to take an element from the BufferBlock with the filter. The filter evaluates to false, and it continues to take all the remaining elements from the block without the filter.
My expectation was that after an element from the BufferBlock without the filter had been processed, it would try to take the element from the other BufferBlock (with the filter) again, re-evaluating the filter.
This would be my expected (or desired) result:
1
checking time elapsed
3
checking time elapsed
5
checking time elapsed
7
checking time elapsed
9
checking time elapsed
11
checking time elapsed
13
checking time elapsed
15
// after timer has elapsed take elements also from other buffer
2
17
4
19
My question now is: is there a way to "reset" the already "declined" message so that it is evaluated again, or is there another way to model this differently? To be clear, it is NOT important that items really are pulled from both buffers in strict alternation! (I know that this is scheduling-dependent, and it is totally fine if from time to time two items from the same block are dequeued.)
But it is important that the "declined" message is neither discarded nor re-queued, as the order within one buffer matters.
Thank you in advance.
One idea is to refresh the link between the two blocks, periodically or on demand. Implementing a periodically refreshable LinkTo is not very difficult. Here is an implementation:
public static IDisposable LinkTo<TOutput>(this ISourceBlock<TOutput> source,
    ITargetBlock<TOutput> target, Predicate<TOutput> predicate,
    TimeSpan refreshInterval, DataflowLinkOptions linkOptions = null)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (target == null) throw new ArgumentNullException(nameof(target));
    if (predicate == null) throw new ArgumentNullException(nameof(predicate));
    if (refreshInterval < TimeSpan.Zero)
        throw new ArgumentOutOfRangeException(nameof(refreshInterval));
    linkOptions = linkOptions ?? new DataflowLinkOptions();

    var locker = new object();
    var cts = new CancellationTokenSource();
    var token = cts.Token;
    var currentLink = source.LinkTo(target, linkOptions, predicate);

    var loopTask = Task.Run(async () =>
    {
        try
        {
            while (true)
            {
                await Task.Delay(refreshInterval, token).ConfigureAwait(false);
                currentLink.Dispose();
                currentLink = source.LinkTo(target, linkOptions, predicate);
            }
        }
        finally
        {
            lock (locker) { cts.Dispose(); cts = null; }
        }
    }, token);

    _ = Task.Factory.ContinueWhenAny(new[] { source.Completion, target.Completion },
        _ => { lock (locker) cts?.Cancel(); }, token, TaskContinuationOptions.None,
        TaskScheduler.Default);

    return new Unlinker(() =>
    {
        lock (locker) cts?.Cancel();
        // Synchronously wait for the loop task to complete, ignoring cancellation exceptions.
        try { loopTask.GetAwaiter().GetResult(); } catch (OperationCanceledException) { }
        currentLink.Dispose();
    });
}

private struct Unlinker : IDisposable
{
    private readonly Action _action;
    public Unlinker(Action disposeAction) => _action = disposeAction;
    void IDisposable.Dispose() => _action?.Invoke();
}
Usage example:
m_bufferBlock1.LinkTo(m_actionBlock, x => IsTimeElapsed(start, elapsed),
    refreshInterval: TimeSpan.FromSeconds(10));
The link between m_bufferBlock1 and m_actionBlock will be refreshed every 10 seconds, until one of the two blocks completes.
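The refresh can also be triggered on demand instead of periodically. A minimal sketch of that variant (assuming you keep the IDisposable link around yourself): disposing the current link and re-linking makes the source re-offer its buffered messages to the target, so the predicate is evaluated again for the previously declined item.
IDisposable link = m_bufferBlock1.LinkTo(m_actionBlock, x => IsTimeElapsed(start, elapsed));

// Call this whenever the filter's inputs change. Re-linking causes the
// source to re-offer its buffered items, so the predicate runs again.
void RefreshLink()
{
    link.Dispose();
    link = m_bufferBlock1.LinkTo(m_actionBlock, x => IsTimeElapsed(start, elapsed));
}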
I'm running a simple console migration.
I bundled the workload into batches of 750 items and sent each batch to the ThreadPool via:
ThreadPool.QueueUserWorkItem(workerArray[i]._DoWork, i);
I added some logging to the _DoWork method, and it outputs:
Thread 1 started working...
Thread 2 started working...
Thread 1 is done working
and so on, more than 400 times.
But at each start and end I also log the number of running threads by outputting the result of the following snippet (which I found on SO):
((IEnumerable)System.Diagnostics.Process.GetCurrentProcess().Threads)
.OfType<System.Diagnostics.ProcessThread>()
.Where(t => t.ThreadState == System.Diagnostics.ThreadState.Running)
.Count();
But why does it only output 2 to 4 threads when I have over 100 threads started and none done yet?
=============
Some more code
Here's what's queueing the work:
for (var i = 0; i < taskCount; i++)
{
    int itemCount = queueLength;
    if (i * queueLength + itemCount > journalIDs.Count)
        itemCount = (journalIDs.Count) - (i * queueLength);

    var queue = journalIDs.GetRange(i * queueLength, itemCount);
    doneEvents[i] = new ManualResetEvent(false);
    dbWorkers[i] = new DBWorker();
    var loadedQueue = db.GetJournalByIDs(queue);
    workerArray[i] = new JournalWorker(dbWorkers[i], loadedQueue, doneEvents[i]);
    ThreadPool.QueueUserWorkItem(workerArray[i]._DoWork, i);
}
Here's what _DoWork does:
public void _DoWork(Object pThreadContext)
{
    int theThreadIndex = (int)pThreadContext;
    Console.WriteLine("Thread {0} started...", theThreadIndex);

    var threads = ((IEnumerable)System.Diagnostics.Process.GetCurrentProcess().Threads)
        .OfType<System.Diagnostics.ProcessThread>()
        .Where(t => t.ThreadState == System.Diagnostics.ThreadState.Running)
        .Count();
    Console.WriteLine("There's {0} thread(s) currently running", threads);

    foreach (var item in Queue)
    {
        var update = ShouldUpdate(item);
        if (update != null)
        {
            // Do some db operation
        }
        else
        {
            // Do some db operation
        }
    }

    Console.WriteLine("Thread {0} started saving...", theThreadIndex);
    Save(Store);
    Console.WriteLine("Thread {0} done working with " + Store.Count + " objects...", theThreadIndex);

    threads = ((IEnumerable)System.Diagnostics.Process.GetCurrentProcess().Threads)
        .OfType<System.Diagnostics.ProcessThread>()
        .Where(t => t.ThreadState == System.Diagnostics.ThreadState.Running)
        .Count();
    Console.WriteLine("There's {0} thread(s) currently running", threads);

    db.Clear();
    db = null;
    _doneEvent.Set();
}
There are a couple of things to consider here:
The ThreadPool's maximum number of concurrent threads defaults to a value based on several different factors in .NET 4.0. You can check the actual limits by calling ThreadPool.GetMaxThreads.
When a thread is waiting on a ManualResetEvent, its ThreadState will be WaitSleepJoin (which ProcessThread reports as Wait), not Running.
Console is thread-safe, but it won't give you predictable, accurately timed, correctly ordered output when used in a multithreaded scenario.
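A quick sketch of both checks, assuming it runs inside the migration process. Note that ProcessThread exposes System.Diagnostics.ThreadState, where a blocked thread shows up as Wait (the unmanaged counterpart of the managed WaitSleepJoin):
// Inspect the pool's limits
int maxWorker, maxIo;
ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
Console.WriteLine("Max worker threads: {0}, max completion-port threads: {1}", maxWorker, maxIo);

// Count blocked/waiting OS threads instead of only Running ones
var waiting = System.Diagnostics.Process.GetCurrentProcess().Threads
    .OfType<System.Diagnostics.ProcessThread>()
    .Count(t => t.ThreadState == System.Diagnostics.ThreadState.Wait);
Console.WriteLine("There are {0} thread(s) currently waiting", waiting);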
I have a producer/consumer problem. Currently I have a simple Queue surrounded by a lock.
I'm trying to replace it with something more efficient.
My first choice was to use a ConcurrentQueue, but I don't see how to make my consumer wait for the next produced message (without doing Thread.Sleep).
Also, I would like to be able to clear the whole queue if its size reaches a specific number.
Can you suggest an existing class or an implementation that would match my requirements?
Here is an example of how you can use the BlockingCollection class to do what you want:
BlockingCollection<int> blocking_collection = new BlockingCollection<int>();

//Create producer on a thread-pool thread
Task.Run(() =>
{
    int number = 0;
    while (true)
    {
        blocking_collection.Add(number++);
        Thread.Sleep(100); //simulating that the producer produces ~10 items every second
    }
});

int max_size = 10; //Maximum items to have
int items_to_skip = 0;

//Consumer
foreach (var item in blocking_collection.GetConsumingEnumerable())
{
    if (items_to_skip > 0)
    {
        items_to_skip--; //quickly skip items (to meet the clearing requirement)
        continue;
    }

    //process item
    Console.WriteLine(item);
    Thread.Sleep(200); //simulating that the consumer can only process ~5 items per second

    var collection_size = blocking_collection.Count;
    if (collection_size > max_size) //If we reach maximum size, we flag that we want to skip items
    {
        items_to_skip = collection_size;
    }
}
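If you would rather actually empty the queue instead of skipping a counted number of items, a small variation (a sketch using BlockingCollection's standard TryTake method) drains it in place:
//Alternative: drain the collection directly once it grows past max_size
if (blocking_collection.Count > max_size)
{
    int ignored;
    while (blocking_collection.TryTake(out ignored)) { } //discard everything currently queued
}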
I have an application where I have 1000+ small parts of one large file.
I have to upload a maximum of 16 parts at a time.
I used the Task Parallel Library of .NET.
I used Parallel.For to divide the work into multiple parts, assigned one method that should be executed for each part, and set MaxDegreeOfParallelism to 16.
I need to execute one method with the checksum values that are generated by the different part uploads, so I have to set up a mechanism where I wait for all part uploads (say 1000) to complete.
One issue I am facing with the TPL is that it randomly executes any of the 16 threads among the 1000 parts.
I want some mechanism by which I can run the first 16 threads initially, and whenever the 1st or 2nd or any of the 16 threads completes its task, the 17th part should be started.
How can I achieve this?
One possible candidate for this is TPL Dataflow. This is a demonstration which takes in a stream of integers and prints them to the console. You set MaxDegreeOfParallelism to however many threads you wish to run in parallel:
void Main()
{
    var actionBlock = new ActionBlock<int>(
        i => Console.WriteLine(i),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 16 });

    foreach (var i in Enumerable.Range(0, 200))
    {
        actionBlock.Post(i);
    }
}
This also scales well if you want to have multiple producers/consumers.
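Note that the demo above does not wait for the posted items to finish, while the question requires running a checksum step only after all 1000 parts have completed. A minimal addition, using the block's standard completion members:
// Signal that no more items will be posted, then wait for all posted
// items to be processed before running the final checksum step.
actionBlock.Complete();
actionBlock.Completion.Wait(); // or: await actionBlock.Completion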
Here is the manual way of doing this.
You need a queue. The queue is the sequence of pending tasks. You dequeue items and put them into a list of working tasks. Whenever a task is done, you remove it from the list of working tasks and take another item from the queue. The main thread controls this process. Here is a sample of how to do this.
For the test I used a list of integers, but it should work for other types, because it uses generics.
private static void Main()
{
    Random r = new Random();
    var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
    ParallelQueue(items, DoWork);
}

private static void ParallelQueue<T>(List<T> items, Action<T> action)
{
    Queue<T> pending = new Queue<T>(items);
    List<Task> working = new List<Task>();

    while (pending.Count + working.Count != 0)
    {
        if (pending.Count != 0 && working.Count < 16) // Maximum tasks
        {
            var item = pending.Dequeue(); // get item from queue
            working.Add(Task.Run(() => action(item))); // run task
        }
        else
        {
            Task.WaitAny(working.ToArray());
            working.RemoveAll(x => x.IsCompleted); // remove finished tasks
        }
    }
}

private static void DoWork(int i) // do your work here.
{
    // this is just an example
    Task.Delay(i).Wait();
    Console.WriteLine(i);
}
Please let me know if you run into problems implementing DoWork yourself, because if you change the method signature you may need to make some changes.
Update
You can also do this with async/await, without blocking the main thread.
private static void Main()
{
    Random r = new Random();
    var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
    Task t = ParallelQueue(items, DoWork);

    // able to do other things here.

    t.Wait();
}

private static async Task ParallelQueue<T>(List<T> items, Func<T, Task> func)
{
    Queue<T> pending = new Queue<T>(items);
    List<Task> working = new List<Task>();

    while (pending.Count + working.Count != 0)
    {
        if (working.Count < 16 && pending.Count != 0)
        {
            var item = pending.Dequeue();
            working.Add(Task.Run(() => func(item)));
        }
        else
        {
            await Task.WhenAny(working);
            working.RemoveAll(x => x.IsCompleted);
        }
    }
}

private static async Task DoWork(int i)
{
    await Task.Delay(i);
}
var workitems = ... /*e.g. Enumerable.Range(0, 1000000)*/;

SingleItemPartitioner.Create(workitems)
    .AsParallel()
    .AsOrdered()
    .WithDegreeOfParallelism(16)
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .ForAll(i => { Thread.Sleep(1000); Console.WriteLine(i); });
This should be all you need. I forgot exactly how the methods are named... look at the documentation. (Note: SingleItemPartitioner is not part of the BCL; it comes from Microsoft's ParallelExtensionsExtras samples.)
Test this by printing to the console after sleeping for 1 second (which this sample code does).
Another option would be to use a BlockingCollection<T> as a queue between your file-reader thread and your 16 uploader threads. Each uploader thread would just loop, consuming from the blocking collection until it is complete.
And if you want to limit memory consumption in the queue, you can set an upper limit on the blocking collection so that the file-reader thread pauses when the buffer reaches capacity. This is particularly useful in a server environment, where you may need to limit the memory used per user/API call.
// Create a buffer of 4 chunks between the file reader and the senders
BlockingCollection<Chunk> queue = new BlockingCollection<Chunk>(4);
// Create a cancellation token source so you can stop this gracefully
CancellationTokenSource cts = ...
File reader thread
...
queue.Add(chunk, cts.Token);
...
queue.CompleteAdding();
Sending threads
for (int i = 0; i < 16; i++)
{
    Task.Run(() =>
    {
        foreach (var chunk in queue.GetConsumingEnumerable(cts.Token))
        {
            // .. do the upload
        }
    });
}
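Since the question also needs a checksum step after all parts are uploaded, one small extension of the snippet above (a sketch; the final comment is a placeholder for your own checksum method) is to keep the 16 tasks and wait for them all:
var senders = Enumerable.Range(0, 16)
    .Select(_ => Task.Run(() =>
    {
        foreach (var chunk in queue.GetConsumingEnumerable(cts.Token))
        {
            // .. do the upload
        }
    }))
    .ToArray();

// Blocks until the file reader has called CompleteAdding() and every
// queued chunk has been consumed; then run the checksum step here.
Task.WaitAll(senders);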
I have a method that uses the AForge.NET framework to template-match (check an image against another image for similarity) a number of separate images against an area of the screen. This task can take a very long time or be near-instant, depending on the number of images, the size of the images, and the area being checked.
Only one image in the list of images will ever return a match, so I would like to test all the images against the screen at the same time, and the moment one of these images returns true, the remaining checks are immediately canceled and my program moves on to its next step.
Now, in the example I give, I grab an integer value based on which match returns true, but the concept is always the same: X images are tested against a screenshot; one will return true, the rest will not. Sometimes the first returns true and the process is nice and fast; other times it's the 30th in the list, and synchronously matching the template for 30 images takes a considerable amount of time compared to one.
One caveat to note about the code that follows: I won't always return an integer. I will normally return a boolean value indicating which image was found, but this code was the easiest to illustrate, and the same general principle applies (i.e. if I can figure it out one way, I'll be able to do it the other).
Currently my (synchronous) code reads as follows. How would I make this an asynchronous call that can do what I've described? If possible, please give a detailed answer, as I intend to learn so that I can readily do this type of thing in the future. I understand the concept of async, but for some reason I cannot wrap my head around exactly how to do it the way I want.
public void Battle()
{
    var myGuysTurn = WhosTurn();
    // other logic here.
}

private int WhosTurn()
{
    var whosTurn = 0;

    var whosTurnCheck = _templateMatch.Match(_tabula.BattleHeroTurn1());
    if (whosTurnCheck)
    {
        whosTurn = 1;
        return whosTurn;
    }

    whosTurnCheck = _templateMatch.Match(_tabula.BattleHeroTurn2());
    if (whosTurnCheck)
    {
        whosTurn = 2;
        return whosTurn;
    }

    whosTurnCheck = _templateMatch.Match(_tabula.BattleHeroTurn3());
    if (whosTurnCheck)
    {
        whosTurn = 3;
        return whosTurn;
    }

    return whosTurn;
}
I'd use Task.WaitAny() combined with a CancellationToken. Essentially, start each task in parallel and wait until any of them completes. If the completed task was successful, cancel the others. If not, continue to wait for the other tasks to complete.
I've replaced _templateMatch.Match(_tabula.BattleHeroTurnX()) with a static method BattleHeroTurnX for brevity:
private int WhosTurn()
{
    // Create cancellation token. Will be used to inform other threads that they should immediately cancel processing
    CancellationTokenSource cts = new CancellationTokenSource();

    // Collection of tasks that run in parallel
    List<Task<int>> tasks = new List<Task<int>>()
    {
        Task.Run<int>(() => {
            return BattleHeroTurn1(cts.Token) ? 1 : 0;
        }),
        Task.Run<int>(() => {
            return BattleHeroTurn2(cts.Token) ? 2 : 0;
        }),
        Task.Run<int>(() => {
            return BattleHeroTurn3(cts.Token) ? 3 : 0;
        })
    };

    // Wait for any task to complete and if it is successful, cancel the other tasks and return
    while (tasks.Any())
    {
        // Get the index of the task that completed
        int completedTaskIndex = Task.WaitAny(tasks.ToArray());
        int turn = tasks[completedTaskIndex].Result;
        if (turn > 0)
        {
            cts.Cancel();
            return turn;
        }
        tasks.RemoveAt(completedTaskIndex);
    }

    // All tasks have completed but no BattleHeroTurnX returned true
    return 0;
}

static bool BattleHeroTurn1(CancellationToken token)
{
    // Processing images. After each one is processed, ensure that the token has not been canceled
    for (int i = 0; i < 100; i++)
    {
        Thread.Sleep(50);
        if (token.IsCancellationRequested)
        {
            return false;
        }
    }
    return true;
}

static bool BattleHeroTurn2(CancellationToken token)
{
    // Processing images. After each one is processed, ensure that the token has not been canceled
    for (int i = 0; i < 10; i++)
    {
        Thread.Sleep(70);
        if (token.IsCancellationRequested)
        {
            return false;
        }
    }
    return true;
}

static bool BattleHeroTurn3(CancellationToken token)
{
    // Processing images. After each one is processed, ensure that the token has not been canceled
    for (int i = 0; i < 1000; i++)
    {
        Thread.Sleep(500);
        if (token.IsCancellationRequested)
        {
            return false;
        }
    }
    return true;
}
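If you want WhosTurn itself to be awaitable (so Battle can stay responsive), the same loop works with the non-blocking Task.WhenAny instead of Task.WaitAny; a sketch reusing the method names above:
private async Task<int> WhosTurnAsync()
{
    CancellationTokenSource cts = new CancellationTokenSource();
    List<Task<int>> tasks = new List<Task<int>>()
    {
        Task.Run(() => BattleHeroTurn1(cts.Token) ? 1 : 0),
        Task.Run(() => BattleHeroTurn2(cts.Token) ? 2 : 0),
        Task.Run(() => BattleHeroTurn3(cts.Token) ? 3 : 0)
    };

    while (tasks.Any())
    {
        // Asynchronously wait for whichever task finishes first
        Task<int> completed = await Task.WhenAny(tasks);
        int turn = completed.Result;
        if (turn > 0)
        {
            cts.Cancel();
            return turn;
        }
        tasks.Remove(completed);
    }

    // All tasks completed but no BattleHeroTurnX returned true
    return 0;
}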
See this and this for further information.