Blocking collection when collecting results inside ActionBlock - C#

I think that in the test method, the "results" collection variable has to be of type BlockingCollection<int> instead of List<int>. Correct me if I'm wrong. I have taken this example from https://blog.stephencleary.com/2012/11/async-producerconsumer-queue-using.html
private static async Task Produce(BufferBlock<int> queue, IEnumerable<int> values)
{
foreach (var value in values)
{
await queue.SendAsync(value);
}
}
public async Task ProduceAll(BufferBlock<int> queue)
{
var producer1 = Produce(queue, Enumerable.Range(0, 10));
var producer2 = Produce(queue, Enumerable.Range(10, 10));
var producer3 = Produce(queue, Enumerable.Range(20, 10));
await Task.WhenAll(producer1, producer2, producer3);
queue.Complete();
}
[TestMethod]
public async Task ConsumerReceivesCorrectValues()
{
var results = new List<int>();
// Define the mesh.
var queue = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 5, });
var consumerOptions = new ExecutionDataflowBlockOptions { BoundedCapacity = 1, };
var consumer = new ActionBlock<int>(x => results.Add(x), consumerOptions);
queue.LinkTo(consumer, new DataflowLinkOptions { PropagateCompletion = true, });
// Start the producers.
var producers = ProduceAll(queue);
// Wait for everything to complete.
await Task.WhenAll(producers, consumer.Completion);
// Ensure the consumer got what the producer sent.
Assert.IsTrue(results.OrderBy(x => x).SequenceEqual(Enumerable.Range(0, 30)));
}

Since ActionBlock<T> restricts its delegate to one-execution-at-a-time by default (MaxDegreeOfParallelism of 1), it is not necessary to use BlockingCollection<T> instead of List<T>.
The test in your code passes just fine for me, as expected.
If ActionBlock<T> were passed an option with a higher MaxDegreeOfParallelism, then you would need to protect the List<T> or replace it with a BlockingCollection<T>.
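For illustration, here is a minimal sketch (not from the original post) of what the consumer could look like with the parallelism raised; ConcurrentBag<int> (from System.Collections.Concurrent) is just one thread-safe option:
// Sketch only: with MaxDegreeOfParallelism > 1 the delegate may run on several threads at once,
// so the results container must be thread-safe.
var results = new ConcurrentBag<int>();
var consumerOptions = new ExecutionDataflowBlockOptions
{
    BoundedCapacity = 1,
    MaxDegreeOfParallelism = 4, // illustrative value; anything > 1 makes a plain List<int> unsafe
};
var consumer = new ActionBlock<int>(x => results.Add(x), consumerOptions);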

Related

TPL Dataflow - block not processing as expected

I have a set of simple blocks that are mostly processed serially, but there are two blocks that I want to process in parallel (processBlock1 & processBlock2). I just started playing around with TPL Dataflow blocks, so I'm new to it.
However, in the code below I can see that processBlock1 is being called, but processBlock2 never is, which is not what I expected. I was hoping they would both be kicked off in parallel.
class Program
{
static void Main(string[] args)
{
var readBlock = new TransformBlock<int, int>(x => DoSomething(x, "readBlock"),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 }); //1
var processBlock1 =
new TransformBlock<int, int>(x => DoSomething(x, "processBlock1")); //2
var processBlock2 =
new TransformBlock<int, int>(x => DoSomething(x, "processBlock2")); //3
var saveBlock =
new ActionBlock<int>(
x => Save(x)); //4
readBlock.LinkTo(processBlock1,
new DataflowLinkOptions { PropagateCompletion = true }); //5
readBlock.LinkTo(processBlock2,
new DataflowLinkOptions { PropagateCompletion = true }); //6
processBlock1.LinkTo(
saveBlock); //7
processBlock2.LinkTo(
saveBlock); //8
readBlock.Post(1); //10
Task.WhenAll(
processBlock1.Completion,
processBlock2.Completion)
.ContinueWith(_ => saveBlock.Complete()); //11
readBlock.Complete(); //12
saveBlock.Completion.Wait(); //13
Console.WriteLine("Processing complete!");
Console.ReadLine();
}
private static int DoSomething(int i, string method)
{
Console.WriteLine($"Do Something, callng method : { method}");
return i;
}
private static Task<int> DoSomethingAsync(int i, string method)
{
DoSomething(i, method);
return Task.FromResult(i);
}
private static void Save(int i)
{
Console.WriteLine("Save!");
}
}
By default, a TPL Dataflow block offers each message to its linked targets in order, and the first target that accepts the message consumes it; the other targets never see it.
Use a BroadcastBlock to send a copy of each message to many components.
void Main()
{
var random = new Random();
var readBlock = new TransformBlock<int, int>(x => { return DoSomething(x, "readBlock"); },
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 }); //1
var broadcastBlock = new BroadcastBlock<int>(i => i); // ⬅️ Here
var processBlock1 =
new TransformBlock<int, int>(x => DoSomething(x, "processBlock1")); //2
var processBlock2 =
new TransformBlock<int, int>(x => DoSomething(x, "processBlock2")); //3
var saveBlock =
new ActionBlock<int>(
x => Save(x)); //4
readBlock.LinkTo(broadcastBlock, new DataflowLinkOptions { PropagateCompletion = true });
broadcastBlock.LinkTo(processBlock1,
new DataflowLinkOptions { PropagateCompletion = true }); //5
broadcastBlock.LinkTo(processBlock2,
new DataflowLinkOptions { PropagateCompletion = true }); //6
processBlock1.LinkTo(
saveBlock); //7
processBlock2.LinkTo(
saveBlock); //8
readBlock.Post(1); //10
readBlock.Post(2); //10
Task.WhenAll(
processBlock1.Completion,
processBlock2.Completion)
.ContinueWith(_ => saveBlock.Complete());
readBlock.Complete(); //12
saveBlock.Completion.Wait(); //13
Console.WriteLine("Processing complete!");
}
// Define other methods and classes here
private static int DoSomething(int i, string method)
{
Console.WriteLine($"Do Something, callng method : { method} {i}");
return i;
}
private static Task<int> DoSomethingAsync(int i, string method)
{
DoSomething(i, method);
return Task.FromResult(i);
}
private static void Save(int i)
{
Console.WriteLine("Save! " + i);
}
It appears that you're posting only one item to the graph, and the first consumer to consume it wins. There's no implied 'tee' functionality in the graph you've made, so there's no possible parallelism there.

Mixing LINQ with async (getting payload from a seed)

I have a collection of seeds
var seeds = new [] {1, 2, 3, 4};
From each seed I want to run an async method that performs some calculation with the seed:
async Task<int> Calculation(int seed);
My goal is to perform a select like this:
var results = from seed in seeds
let calculation = await Calculation(seed)
select new { seed, calculation };
Unfortunately, this syntax isn't allowed using LINQ.
How can I make the "results" variable contain both the seed and the calculation?
(I would appreciate any answer, but especially one using System.Reactive's Observable.)
Here's an Rx solution:
var seeds = new [] {1, 2, 3, 4};
var results = Observable.ToObservable(seeds)
.SelectMany(async i => new { seed = i, calculation = await Calculation(i)})
.ToEnumerable();
You could do the following using the Task.WhenAll static method:
var r = await Task.WhenAll(seeds.Select(async seed => new {
Seed = seed,
Result = await Calculation(seed)
}));
Change your async function to return both the calculated number and the given seed:
public static async Task<Output> Calculation(int seed)
{
return new Output { Seed = seed, Result = 0 };
}
public class Output
{
public int Seed { get; set; }
public int Result { get; set; }
}
Then use LINQ to project the seeds into an array of tasks, on which you can use WaitAll or WhenAll (WaitAll vs WhenAll):
var seeds = new[] { 1, 2, 3, 4 };
var tasks = seeds.Select(Calculation);
var results = await Task.WhenAll(tasks);
foreach (var item in results)
Console.WriteLine($"seed: {item.Seed}, result: {item.Result}");

TPL Dataflow: Bounded capacity and waiting for completion

Below I have replicated a real life scenario as a LINQPad script for the sake of simplicity:
var total = 1 * 1000 * 1000;
var cts = new CancellationTokenSource();
var threads = Environment.ProcessorCount;
int capacity = 10;
var edbOptions = new ExecutionDataflowBlockOptions{BoundedCapacity = capacity, CancellationToken = cts.Token, MaxDegreeOfParallelism = threads};
var dbOptions = new DataflowBlockOptions {BoundedCapacity = capacity, CancellationToken = cts.Token};
var gdbOptions = new GroupingDataflowBlockOptions {BoundedCapacity = capacity, CancellationToken = cts.Token};
var dlOptions = new DataflowLinkOptions {PropagateCompletion = true};
var counter1 = 0;
var counter2 = 0;
var delay1 = 10;
var delay2 = 25;
var action1 = new Func<IEnumerable<string>, Task>(async x => {await Task.Delay(delay1); Interlocked.Increment(ref counter1);});
var action2 = new Func<IEnumerable<string>, Task>(async x => {await Task.Delay(delay2); Interlocked.Increment(ref counter2);});
var actionBlock1 = new ActionBlock<IEnumerable<string>>(action1, edbOptions);
var actionBlock2 = new ActionBlock<IEnumerable<string>>(action2, edbOptions);
var batchBlock1 = new BatchBlock<string>(5, gdbOptions);
var batchBlock2 = new BatchBlock<string>(5, gdbOptions);
batchBlock1.LinkTo(actionBlock1, dlOptions);
batchBlock2.LinkTo(actionBlock2, dlOptions);
var bufferBlock1 = new BufferBlock<string>(dbOptions);
var bufferBlock2 = new BufferBlock<string>(dbOptions);
bufferBlock1.LinkTo(batchBlock1, dlOptions);
bufferBlock2.LinkTo(batchBlock2, dlOptions);
var bcBlock = new BroadcastBlock<string>(x => x, dbOptions);
bcBlock.LinkTo(bufferBlock1, dlOptions);
bcBlock.LinkTo(bufferBlock2, dlOptions);
var mainBlock = new TransformBlock<int, string>(x => x.ToString(), edbOptions);
mainBlock.LinkTo(bcBlock, dlOptions);
mainBlock.Dump("Main Block");
bcBlock.Dump("Broadcast Block");
bufferBlock1.Dump("Buffer Block 1");
bufferBlock2.Dump("Buffer Block 2");
actionBlock1.Dump("Action Block 1");
actionBlock2.Dump("Action Block 2");
foreach(var i in Enumerable.Range(1, total))
await mainBlock.SendAsync(i, cts.Token);
mainBlock.Complete();
await Task.WhenAll(actionBlock1.Completion, actionBlock2.Completion);
counter1.Dump("Counter 1");
counter2.Dump("Counter 2");
I have two issues with this code:
Although I limited BoundedCapacity of all appropriate blocks to 10 elements, it seems like I can push all 1,000,000 messages almost at once. Is this expected behavior?
Although the entire network is configured to propagate completion, it seems like all blocks complete almost immediately after calling mainBlock.Complete(). I expect both the counter1 and counter2 variables to end up equal to total. Is there a way to achieve such behavior?
Yes, this is the expected behavior, because of the BroadcastBlock:
Provides a buffer for storing at most one element at time, overwriting each message with the next as it arrives.
This means that if you link BroadcastBlock to blocks with BoundedCapacity, you will lose messages.
To fix that, you could create a custom block that behaves like BroadcastBlock, but guarantees delivery to all targets. Doing that is not trivial, so you might be satisfied with a simpler variant (originally from my old answer):
public static ITargetBlock<T> CreateGuaranteedBroadcastBlock<T>(
IEnumerable<ITargetBlock<T>> targets, DataflowBlockOptions options)
{
var targetsList = targets.ToList();
var block = new ActionBlock<T>(
async item =>
{
foreach (var target in targetsList)
{
await target.SendAsync(item);
}
}, new ExecutionDataflowBlockOptions
{
BoundedCapacity = options.BoundedCapacity,
CancellationToken = options.CancellationToken
});
block.Completion.ContinueWith(task =>
{
foreach (var target in targetsList)
{
if (task.Exception != null)
target.Fault(task.Exception);
else
target.Complete();
}
});
return block;
}
Usage in your case would be:
var bcBlock = CreateGuaranteedBroadcastBlock(
new[] { bufferBlock1, bufferBlock2 }, dbOptions);
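Since the returned block is an ITargetBlock<string>, the rest of the original script can stay essentially unchanged; here is a rough sketch of the tail end, reusing the names defined above:
// Link the main block to the guaranteed-broadcast block instead of the old BroadcastBlock.
mainBlock.LinkTo(bcBlock, dlOptions);
foreach (var i in Enumerable.Range(1, total))
    await mainBlock.SendAsync(i, cts.Token); // SendAsync now waits when the bounded buffers are full
mainBlock.Complete();
await Task.WhenAll(actionBlock1.Completion, actionBlock2.Completion);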

How can I use Reactive Extensions to throttle Events using a max window size?

Scenario:
I am building a UI application that gets notifications from a backend service every few milliseconds. Once I get a new notification I want to update the UI as soon as possible.
As I can get lots of notifications within a short amount of time, and as I only ever care about the latest event, I use the Throttle() method of the Reactive Extensions framework. This allows me to ignore notification events that are immediately followed by a new notification, so my UI stays responsive.
Problem:
Say I throttle the event stream of notification events to 50ms and the backend sends a notification every 10ms; the Throttle() method will then never emit an event, because it keeps resetting its sliding window again and again. Here I need some additional behaviour to specify something like a timeout, so that I can retrieve at least one event per second or so in the case of such a high throughput of events. How can I do this with Reactive Extensions?
As James stated, Observable.Sample will give you the latest value yielded. However, it will do so on a timer, not in accordance with when the first event in the throttle window occurred. More importantly, if your sample time is high (say ten seconds) and your event fires right after a sample is taken, you won't get that new event for almost ten seconds.
If you need something a little tighter, you'll need to implement your own function. I've taken the liberty of doing so. This code could definitely use some clean up, but I believe it does what you've asked for.
public static class ObservableEx
{
public static IObservable<T> ThrottleMax<T>(this IObservable<T> source, TimeSpan dueTime, TimeSpan maxTime)
{
return source.ThrottleMax(dueTime, maxTime, Scheduler.Default);
}
public static IObservable<T> ThrottleMax<T>(this IObservable<T> source, TimeSpan dueTime, TimeSpan maxTime, IScheduler scheduler)
{
return Observable.Create<T>(o =>
{
var hasValue = false;
T value = default(T);
var maxTimeDisposable = new SerialDisposable();
var dueTimeDisposable = new SerialDisposable();
Action action = () =>
{
if (hasValue)
{
maxTimeDisposable.Disposable = Disposable.Empty;
dueTimeDisposable.Disposable = Disposable.Empty;
o.OnNext(value);
hasValue = false;
}
};
return source.Subscribe(
x =>
{
if (!hasValue)
{
maxTimeDisposable.Disposable = scheduler.Schedule(maxTime, action);
}
hasValue = true;
value = x;
dueTimeDisposable.Disposable = scheduler.Schedule(dueTime, action);
},
o.OnError,
o.OnCompleted
);
});
}
}
And a few tests...
[TestClass]
public class ThrottleMaxTests : ReactiveTest
{
[TestMethod]
public void CanThrottle()
{
var scheduler = new TestScheduler();
var results = scheduler.CreateObserver<int>();
var source = scheduler.CreateColdObservable(
OnNext(100, 1)
);
var dueTime = TimeSpan.FromTicks(100);
var maxTime = TimeSpan.FromTicks(250);
source.ThrottleMax(dueTime, maxTime, scheduler)
.Subscribe(results);
scheduler.AdvanceTo(1000);
results.Messages.AssertEqual(
OnNext(200, 1)
);
}
[TestMethod]
public void CanThrottleWithMaximumInterval()
{
var scheduler = new TestScheduler();
var results = scheduler.CreateObserver<int>();
var source = scheduler.CreateColdObservable(
OnNext(100, 1),
OnNext(175, 2),
OnNext(250, 3),
OnNext(325, 4),
OnNext(400, 5)
);
var dueTime = TimeSpan.FromTicks(100);
var maxTime = TimeSpan.FromTicks(250);
source.ThrottleMax(dueTime, maxTime, scheduler)
.Subscribe(results);
scheduler.AdvanceTo(1000);
results.Messages.AssertEqual(
OnNext(350, 4),
OnNext(500, 5)
);
}
[TestMethod]
public void CanThrottleWithoutMaximumIntervalInterferance()
{
var scheduler = new TestScheduler();
var results = scheduler.CreateObserver<int>();
var source = scheduler.CreateColdObservable(
OnNext(100, 1),
OnNext(325, 2)
);
var dueTime = TimeSpan.FromTicks(100);
var maxTime = TimeSpan.FromTicks(250);
source.ThrottleMax(dueTime, maxTime, scheduler)
.Subscribe(results);
scheduler.AdvanceTo(1000);
results.Messages.AssertEqual(
OnNext(200, 1),
OnNext(425, 2)
);
}
}
Don't use Observable.Throttle, use Observable.Sample like this, where the TimeSpan gives the desired minimum interval between updates:
source.Sample(TimeSpan.FromMilliseconds(50))
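As a rough illustration of wiring that into the scenario from the question (the notifications stream and the UpdateUi handler are hypothetical names, not from the original post):
// Emit at most one value per 50 ms, always the most recent one, and marshal the
// update back onto the UI thread via the captured SynchronizationContext.
notifications
    .Sample(TimeSpan.FromMilliseconds(50))
    .ObserveOn(SynchronizationContext.Current)
    .Subscribe(latest => UpdateUi(latest));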

Switch async Task to sync task

I have the following code:
Task.Factory.ContinueWhenAll(items.Select(p =>
{
return CreateItem(p);
}).ToArray(), completedTasks => { Console.WriteLine("completed"); });
Is it possible to convert ContinueWhenAll to a synchronous method? I want to be able to switch back and forth between async and sync.
Edit: I should mention that each of the "tasks" in the ContinueWhenAll call should execute synchronously.
If you want to leave your existing code intact and have a runtime option to execute synchronously, you can make these changes:
bool isAsync = false; // some flag to check for async operation
var batch = Task.Factory.ContinueWhenAll(items.Select(p =>
{
return CreateItem(p);
}).ToArray(), completedTasks => { Console.WriteLine("completed"); });
if (!isAsync)
batch.Wait();
This way you can toggle it programmatically instead of by editing your source code. And you can keep the continuation code the same for both methods.
Edit:
Here is a simple pattern for having the same method represented as a synchronous and async version:
public Item CreateItem(string name)
{
return new Item(name);
}
public Task<Item> CreateItemAsync(string name)
{
return Task.Factory.StartNew(() => CreateItem(name));
}
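Hypothetical call sites for the two versions above (the "widget" argument is just an example value):
Item item = CreateItem("widget"); // synchronous path
Item itemAsync = await CreateItemAsync("widget"); // asynchronous path, called from an async method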
Unless I am mistaken, this is what you're looking for:
Task.WaitAll(tasks);
//continuation code here
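Applied to the code from the question, a minimal sketch (assuming CreateItem(p) returns a Task, as the original ContinueWhenAll call implies):
var tasks = items.Select(p => CreateItem(p)).ToArray();
Task.WaitAll(tasks); // blocks the calling thread until every task has finished
Console.WriteLine("completed"); // the former continuation, now run inline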
I think you can try this: use TaskContinuationOptions for a simple scenario.
var taskFactory = new TaskFactory(TaskScheduler.Default);
var random = new Random();
var tasks = Enumerable.Range(1, 30).Select(p => {
return taskFactory.StartNew(() => {
var timeout = random.Next(5, p * 50);
Thread.Sleep(timeout / 2);
Console.WriteLine(#" 1: ID = " + p);
return p;
}).ContinueWith(t => {
Console.WriteLine(#"* 2: ID = " + t.Result);
}, TaskContinuationOptions.ExecuteSynchronously);
}).ToArray();
Task.WaitAll(tasks);
or using TPL Dataflow for a complex scenario.
var step2 = new ActionBlock<int>(i => {
Thread.Sleep(i);
Console.WriteLine(#"* 2: ID = " + i);
}, new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 1,
//MaxMessagesPerTask = 1
});
var random = new Random();
var tasks = Enumerable.Range(1, 50).Select(p => {
return Task.Factory.StartNew(() => {
var timeout = random.Next(5, p * 50);
Thread.Sleep(timeout / 2);
Console.WriteLine(#" 1: ID = " + p);
return p;
}).ContinueWith(t => {
Thread.Sleep(t.Result);
step2.Post(t.Result);
});
}).ToArray();
await Task.WhenAll(tasks).ContinueWith(t => step2.Complete());
await step2.Completion;
