TPL dataflow blocks inside a WCF duplex - c#

I have a WCF service with a duplex service contract. This service contract has an operation contract that is supposed to do long-running data processing. I need to limit the number of concurrent data-processing operations to, let's say, at most 3. My problem is that after the data processing I need to get back to the same service instance context, so that I can call back the initiating endpoint with the processing result. I should mention that, for various reasons, I am constrained to TPL Dataflow and WCF duplex.
Here is a demo of what I have written so far.
In a console application I simulate the WCF calls:
```csharp
class Program
{
    static void Main(string[] args)
    {
        // simulate service calls
        Enumerable.Range(0, 5).ToList().ForEach(x =>
        {
            new System.Threading.Thread(new ThreadStart(async () =>
            {
                var service = new Service();
                await service.Inc(x);
            })).Start();
        });
    }
}
```
Here is what is supposed to be the WCF service:
```csharp
// service contract
public class Service
{
    static TransformBlock<Message<int>, Message<int>> transformBlock;

    static Service()
    {
        transformBlock = new TransformBlock<Message<int>, Message<int>>(x => Inc(x), new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 3
        });
    }

    static Message<int> Inc(Message<int> input)
    {
        return new Message<int> { Token = input.Token, Data = input.Data + 1 };
    }

    // operation contract
    public async Task Inc(int id)
    {
        var token = Guid.NewGuid().ToString();
        transformBlock.Post(new Message<int> { Token = token, Data = id });

        while (await transformBlock.OutputAvailableAsync())
        {
            Message<int> message;
            if (transformBlock.TryReceive(m => m.Token == token, out message))
            {
                // do further processing using initiator service instance members
                // something like Callback.IncResult(message.Data);
                break;
            }
        }
    }
}

public class Message<T>
{
    public string Token { get; set; }
    public T Data { get; set; }
}
```
The operation contract doesn't really need to be async, but I needed the OutputAvailableAsync notification.
Is this a good approach or is there a better solution for my scenario?
Thanks in advance.

First, I think you shouldn't use the token the way you do. Unique identifiers are useful when communicating between processes, but when you're inside a single process, just use reference equality.
To actually answer your question, I think the (kind of) busy loop is not a good idea: every pending call wakes up whenever any result becomes available and keeps polling until its own result shows up.
A simpler solution for asynchronous throttling would be to use SemaphoreSlim. Something like:
```csharp
static readonly SemaphoreSlim Semaphore = new SemaphoreSlim(3);

// operation contract
public async Task Inc(int id)
{
    await Semaphore.WaitAsync();
    int result;
    try
    {
        result = id + 1;
    }
    finally
    {
        // always give the slot back, even if the processing throws
        Semaphore.Release();
    }

    // do further processing using initiator service instance members
    // something like Callback.IncResult(result);
}
```
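Below is a self-contained sketch of the SemaphoreSlim version that runs outside WCF, so the throttling can be observed. The names ThrottleDemo, IncAsync, RunAsync and RecordEntry are made up for illustration, Task.Delay stands in for the long-running processing, and the peak counter exists only to record how many calls were inside the gate at once. The release sits in a finally block and the callback point comes after it, so a slow client callback does not hold one of the 3 slots.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // At most 3 callers may be inside the gate at the same time.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(3);
    private static int _current; // callers currently inside the gate
    private static int _peak;    // highest value _current has reached

    // Stand-in for the operation contract.
    public static async Task<int> IncAsync(int id)
    {
        int result;
        await Gate.WaitAsync();
        try
        {
            RecordEntry();
            await Task.Delay(50); // simulated long-running processing
            result = id + 1;
        }
        finally
        {
            Interlocked.Decrement(ref _current);
            Gate.Release();
        }

        // Outside the gate: this is where the service would call Callback.IncResult(result).
        return result;
    }

    public static async Task<(int[] Results, int Peak)> RunAsync(int calls)
    {
        int[] results = await Task.WhenAll(Enumerable.Range(0, calls).Select(i => IncAsync(i)));
        return (results, Volatile.Read(ref _peak));
    }

    private static void RecordEntry()
    {
        int now = Interlocked.Increment(ref _current);
        int seen;
        // Lock-free maximum: retry until _peak is at least 'now'.
        while (now > (seen = Volatile.Read(ref _peak)) &&
               Interlocked.CompareExchange(ref _peak, now, seen) != seen)
        {
        }
    }
}
```

Calling RunAsync(10) should report a peak of at most 3, with the results in call order (Task.WhenAll preserves the order of its inputs).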
If you really want to (or have to?) use dataflow, you can use a TaskCompletionSource to synchronize the operation and the block. The operation method awaits the Task of the TaskCompletionSource, and the block completes it when it has finished processing that message:
```csharp
private static readonly ActionBlock<Message<int>> Block =
    new ActionBlock<Message<int>>(
        x => Inc(x),
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 3
        });

static void Inc(Message<int> input)
{
    input.TCS.SetResult(input.Data + 1);
}

// operation contract
public async Task Inc(int id)
{
    var tcs = new TaskCompletionSource<int>();
    Block.Post(new Message<int> { TCS = tcs, Data = id });
    int result = await tcs.Task;
    // do further processing using initiator service instance members
    // something like Callback.IncResult(result);
}

// Message<T> for this version carries the TaskCompletionSource instead of a token
public class Message<T>
{
    public TaskCompletionSource<T> TCS { get; set; }
    public T Data { get; set; }
}
```
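A runnable sketch of the TaskCompletionSource variant, assuming the System.Threading.Tasks.Dataflow package is available. WorkItem and DataflowThrottle are hypothetical names. Two details differ from the snippet above and are additions of this sketch: the TaskCompletionSource is created with TaskCreationOptions.RunContinuationsAsynchronously, so the code after the await (where the callback would go) is not run inline on the block's worker thread, and exceptions are forwarded to the waiting caller with SetException instead of faulting the block and leaving the caller waiting forever.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical message type: the input plus the TaskCompletionSource that the
// block completes once it has processed this particular item.
public sealed class WorkItem
{
    public WorkItem(int data) => Data = data;

    public int Data { get; }

    // RunContinuationsAsynchronously: code awaiting this task is queued to the thread
    // pool instead of running inline on the block's worker inside SetResult.
    public TaskCompletionSource<int> Completion { get; } =
        new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
}

public static class DataflowThrottle
{
    // At most 3 items are processed at the same time; the rest wait in the block's queue.
    private static readonly ActionBlock<WorkItem> Block = new ActionBlock<WorkItem>(
        async item =>
        {
            try
            {
                await Task.Delay(20); // simulated long-running processing
                item.Completion.SetResult(item.Data + 1);
            }
            catch (Exception ex)
            {
                // Report the failure to the waiting caller instead of faulting the block.
                item.Completion.SetException(ex);
            }
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3 });

    // Stand-in for the operation contract.
    public static async Task<int> IncAsync(int id)
    {
        var item = new WorkItem(id);
        if (!Block.Post(item))
            throw new InvalidOperationException("The block no longer accepts items.");

        int result = await item.Completion.Task;
        // Back in this call's own flow: the service would call Callback.IncResult(result) here.
        return result;
    }
}
```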


How to have only one thread for fire-and-forget tasks in Web API?

In my Web API (.NET Framework 4.7.2) I need to call Redis (using StackExchange.Redis) to delete a key in a fire-and-forget way, and I am running some stress tests.
I am comparing various ways to get the maximum speed:
I have already tested executing the command with the FireAndForget flag.
I have also measured a simple awaited command to Redis.
Now I am looking for a way to collect the commands received within a 15 ms window and execute them all in one go by pipelining them.
I first tried to call Redis from a Task.Run action, but the problem I observe is that under stress the memory of my Web API keeps climbing.
The memory is full of System.Threading.IThreadPoolWorkItem[] objects with the following code:
```csharp
public ApiResult<int> DeleteFromBasketId([FromBody] int basketId)
{
    var response = new DeleteFromBasketResponse<int>();
    var cpt = Interlocked.Increment(ref counter);
    Task.Run(async () =>
    {
        await db.StringSetAsync($"BASKET_TO_DELETE_{cpt}", cpt.ToString());
    });
    return response;
}
```
So I think that under stress my API keeps enqueuing background tasks in memory and executes them one after the other as fast as it can, but more slowly than the requests come in...
So I am looking for a way to have a single long-lived background thread running alongside the Web API that could collect the commands to send to Redis and execute them by pipelining them.
I thought about running a background task by implementing the IHostedService interface, but it seems that in this case the background task would not share any state with the current HTTP request. So implementing IHostedService would be handy for a scheduled background task, but not in my case, or I do not know how...
Based on the StackExchange.Redis documentation, you can use the CommandFlags.FireAndForget flag:
```csharp
public ApiResult<int> DeleteFromBasketId([FromBody] int basketId)
{
    var response = new DeleteFromBasketResponse<int>();
    var cpt = Interlocked.Increment(ref counter);
    db.StringSet($"BASKET_TO_DELETE_{cpt}", cpt.ToString(), flags: CommandFlags.FireAndForget);
    return response;
}
```
Edit 1: another solution, based on the comments
You can use a publish/subscribe (producer/consumer) approach. Something like this should work:
```csharp
public class MessageBatcher
{
    private readonly IDatabase target;
    private readonly BlockingCollection<Action<IDatabaseAsync>> tasks = new();
    private Task worker;

    public MessageBatcher(IDatabase target) => this.target = target;

    public void AddMessage(Action<IDatabaseAsync> task) => tasks.Add(task);

    public IDisposable Start(int batchSize)
    {
        var cancellationTokenSource = new CancellationTokenSource();
        worker = Task.Factory.StartNew(state =>
        {
            var count = 0;
            var tokenSource = (CancellationTokenSource) state;
            var box = new StrongBox<IBatch>(target.CreateBatch());
            // on shutdown, send whatever is queued in the current batch
            tokenSource.Token.Register(b => ((StrongBox<IBatch>)b).Value.Execute(), box);

            foreach (var task in tasks.GetConsumingEnumerable(tokenSource.Token))
            {
                var batch = box.Value;
                task(batch); // queue the command on the current batch
                if (++count == batchSize)
                {
                    // send the full batch and start a new one
                    batch.Execute();
                    box.Value = target.CreateBatch();
                    count = 0;
                }
            }
        }, cancellationTokenSource, cancellationTokenSource.Token, TaskCreationOptions.LongRunning, TaskScheduler.Current);

        return new Disposer(worker, cancellationTokenSource);
    }

    private class Disposer : IDisposable
    {
        private readonly Task worker;
        private readonly CancellationTokenSource tokenSource;

        public Disposer(Task worker, CancellationTokenSource tokenSource) => (this.worker, this.tokenSource) = (worker, tokenSource);

        public void Dispose()
        {
            tokenSource.Cancel(); // stops the loop and runs the registered Execute()
            try
            {
                worker.Wait();
            }
            catch (AggregateException)
            {
                // expected: the worker ends with an OperationCanceledException
            }
            tokenSource.Dispose();
        }
    }
}
```
```csharp
private readonly MessageBatcher batcher;

// ctor of your controller (BasketController is a hypothetical name):
// ensure that the passed batcher is a singleton and already started
public BasketController(MessageBatcher batcher)
{
    this.batcher = batcher;
}

public ApiResult<int> DeleteFromBasketId([FromBody] int basketId)
{
    var response = new DeleteFromBasketResponse<int>();
    var cpt = Interlocked.Increment(ref counter);
    batcher.AddMessage(db => db.StringSetAsync($"BASKET_TO_DELETE_{cpt}", cpt.ToString(), flags: CommandFlags.FireAndForget));
    return response;
}
```
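The question asks about collecting everything that arrives within a 15 ms window, whereas the batcher above flushes by count. Here is a minimal sketch of the time-window variant using only BCL types: one dedicated background thread takes the first queued item, keeps taking items until the window (measured from that first item) has elapsed, and hands the batch to a callback. WindowBatcher and the flush callback are hypothetical; with StackExchange.Redis the callback could create an IBatch with CreateBatch(), queue the ...Async calls on it and call Execute(), but that part is not shown here and should be checked against the library documentation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: a single long-lived background thread drains the queue and
// passes everything that arrived within 'window' to 'flush' in one call.
public sealed class WindowBatcher<T> : IDisposable
{
    private readonly BlockingCollection<T> _queue = new BlockingCollection<T>();
    private readonly Action<IReadOnlyList<T>> _flush;
    private readonly TimeSpan _window;
    private readonly Thread _worker;

    public WindowBatcher(TimeSpan window, Action<IReadOnlyList<T>> flush)
    {
        _window = window;
        _flush = flush;
        _worker = new Thread(Run) { IsBackground = true, Name = "WindowBatcher" };
        _worker.Start();
    }

    // Called from request handlers: only enqueues, never waits for the flush.
    public void Add(T item) => _queue.Add(item);

    private void Run()
    {
        // Blocks until the first item of the next batch arrives; ends after
        // CompleteAdding once the queue is empty.
        foreach (T first in _queue.GetConsumingEnumerable())
        {
            var batch = new List<T> { first };
            var clock = Stopwatch.StartNew();

            // Keep taking items until the window, measured from 'first', has elapsed.
            while (true)
            {
                TimeSpan left = _window - clock.Elapsed;
                if (left <= TimeSpan.Zero) break;
                if (!_queue.TryTake(out T next, left)) break;
                batch.Add(next);
            }

            _flush(batch);
        }
    }

    // Stops accepting items, lets the worker flush what is left, then waits for it.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Join();
        _queue.Dispose();
    }
}
```

Add only enqueues, so the request thread never waits for Redis, and Dispose lets the worker drain and flush what is still queued before returning. If requests outpace the flushes for a long time the queue still grows, much like the Task.Run version; creating the BlockingCollection with a bounded capacity would make callers wait instead.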

NetTcpBinding and async/await WCF blocking

We are creating a shared WCF channel to use with an async operation:
```csharp
var channelFactory = new ChannelFactory<IWcfService>(new NetTcpBinding {TransferMode = TransferMode.Buffered});
channelFactory.Endpoint.Behaviors.Add(new DispatcherSynchronizationBehavior(true, 25));
var channel = channelFactory.CreateChannel(new EndpointAddress(new Uri("net.tcp://localhost:80/Service").AbsoluteUri + "/Test"));
```
This calls the following service:
```csharp
[ServiceContract]
public interface IWcfService
{
    [OperationContract]
    Task<MyClass> DoSomethingAsync();
}

[ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Multiple, InstanceContextMode = InstanceContextMode.PerCall)]
public class WcfServiceImpl : IWcfService
{
    public Task<MyClass> DoSomethingAsync()
    {
        return Task.FromResult(new MyClass());
    }
}

public class MyClass
{
    public string SomeString { get; set; }
    public MyClass Related { get; set; }
    public int[] Numbers { get; set; }
}
```
If we start 3 requests at once and simulate a long-running task on the response:
```csharp
using ((IDisposable)channel)
{
    var task1 = Task.Run(async () => await DoStuffAsync(channel));
    var task2 = Task.Run(async () => await DoStuffAsync(channel));
    var task3 = Task.Run(async () => await DoStuffAsync(channel));
    Task.WaitAll(task1, task2, task3);
}

public static async Task DoStuffAsync(IWcfService channel)
{
    await channel.DoSomethingAsync();
    // Simulate long running CPU bound operation
    Thread.Sleep(5000);
    Console.WriteLine("Wait completed");
}
```
Then all 3 requests reach the server concurrently, and it responds to all 3 requests at the same time.
However, once the responses reach the client, it processes them one at a time:
```
// 5 second delay
Wait completed
// Instant
// 5 second delay
Wait completed
// Instant
```
The responses resume on different threads, but only one runs at a time.
If we use Streamed instead of Buffered transfer mode, we get the expected behaviour: the client processes all 3 responses concurrently.
We have tried setting the max buffer size, using DispatcherSynchronizationBehavior, different concurrency modes, toggling sessions, ConfigureAwait(false), and calling channel.Open() explicitly.
There seems to be no way to get proper concurrent responses on a shared session.
I added an image of what I believe is happening (the image is not included here). This only happens in Buffered mode; in Streamed mode the main thread does not block.
I was trying to solve exactly the same problem recently. Although I wasn't able to identify exactly why TransferMode.Buffered causes what seems to be a global lock on a WCF channel until the thread that was using it is released, I found a similar issue, "deadlock after awaiting". It suggests a workaround: add RunContinuationsAsynchronously() to your awaits, i.e. await channel.DoSomethingAsync().RunContinuationsAsynchronously(), where RunContinuationsAsynchronously() is:
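```csharp
public static class TaskExtensions
{
    public static Task<T> RunContinuationsAsynchronously<T>(this Task<T> task)
    {
        // RunContinuationsAsynchronously (.NET 4.6+) makes the TCS queue awaiting
        // continuations to the thread pool instead of running them inline on the
        // thread that completes it.
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);

        task.ContinueWith(t =>
        {
            if (t.IsFaulted)
            {
                if (t.Exception != null) tcs.SetException(t.Exception.InnerExceptions);
            }
            else if (t.IsCanceled)
            {
                tcs.SetCanceled();
            }
            else
            {
                tcs.SetResult(t.Result);
            }
        }, CancellationToken.None, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);

        return tcs.Task;
    }

    public static Task RunContinuationsAsynchronously(this Task task)
    {
        var tcs = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);

        task.ContinueWith(t =>
        {
            if (t.IsFaulted)
            {
                if (t.Exception != null) tcs.SetException(t.Exception.InnerExceptions);
            }
            else if (t.IsCanceled)
            {
                tcs.SetCanceled();
            }
            else
            {
                tcs.SetResult(null);
            }
        }, CancellationToken.None, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);

        return tcs.Task;
    }
}
```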
This separates your continuations from the WCF thread that completed the call. Apparently Task.Yield() works too.
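A small simulation of the mechanism this workaround relies on, without WCF. By default, the code after an await may run inline on whichever thread completes the awaited task. Here a plain TaskCompletionSource stands in for the reply, and the Task.Run thread that calls SetResult stands in for the thread that delivered it; YieldDemo and its methods are hypothetical. Whether WCF's Buffered mode really needs that thread back before it can deliver the next reply is a guess, not something verified here.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class YieldDemo
{
    // Returns how long the "completing" thread was stuck inside SetResult.
    public static async Task<TimeSpan> MeasureCompleterBlockedAsync(bool yieldFirst, int workMs)
    {
        // Plain TCS: by default, continuations may run inline inside SetResult.
        var reply = new TaskCompletionSource<int>();
        Task consumer = ConsumeAsync(reply.Task, yieldFirst, workMs);

        var clock = Stopwatch.StartNew();
        // This pool thread plays the role of the thread that delivers the reply.
        await Task.Run(() => reply.SetResult(42));
        TimeSpan blocked = clock.Elapsed;

        await consumer;
        return blocked;
    }

    // Plays the role of DoStuffAsync: await the reply, then do CPU-bound work.
    private static async Task ConsumeAsync(Task<int> reply, bool yieldFirst, int workMs)
    {
        await reply;
        if (yieldFirst)
        {
            // Queue the rest of this method to the thread pool, freeing the completing thread.
            await Task.Yield();
        }
        Thread.Sleep(workMs); // simulated long-running CPU-bound work
    }
}
```

When called without a SynchronizationContext, SetResult returns almost immediately if yieldFirst is true; with false it returns only after the simulated work has finished, because the whole tail ran on the completing thread.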
It would be nice to actually understand why this is happening though.

Aborting WCF proxy that has non-cached ChannelFactory with CancellationToken causes deadlock

So here's how it is.
There is a WCF service. I generated a proxy for it with "Add Service Reference", using task-based operations.
The endpoint address for that service might differ between users. I have no control over this service; the vendor just does it this way.
Then I wrap that service in another type, and all interaction with the WCF service goes through that type.
It all looks like this:
```csharp
//Generated code
public partial class MyServiceClient : System.ServiceModel.ClientBase<IMyService>, IMyService
{
    public Task<ResultDataContractType> GetResultsAsync(ArgsDataContractType args)
    {
        return base.Channel.GetResultsAsync(args);
    }
}
//end of generated code

public class ClientFactory
{
    public static IMyService CreateClient(string url)
    {
        var binding = new BasicHttpBinding();
        var address = new EndpointAddress(url);
        var client = new MyServiceClient(binding, address);
        return client;
    }
}

public class Wrapper
{
    public async Task<ResultType> GetResultsAsync(string url, ArgsType args, CancellationToken cancellationToken)
    {
        var client = ClientFactory.CreateClient(url);
        cancellationToken.Register(target =>
        {
            var communicationObject = target as ICommunicationObject;
            if (communicationObject != null)
            {
                communicationObject.Abort();
            }
        }, client);

        ArgsDataContractType requestArgs = MapArgs(args);
        ResultDataContractType result = await client.GetResultsAsync(requestArgs);
        return MapResult(result); // hypothetical counterpart of MapArgs
    }
}

public class Consumer
{
    public async void DoWork()
    {
        var args = new ArgsType
        {
            // ...
        };
        var cts = new CancellationTokenSource();
        var wrapper = new Wrapper();
        Task<ResultType> task = wrapper.GetResultsAsync("", args, cts.Token);
        cts.Cancel(); // This is done intentionally; normally there is a timeout TimeSpan for the token source
        await task;
    }
}
```
The consumer is actually an NUnit unit test, but calling the same code from an ASP.NET application also ends up in a deadlock. It gets stuck on await task;
What I have noticed is that setting MyServiceClient.CacheSetting = CacheSetting.AlwaysOn; makes the code run without deadlocking.
Also, configuring MyServiceClient from App.config or Web.config makes the code run without deadlocking. But if I set MyServiceClient.CacheSetting = CacheSetting.AlwaysOff; before instantiating MyServiceClient, the code deadlocks.
Also, configuring the awaiter like this:
```csharp
ResultDataContractType result = await client.GetResultsAsync(requestArgs).ConfigureAwait(false);
```
makes the code run without deadlocking.
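For reference, a hedged sketch of the same cancel-to-abort idea with two changes: the CancellationTokenRegistration returned by Register is disposed once the call finishes, and the await uses ConfigureAwait(false) as in the workaround above. WithAbortOnCancel is a hypothetical helper written against delegates so that it runs without WCF; in the Wrapper, abort would call ((ICommunicationObject)client).Abort(). It packages the workaround but does not explain the CacheSetting difference.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationHelpers
{
    // Runs 'operation'; if 'token' is cancelled while it is pending, 'abort' is invoked.
    // The registration is disposed when the operation finishes, so a later Cancel()
    // does not touch a client that is already done.
    public static async Task<T> WithAbortOnCancel<T>(
        Func<Task<T>> operation, Action abort, CancellationToken token)
    {
        using (token.Register(abort))
        {
            // ConfigureAwait(false): the rest of this method does not need the caller's
            // SynchronizationContext, so it never waits for that context to be free.
            return await operation().ConfigureAwait(false);
        }
    }
}
```

In Wrapper.GetResultsAsync this would read roughly: var result = await CancellationHelpers.WithAbortOnCancel(() => client.GetResultsAsync(requestArgs), () => ((ICommunicationObject)client).Abort(), cancellationToken);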
Could you please give me any idea why the deadlock doesn't happen when the ChannelFactory for MyServiceClient is cached, but does happen when it is not cached?

Async TPL deadlock with third party lib aka wild goose chase

After spending a very frustrating and unproductive day on this, I'm posting here in search of help.
I am using a third-party library that initiates a network connection in an unknown manner (I do know, however, that it's a managed wrapper for an unmanaged library). It lets you know the status of the connection by raising an event, StatusChanged(status).
Since invoking the network is obviously costly and my Service may not need it, I inject an AsyncLazy<Connection>, which is then invoked if necessary. The Service is accessed by ParallelForEachAsync, an extension I made to process Tasks concurrently, based on this post.
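For readers who haven't used AsyncLazy<T>: the core idea fits in a few lines, a Lazy<Task<T>> whose factory starts at most once and whose Task is shared by every awaiter. The sketch below is not the AsyncEx implementation, and SimpleAsyncLazy is a hypothetical name. It starts the factory with Task.Run, so the factory runs on a thread-pool thread rather than on the first caller's thread; which thread runs the factory can matter for thread-affine libraries like the one described here.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Minimal illustration of the AsyncLazy<T> idea: the factory runs at most once
// (Lazy<T> defaults to thread-safe ExecutionAndPublication), and every caller
// awaits the same Task.
public sealed class SimpleAsyncLazy<T> : Lazy<Task<T>>
{
    public SimpleAsyncLazy(Func<Task<T>> factory)
        : base(() => Task.Run(factory)) // start the factory on the thread pool
    {
    }

    // Makes 'await lazy' work directly.
    public TaskAwaiter<T> GetAwaiter() => Value.GetAwaiter();
}
```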
If accessed sequentially, all is well. Any concurrency, even 2 parallel tasks, results in a deadlock 90% of the time. I know it's definitely related to how the third-party library interacts with my code, because a) I am not able to reproduce the effect using the same structure without invoking it, and b) the StatusChanged(Connecting) event is received fine, at which point I assume the network operation has started, but I never get a callback for StatusChanged(Connected).
Here's an as-faithful-as-possible repro of the code structure, which unfortunately doesn't reproduce the deadlock.
Any ideas on how to go about resolving this?
```csharp
class Program
{
    static void Main(string[] args)
    {
        AsyncContext.Run(() => MainAsync(args));
    }

    static async Task MainAsync(string[] args)
    {
        var lazy = new AsyncLazy<Connection>(() => ConnectionFactory.Create());
        var service = new Service(lazy);

        await Enumerable.Range(0, 100)
            .ParallelForEachAsync(10, async i =>
            {
                await service.DoWork();
                Console.WriteLine("did some work");
            }, CancellationToken.None);
    }
}

class ConnectionFactory
{
    public static Task<Connection> Create()
    {
        var tcs = new TaskCompletionSource<Connection>();
        var session = new Session();
        session.Connected += (sender, args) =>
        {
            tcs.SetResult(new Connection());
        };
        session.Connect();
        return tcs.Task;
    }
}

class Connection
{
    public async Task DoSomethinElse()
    {
        await Task.Delay(1000);
    }
}

class Session
{
    public event EventHandler Connected;

    public void Connect()
    {
        Console.WriteLine("Simulate network operation with unknown scheduling");
        Connected(this, EventArgs.Empty);
    }
}

class Service
{
    private static Random r = new Random();
    private readonly AsyncLazy<Connection> lazy;

    public Service(AsyncLazy<Connection> lazy)
    {
        this.lazy = lazy;
    }

    public async Task DoWork()
    {
        Console.WriteLine("Trying to do some work, will connect");
        await Task.Delay(r.Next(0, 100));
        var connection = await lazy;
        await connection.DoSomethinElse();
    }
}

public static class AsyncExtensions
{
    public static async Task<AsyncParallelLoopResult> ParallelForEachAsync<T>(
        this IEnumerable<T> source,
        int degreeOfParallelism,
        Func<T, Task> body,
        CancellationToken cancellationToken)
    {
        var partitions = Partitioner.Create(source).GetPartitions(degreeOfParallelism);
        bool wasBroken = false;

        var tasks =
            from partition in partitions
            select Task.Run(async () =>
            {
                using (partition)
                {
                    while (partition.MoveNext())
                    {
                        if (cancellationToken.IsCancellationRequested)
                        {
                            Volatile.Write(ref wasBroken, true);
                            break;
                        }
                        await body(partition.Current);
                    }
                }
            });

        await Task.WhenAll(tasks);
        // IsCompleted is true only when the loop was not broken by cancellation
        return new AsyncParallelLoopResult(!Volatile.Read(ref wasBroken));
    }
}

public class AsyncParallelLoopResult
{
    public bool IsCompleted { get; private set; }

    internal AsyncParallelLoopResult(bool isCompleted)
    {
        IsCompleted = isCompleted;
    }
}
```
I think I understand why it's happening, but I'm not sure how to solve it. While the context is waiting for DoWork, DoWork is waiting for the lazy connection.
This ugly hack seems to solve it:
```csharp
Connection WaitForConnection()
{
    var awaiter = connectionLazy.GetAwaiter();
    while (!awaiter.IsCompleted)
    {
        // ...
    }
    return awaiter.GetResult();
}
```
Any more elegant solutions?
I suspect that the third-party library requires some kind of STA pumping. This is fairly common with old-style asynchronous code.
I have a type, AsyncContextThread, that you can try; pass true to its constructor to enable manual STA pumping. AsyncContextThread is just like AsyncContext, except that it runs the context within a new thread (an STA thread in this case).
```csharp
static void Main(string[] args)
{
    using (var thread = new AsyncContextThread(true))
    {
        thread.Factory.Run(() => MainAsync(args)).Wait();
    }
}
```
or:
```csharp
static void Main(string[] args)
{
    AsyncContext.Run(async () =>
    {
        using (var thread = new AsyncContextThread(true))
        {
            await thread.Factory.Run(() => MainAsync(args));
        }
    });
}
```
Note that AsyncContextThread will not work in all STA scenarios. I have run into issues when doing some (rather twisted) COM interop that required a true UI thread (a WPF or WinForms thread); for some reason, the STA pumping wasn't sufficient for those COM objects.

Accepting incoming requests asynchronously

I have identified a bottleneck in my TCP application that I have simplified for the sake of this question.
I have a MyClient class, which represents a connected client; I also have a MyWrapper class, which represents a client that fulfills some conditions. If a MyClient fulfills those conditions, it qualifies for a wrapper.
I want to expose a method that allows the caller to await a MyWrapper, and that method should handle the negotiation and rejection of invalid MyClients:
```csharp
public static async Task StartAccepting(CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        var wrapper = await AcceptWrapperAsync(token);
        HandleWrapperAsync(wrapper);
    }
}
```
Therefore, AcceptWrapperAsync awaits a valid wrapper, and HandleWrapperAsync handles the wrapper asynchronously without blocking the thread, so AcceptWrapperAsync can get back to work as fast as it can.
Internally, that method works something like this:
```csharp
public static async Task<MyWrapper> AcceptWrapperAsync(CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        var client = await AcceptClientAsync();
        if (IsClientWrappable(client))
            return new MyWrapper(client);
    }
    return null;
}

public static async Task<MyClient> AcceptClientAsync()
{
    await Task.Delay(1000);
    return new MyClient();
}

private static Boolean IsClientWrappable(MyClient client)
{
    Thread.Sleep(500); // simulated negotiation
    return true;
}
```
This code simulates a client connection every second, and it takes half a second to check whether the connection is suitable for a wrapper. AcceptWrapperAsync loops until a valid wrapper is generated, and then returns.
This approach works well, but it has a flaw: while IsClientWrappable is executing, no further clients can be accepted, which creates a bottleneck when many clients try to connect at the same time. I am afraid that in real life, if the server goes down while many clients are connected, coming back up will not be pleasant, because all of them will try to reconnect at the same time. I know it is very unlikely that all of them connect at exactly the same moment, but I would like to speed up the connection process.
Making IsClientWrappable async would only ensure that the executing thread is not blocked until the negotiation finishes; the execution flow is still blocked.
How could I improve this approach to continuously accept new clients while still being able to await a wrapper using AcceptWrapperAsync?
```csharp
//this loop must never be blocked
while (!token.IsCancellationRequested)
{
    var client = await AcceptClientAsync();
    HandleClientAsync(client); //must not block
}

async Task HandleClientAsync(MyClient client)
{
    if (await IsClientWrappableAsync(client)) //make async as well, don't block
        await HandleWrapperAsync(new MyWrapper(client));
}
```
This way you move the IsClientWrappable logic out of the accept loop and into the background async workflow.
If you do not want to make IsClientWrappable non-blocking, just wrap it in Task.Run. It is essential that HandleClientAsync does not block, so that its caller doesn't either.
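A runnable sketch of this suggestion that also keeps the AcceptWrapperAsync shape the question asks for: the accept loop fires off HandleClientAsync for each client, and clients that pass validation are written to a Channel<MyWrapper> that AcceptWrapperAsync reads from. WrapperAcceptor, the Id property, the rejection rule and the shortened delays are all assumptions for the demo. System.Threading.Channels ships with .NET Core 3.0 and later and is a NuGet package on .NET Framework.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-ins for the question's types.
public sealed class MyClient
{
    public int Id { get; }
    public MyClient(int id) => Id = id;
}

public sealed class MyWrapper
{
    public MyClient Client { get; }
    public MyWrapper(MyClient client) => Client = client;
}

public sealed class WrapperAcceptor
{
    // Validated clients are handed from the background handlers to AcceptWrapperAsync.
    private readonly Channel<MyWrapper> _accepted = Channel.CreateUnbounded<MyWrapper>();
    private int _lastId;

    // The accept loop never awaits validation, so it keeps accepting new clients.
    public async Task RunAcceptLoopAsync(int clientCount, CancellationToken token)
    {
        for (int i = 0; i < clientCount && !token.IsCancellationRequested; i++)
        {
            MyClient client = await AcceptClientAsync();
            _ = HandleClientAsync(client); // fire and forget; errors are handled inside
        }
    }

    // Same shape as the question's AcceptWrapperAsync: await the next valid wrapper.
    public ValueTask<MyWrapper> AcceptWrapperAsync(CancellationToken token) =>
        _accepted.Reader.ReadAsync(token);

    private async Task HandleClientAsync(MyClient client)
    {
        try
        {
            if (await IsClientWrappableAsync(client))
                _accepted.Writer.TryWrite(new MyWrapper(client));
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine($"Validation of client {client.Id} failed: {ex.Message}");
        }
    }

    private async Task<MyClient> AcceptClientAsync()
    {
        await Task.Delay(10); // simulated arrival interval
        return new MyClient(Interlocked.Increment(ref _lastId));
    }

    private static async Task<bool> IsClientWrappableAsync(MyClient client)
    {
        await Task.Delay(100);     // simulated slow negotiation
        return client.Id % 3 != 0; // hypothetical rule: reject every third client
    }
}
```

Because HandleClientAsync catches its own exceptions, discarding its Task with the _ = assignment does not make failures disappear silently; they are at least logged.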
TPL Dataflow to the rescue. I have created a "producer/consumer" object with two queues that:
accepts inputs from the "producer" and stores them in the "in" queue;
has an internal asynchronous task that reads from the "in" queue and processes the inputs in parallel, with a given maximum degree of parallelism;
puts each processed item (result or exception) in the "out" queue afterwards;
lets a consumer await an item and then check whether the processing was successful.
I have done some testing and it seems to work fine, but I want to do more testing:
```csharp
public sealed class ProcessingResult<TOut>
    where TOut : class
{
    public TOut Result { get; internal set; }
    public Exception Error { get; internal set; }
}

public abstract class ProcessingBufferBlock<TIn, TOut>
    where TIn : class
    where TOut : class
{
    readonly BufferBlock<TIn> _in;
    readonly BufferBlock<ProcessingResult<TOut>> _out;
    readonly CancellationToken _cancellation;
    readonly SemaphoreSlim _semaphore;

    public ProcessingBufferBlock(Int32 boundedCapacity, Int32 degreeOfParalellism, CancellationToken cancellation)
    {
        _cancellation = cancellation;
        _semaphore = new SemaphoreSlim(degreeOfParalellism);
        var options = new DataflowBlockOptions() { BoundedCapacity = boundedCapacity, CancellationToken = cancellation };
        _in = new BufferBlock<TIn>(options);
        _out = new BufferBlock<ProcessingResult<TOut>>(options);
        _ = StartReadingAsync();
    }

    private async Task StartReadingAsync()
    {
        await Task.Yield();
        while (!_cancellation.IsCancellationRequested)
        {
            var incoming = await _in.ReceiveAsync(_cancellation);
            // wait for a free slot, then process without awaiting so reading continues
            await _semaphore.WaitAsync(_cancellation);
            _ = ProcessThroughGateAsync(incoming);
        }
    }

    private async Task ProcessThroughGateAsync(TIn input)
    {
        Exception error = null;
        TOut result = null;
        try
        {
            result = await ProcessAsync(input);
        }
        catch (Exception ex)
        {
            error = ex;
        }
        finally
        {
            _semaphore.Release();
        }

        if (result != null || error != null)
            // SendAsync waits for room instead of dropping the item when _out is full
            await _out.SendAsync(new ProcessingResult<TOut>() { Error = error, Result = result });
    }

    protected abstract Task<TOut> ProcessAsync(TIn input);

    // returns false (the item is dropped) when the bounded "in" queue is full
    public bool Post(TIn item)
    {
        return _in.Post(item);
    }

    public Task<ProcessingResult<TOut>> ReceiveAsync()
    {
        return _out.ReceiveAsync();
    }
}
```
So the example I used in the question would look something like this:
```csharp
public class WrapperProcessingQueue : ProcessingBufferBlock<MyClient, MyWrapper>
{
    public WrapperProcessingQueue(Int32 boundedCapacity, Int32 degreeOfParalellism, CancellationToken cancellation)
        : base(boundedCapacity, degreeOfParalellism, cancellation)
    { }

    protected override async Task<MyWrapper> ProcessAsync(MyClient input)
    {
        await Task.Delay(5000);
        if (input.Id % 3 == 0)
            return null;
        return new MyWrapper(input);
    }
}
```
Then I can add MyClient objects to that queue as fast as I get them; they are processed in parallel, and the consumer awaits the ones that pass the filter.
As I said, I want to do more testing, but any feedback is very welcome.
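For comparison, a sketch of the same filter built only from stock Dataflow blocks: a TransformBlock provides the bounded parallelism that ProcessingBufferBlock implements with a SemaphoreSlim, and predicated LinkTo calls send accepted results to a BufferBlock and rejected ones (null) to DataflowBlock.NullTarget. Integers and strings stand in for MyClient and MyWrapper, FilteringPipeline is a hypothetical name, and the divisible-by-3 rule mirrors the example above. One difference: an exception thrown by the delegate faults the whole TransformBlock instead of being reported as a ProcessingResult.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class FilteringPipeline
{
    public static (ITargetBlock<int> Input, ISourceBlock<string> Output) Create(int degreeOfParallelism)
    {
        // Runs up to 'degreeOfParallelism' validations at once; null means "rejected".
        var process = new TransformBlock<int, string>(async id =>
        {
            await Task.Delay(20);                        // simulated negotiation
            return id % 3 == 0 ? null : "wrapper-" + id; // hypothetical rule from the example
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = degreeOfParallelism });

        var output = new BufferBlock<string>();

        // Accepted results go to 'output' (which completes when 'process' does);
        // rejected ones are discarded. Every item must match some link, or the block stalls.
        process.LinkTo(output, new DataflowLinkOptions { PropagateCompletion = true }, s => s != null);
        process.LinkTo(DataflowBlock.NullTarget<string>(), s => s == null);

        return (process, output);
    }

    // Usage helper: push ids through the pipeline and collect the accepted results.
    public static async Task<List<string>> RunAsync(IEnumerable<int> ids, int degreeOfParallelism)
    {
        var (input, output) = Create(degreeOfParallelism);
        foreach (int id in ids)
            input.Post(id);
        input.Complete();

        var results = new List<string>();
        while (await output.OutputAvailableAsync())
            results.Add(await output.ReceiveAsync());
        return results;
    }
}
```

TransformBlock keeps its output in input order by default (EnsureOrdered), so RunAsync returns the accepted items in the order they were posted even though up to 3 are processed at once.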

