For some reason, the code inside neither the Consumer nor the Producer task is ever executed. Where am I going wrong?
using System.Threading.Channels;
namespace TEST.CHANNELS
{
public class Program
{
public static async Task Main(string[] args)
{
var channel = Channel.CreateUnbounded<int>();
var cancel = new CancellationToken();
await Consumer(channel, cancel);
await Producer(channel, cancel);
Console.ReadKey();
}
private static async Task Producer(Channel<int, int> ch, CancellationToken cancellationToken)
{
for (int i = 0; i < 59; i++)
{
await Task.Delay(1000, cancellationToken);
await ch.Writer.WriteAsync(i, cancellationToken);
}
}
private static async Task Consumer(Channel<int, int> ch, CancellationToken cancellationToken)
{
await foreach (var item in ch.Reader.ReadAllAsync(cancellationToken))
{
Console.WriteLine(item);
}
}
}
}
If you are new, I recommend reading Tutorial: Learn to debug C# code using Visual Studio. You should know how to set breakpoints and step through your code line by line.
Since this code involves async/Task, it may look confusing, but when you step into Consumer you will see it stop at the await foreach (var item in ch.Reader.ReadAllAsync(cancellationToken)) line.
The consumer is waiting for items that the producer never writes, because your first await suspends Main until Consumer completes, so the second line never gets executed:
await Consumer(channel, cancel);
await Producer(channel, cancel);
This should fix the issue:
var consumerTask = Consumer(channel, cancel);
var producerTask = Producer(channel, cancel);
await Task.WhenAll(consumerTask, producerTask);
What the above code says is:
Run the Consumer task, don't wait for it, but keep track of it in consumerTask.
Run the Producer task, don't wait for it, but keep track of it in producerTask.
Wait until both consumerTask and producerTask finish, using Task.WhenAll.
Note that you still seem to have a logical problem with Consumer: it will never exit, so your ReadKey() will likely never be hit (your app would be stuck at the WhenAll line). If that is a bug rather than intended behavior, fixing it is good practice for you.
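One possible way to let the consumer finish, sketched under the assumption that the producer is the only writer: complete the writer after the last item, so that ReadAllAsync ends its loop. The names ChannelCompletionDemo and RunAsync are illustrative and not from the question, and the delay is shortened to keep the sample quick.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelCompletionDemo
{
    // Writes `count` integers, then marks the channel as complete.
    private static async Task Producer(ChannelWriter<int> writer, int count, CancellationToken token)
    {
        try
        {
            for (int i = 0; i < count; i++)
            {
                await Task.Delay(10, token);
                await writer.WriteAsync(i, token);
            }
        }
        finally
        {
            // Without this call the consumer's await foreach never ends.
            writer.TryComplete();
        }
    }

    // Collects every item until the writer signals completion.
    private static async Task<List<int>> Consumer(ChannelReader<int> reader, CancellationToken token)
    {
        var received = new List<int>();
        await foreach (var item in reader.ReadAllAsync(token))
        {
            received.Add(item);
        }
        return received;
    }

    public static async Task<List<int>> RunAsync(int count, CancellationToken token = default)
    {
        var channel = Channel.CreateUnbounded<int>();
        // Start both tasks first, then wait for both.
        var consumerTask = Consumer(channel.Reader, token);
        var producerTask = Producer(channel.Writer, count, token);
        await Task.WhenAll(consumerTask, producerTask);
        return await consumerTask;
    }
}
```

Calling await ChannelCompletionDemo.RunAsync(5) from Main returns once the producer has completed the writer and the consumer has drained the channel, so a following Console.ReadKey() would be reached.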
Your code is trying to consume all messages in the channel before any are produced. While you can store the producer/consumer tasks instead of awaiting them, it's better to use idioms and patterns specific to channels.
Instead of using a Channel as some kind of container, only expose and share Readers to a channel created and owned by a consumer. That's how Channels are used in Go.
That's why you can only work with a ChannelReader and a ChannelWriter too:
a ChannelReader is a <-ch in Go, the only way to read from a channel
a ChannelWriter is a ch <- in Go, the only way to write.
Using Owned channels
If you need to process data asynchronously, do this in a task inside the producer/consumer methods. This makes it a lot easier to control the channels and know when processing is finished or cancelled. It also allows you to construct pipelines from channels quite easily.
In your case, the producer could be :
public static ChannelReader<int> Producer(CancellationToken cancellationToken)
{
    var channel=Channel.CreateUnbounded<int>();
    var writer=channel.Writer;
    // The lambda must be async because it uses await
    _ = Task.Run(async ()=>{
        for (int i = 0; i < 59; i++)
        {
            await Task.Delay(1000, cancellationToken);
            await writer.WriteAsync(i, cancellationToken);
        }
    },cancellationToken)
    .ContinueWith(t=>writer.TryComplete(t.Exception));
    return channel.Reader;
}
The consumer, if one is lazy, can be :
static async Task ConsumeNumbers(this ChannelReader<int> reader, CancellationToken cancellationToken)
{
await foreach (var item in reader.ReadAllAsync(cancellationToken))
{
Console.WriteLine(item);
}
}
Because this is an extension method, both can be combined with :
await Producer(cancel)
.ConsumeNumbers(cancel);
In the more generic case, a pipeline block reads from a channel and returns a channel :
public static ChannelReader<int> RaiseTo(this ChannelReader<int> reader, double pow,CancellationToken cancellationToken)
{
    var channel=Channel.CreateUnbounded<int>();
    var writer=channel.Writer;
    _ = Task.Run(async ()=>{
        await foreach (var item in reader.ReadAllAsync(cancellationToken))
        {
            // Math.Pow returns a double; cast back so the output stays a ChannelReader<int>
            var newItem=(int)Math.Pow(item,pow);
            await writer.WriteAsync(newItem,cancellationToken);
        }
    },cancellationToken)
    .ContinueWith(t=>writer.TryComplete(t.Exception));
    return channel.Reader;
}
This would allow creating a pipeline of steps, eg :
await Producer(cancel)
.RaiseTo(0.3,cancel)
.RaiseTo(3,cancel)
.ConsumeNumbers(cancel);
Parallel processing
It's also possible to use multiple tasks per block, to speed up processing. In .NET 6 this can be done easily with Parallel.ForEachAsync :
public static ChannelReader<int> RaiseTo(this ChannelReader<int> reader, double pow,CancellationToken cancellationToken)
{
    var channel=Channel.CreateUnbounded<int>();
    var writer=channel.Writer;
    _ = Parallel.ForEachAsync(
            reader.ReadAllAsync(cancellationToken),
            cancellationToken,
            // The body receives the item and a CancellationToken
            async (item,ct)=>
            {
                // Math.Pow returns a double; cast back so the output stays a ChannelReader<int>
                var newItem=(int)Math.Pow(item,pow);
                await writer.WriteAsync(newItem,ct);
            })
        .ContinueWith(t=>writer.TryComplete(t.Exception));
    return channel.Reader;
}
Beware the order
A Channel preserves the order of items and read requests. This means that a single-task step will always consume and produce messages in order. There's no such guarantee with Parallel.ForEachAsync though. If order is important you'd have to add code to ensure messages are emitted in order, or try to reorder them with another step.
Related
We are using BlockingCollection to implement the producer-consumer pattern in a real-time application, i.e.
BlockingCollection<T> collection = new BlockingCollection<T>();
CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();
// Starting up consumer
Task.Run(() => consumer(this.cancellationTokenSource.Token));
…
void Producer(T item)
{
    collection.Add(item);
}
…
void consumer(CancellationToken token)
{
    while (true)
    {
        var item = this.collection.Take(token);
        process (item);
    }
}
To be sure, this is a very simplified version of the actual production code.
Sometimes when the application is under heavy load, we observe that the consuming part is lagging behind the producing part. Since the application logic is very complex, it involves interaction with other applications over network, as well as with SQL databases. Delays could be occurring in many places; they could occur in the calls to process(), which might in principle explain why the consuming part can be slow.
All the above considerations aside, is there something inherent in using BlockingCollection that could explain this phenomenon? Are there more efficient options in .NET to implement the producer-consumer pattern?
First of all, BlockingCollection isn't the best choice for producer/consumer scenarios. There are at least two better options (Dataflow, Channels) and the choice depends on the actual application scenario - which is missing from the question.
It's also possible to create a producer/consumer pipeline without a buffer, by using async streams and IAsyncEnumerable.
Async Streams
In this case, the producer can be an async iterator. The consumer will receive the IAsyncEnumerable and iterate over it until it completes. It could also produce its own IAsyncEnumerable output, which can be passed to the next method in the pipeline:
The producer can be :
public static async IAsyncEnumerable<Message> ProducerAsync(CancellationToken token)
{
while(!token.IsCancellationRequested)
{
var msg=await Task.Run(()=>SomeHeavyWork());
yield return msg;
}
}
And the consumer :
async Task ConsumeAsync(IAsyncEnumerable<Message> source)
{
await foreach(var msg in source)
{
await consumeMessage(msg);
}
}
There's no buffering in this case, and the producer can't emit a new message until the consumer consumes the current one. The consumer can be parallelized with Parallel.ForEachAsync. Finally, the System.Linq.Async package provides LINQ operators for async streams, allowing us to write eg :
List<OtherMsg> results=await ProducerAsync(cts.Token)
.Select(msg=>consumeAndReturn(msg))
.ToListAsync();
Dataflow - ActionBlock
Dataflow blocks can be used to construct entire processing pipelines, with each block receiving a message (data) from the previous one, processing it and passing it to the next block. Most blocks have input and where appropriate output buffers. Each block uses a single worker task but can be configured to use more. The application code doesn't have to handle the tasks though.
In the simplest case, a single ActionBlock can process messages posted to it by one or more producers, acting as a consumer:
async Task ConsumeAsync(Message message)
{
//Do something with the message
}
...
ExecutionDataflowBlockOptions _options= new () {
MaxDegreeOfParallelism=4,
BoundedCapacity=5
};
ActionBlock<Message> _block=new ActionBlock<Message>(ConsumeAsync,_options);
async Task ProduceAsync(CancellationToken token)
{
while(!token.IsCancellationRequested)
{
var msg=await produceNewMessageAsync();
await _block.SendAsync(msg);
}
_block.Complete();
await _block.Completion;
}
In this example the block uses 4 worker tasks, and SendAsync will wait asynchronously once the block's bounded capacity of 5 items is reached.
BufferBlock as a producer/consumer queue
A BufferBlock is an inactive block that's used as a buffer by other blocks. It can be used as an asynchronous producer/consumer collection as shown in How to: Implement a producer-consumer dataflow pattern. In this case, the code needs to receive messages explicitly, and threading is up to the developer :
static void Produce(ITargetBlock<byte[]> target)
{
var rand = new Random();
for (int i = 0; i < 100; ++ i)
{
var buffer = new byte[1024];
rand.NextBytes(buffer);
target.Post(buffer);
}
target.Complete();
}
static async Task<int> ConsumeAsync(ISourceBlock<byte[]> source)
{
int bytesProcessed = 0;
while (await source.OutputAvailableAsync())
{
byte[] data = await source.ReceiveAsync();
bytesProcessed += data.Length;
}
return bytesProcessed;
}
static async Task Main()
{
var buffer = new BufferBlock<byte[]>();
var consumerTask = ConsumeAsync(buffer);
Produce(buffer);
var bytesProcessed = await consumerTask;
Console.WriteLine($"Processed {bytesProcessed:#,#} bytes.");
}
Parallelized consumer
In .NET 6 the consumer can be simplified by using await foreach and ReceiveAllAsync :
static async Task<int> ConsumeAsync(IReceivableSourceBlock<byte[]> source)
{
int bytesProcessed = 0;
await foreach(var data in source.ReceiveAllAsync())
{
bytesProcessed += data.Length;
}
return bytesProcessed;
}
And processed concurrently using Parallel.ForEachAsync :
static async Task ConsumeAsync(IReceivableSourceBlock<byte[]> source)
{
var msgs=source.ReceiveAllAsync();
await Parallel.ForEachAsync(msgs,
new ParallelOptions { MaxDegreeOfParallelism = 4},
async (msg,ct)=>await ConsumeMsgAsync(msg));
}
By default Parallel.ForEachAsync will use as many worker tasks as there are cores.
Channels
Channels are similar to Go's channels. They are built specifically for producer/consumer scenarios and allow creating pipelines at a lower level than the Dataflow library. If the Dataflow library was built today, it would be built on top of Channels.
A channel can't be accessed directly, only through its Reader or Writer interfaces. This is intentional, and allows easy pipelining of methods. A very common pattern is for a producer method to create a channel it owns and return only a ChannelReader. Consuming methods accept that reader as input. This way, the producer can control the channel's lifetime without worrying whether other producers are writing to it.
With channels, a producer would look like this :
ChannelReader<Message> Producer(CancellationToken token)
{
    var channel=Channel.CreateBounded<Message>(5);
    var writer=channel.Writer;
    // The lambda must be async because it uses await
    _ = Task.Run(async ()=>{
        while(!token.IsCancellationRequested)
        {
            ...
            await writer.WriteAsync(msg);
        }
    },token)
    .ContinueWith(t=>writer.TryComplete(t.Exception));
    return channel.Reader;
}
The unusual .ContinueWith(t=>writer.TryComplete(t.Exception)); call signals completion on the writer, which in turn tells readers to complete as well. This way completion propagates from one method to the next, and so do any exceptions.
writer.TryComplete(t.Exception) doesn't block or perform any significant work, so it doesn't matter which thread it executes on. This means there's no need to await the worker task, which would complicate the code by rethrowing any exceptions.
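To make the error propagation concrete, here is a minimal sketch, assuming a hypothetical producer (FailingProducer) that writes two items and then throws. The class and method names are made up for illustration. The consumer reads what was written and then observes the failure that was handed to TryComplete.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ErrorPropagationDemo
{
    // Hypothetical producer: writes 1 and 2, then fails on the third iteration.
    private static ChannelReader<int> FailingProducer()
    {
        var channel = Channel.CreateUnbounded<int>();
        var writer = channel.Writer;
        _ = Task.Run(async () =>
        {
            for (int i = 1; i <= 3; i++)
            {
                if (i == 3) throw new InvalidOperationException("producer failed");
                await writer.WriteAsync(i);
            }
        })
        // t.Exception is null on success, so the same line covers both outcomes.
        .ContinueWith(t => writer.TryComplete(t.Exception), TaskScheduler.Default);
        return channel.Reader;
    }

    // Returns how many items were read and the error seen by the consumer, if any.
    public static async Task<(int itemsRead, Exception error)> RunAsync()
    {
        int itemsRead = 0;
        Exception error = null;
        try
        {
            await foreach (var item in FailingProducer().ReadAllAsync())
            {
                itemsRead++;
            }
        }
        catch (Exception ex)
        {
            // The failure passed to TryComplete surfaces here, after the written items.
            error = ex;
        }
        return (itemsRead, error);
    }
}
```

Without the exception argument the consumer's loop would simply end, and the consumer could not tell a failed producer from one that finished normally.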
A consuming method only needs the ChannelReader as source.
async Task ConsumerAsync(ChannelReader<Message> source)
{
await Parallel.ForEachAsync(source.ReadAllAsync(),
new ParallelOptions { MaxDegreeOfParallelism = 4},
async (msg,ct)=>await consumeMessageAsync(msg)
);
}
A method may read from one channel and publish new data to another using the producer pattern :
ChannelReader<OtherMessage> ConsumerAsync(ChannelReader<Message> source)
{
    // CreateBounded requires a capacity; 5 is an arbitrary choice here
    var channel=Channel.CreateBounded<OtherMessage>(5);
    var writer=channel.Writer;
    _ = Parallel.ForEachAsync(source.ReadAllAsync(),
            new ParallelOptions { MaxDegreeOfParallelism = 4},
            async (msg,ct)=>{
                var newMsg=await consumeMessageAsync(msg);
                await writer.WriteAsync(newMsg,ct);
            })
        .ContinueWith(t=>writer.TryComplete(t.Exception));
    return channel.Reader;
}
You could look at using the Dataflow library. I'm not sure if it is more performant than a BlockingCollection. As others have said, there is no guarantee that you can consume faster than produce, so it is always possible to fall behind.
I'm currently reading in data via a SerialPort connection in an asynchronous Task in a console application that will theoretically run forever (always picking up new serial data as it comes in).
I have a separate Task that is responsible for pulling that serial data out of a HashSet type that gets populated from my "producer" task above and then it makes an API request with it. Since the "producer" will run forever, I need the "consumer" task to run forever as well to process it.
Here's a contrived example:
TagItems = new HashSet<Tag>();
Sem = new SemaphoreSlim(1, 1);
SerialPort = new SerialPort("COM3", 115200, Parity.None, 8, StopBits.One);
// serialport settings...
try
{
var producer = StartProducerAsync(cancellationToken);
var consumer = StartConsumerAsync(cancellationToken);
await producer; // this feels weird
await consumer; // this feels weird
}
catch (Exception e)
{
Console.WriteLine(e); // when I manually throw an error in the consumer, this never triggers for some reason
}
Here's the producer / consumer methods:
private async Task StartProducerAsync(CancellationToken cancellationToken)
{
using var reader = new StreamReader(SerialPort.BaseStream);
while (SerialPort.IsOpen)
{
var readData = await reader.ReadLineAsync()
.WaitAsync(cancellationToken)
.ConfigureAwait(false);
var tag = new Tag {Data = readData};
await Sem.WaitAsync(cancellationToken);
TagItems.Add(tag);
Sem.Release();
await Task.Delay(100, cancellationToken);
}
reader.Close();
}
private async Task StartConsumerAsync(CancellationToken cancellationToken)
{
while (!cancellationToken.IsCancellationRequested)
{
await Sem.WaitAsync(cancellationToken);
if (TagItems.Any())
{
foreach (var item in TagItems)
{
await SendTagAsync(item, cancellationToken);
}
}
Sem.Release();
await Task.Delay(1000, cancellationToken);
}
}
I think there are multiple problems with my solution but I'm not quite sure how to make it better. For instance, I want my "data" to be unique so I'm using a HashSet, but that data type isn't concurrent-friendly so I'm having to lock with a SemaphoreSlim which I'm guessing could present performance issues with large amounts of data flowing through.
I'm also not sure why my catch block never triggers when an exception is thrown in my StartConsumerAsync method.
Finally, are there better / more modern patterns I can be using to solve this same problem in a better way? I noticed that Channels might be an option but a lot of producer/consumer examples I've seen start with a producer having a fixed number of items that it has to "produce", whereas in my example the producer needs to stay alive forever and potentially produces infinitely.
First things first, starting multiple asynchronous operations and awaiting them one by one is wrong:
// Wrong
await producer;
await consumer;
The reason is that if the first operation fails, the second operation becomes fire-and-forget. Allowing tasks to escape your supervision and continue running unattended can only contribute to your program's instability. Nothing good can come of that.
// Correct
await Task.WhenAll(producer, consumer);
Now regarding your main issue, which is how to make sure that a failure in one task will cause the timely completion of the other task. My suggestion is to hook the failure of each task with the cancellation of a CancellationTokenSource. In addition, both tasks should watch the associated CancellationToken, and complete cooperatively as soon as possible after they receive a cancellation signal.
var cts = new CancellationTokenSource();
Task producer = StartProducerAsync(cts.Token).OnErrorCancel(cts);
Task consumer = StartConsumerAsync(cts.Token).OnErrorCancel(cts);
await Task.WhenAll(producer, consumer);
Here is the OnErrorCancel extension method:
public static Task OnErrorCancel(this Task task, CancellationTokenSource cts)
{
return task.ContinueWith(t =>
{
if (t.IsFaulted) cts.Cancel();
return t;
}, default, TaskContinuationOptions.DenyChildAttach, TaskScheduler.Default).Unwrap();
}
Instead of doing this, you can also just add an all-enclosing try/catch block inside each task, and call cts.Cancel() in the catch.
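The try/catch alternative could look roughly like the sketch below. The two workers are hypothetical stand-ins for the producer and consumer, and the names CancelOnErrorDemo, FailingWorkerAsync and LongRunningWorkerAsync are made up for illustration. One worker fails, cancels the shared CancellationTokenSource in its catch block, and the other worker stops cooperatively.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancelOnErrorDemo
{
    // Hypothetical worker that fails after a short delay.
    private static async Task FailingWorkerAsync(CancellationTokenSource cts)
    {
        try
        {
            await Task.Delay(50, cts.Token);
            throw new InvalidOperationException("worker failed");
        }
        catch (Exception ex) when (!(ex is OperationCanceledException))
        {
            // A real failure: tell the other worker to stop, then rethrow.
            cts.Cancel();
            throw;
        }
    }

    // Hypothetical worker that runs until it is cancelled.
    private static async Task LongRunningWorkerAsync(CancellationTokenSource cts)
    {
        try
        {
            while (true)
            {
                await Task.Delay(10, cts.Token);
            }
        }
        catch (Exception ex) when (!(ex is OperationCanceledException))
        {
            cts.Cancel();
            throw;
        }
    }

    public static async Task<(bool failedFaulted, bool otherCanceled)> RunAsync()
    {
        using var cts = new CancellationTokenSource();
        Task failing = FailingWorkerAsync(cts);
        Task other = LongRunningWorkerAsync(cts);
        try
        {
            await Task.WhenAll(failing, other);
        }
        catch (Exception)
        {
            // WhenAll only completes after both tasks have finished.
        }
        return (failing.IsFaulted, other.IsCanceled);
    }
}
```

The exception filter keeps a plain cancellation from being treated as a failure, so only genuine errors trigger cts.Cancel().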
I have the below code:
var channel = Channel.CreateUnbounded<string>();
var consumers = Enumerable
.Range(1, 5)
.Select(consumerNumber =>
Task.Run(async () =>
{
var rnd = new Random();
while (await channel.Reader.WaitToReadAsync())
{
if (channel.Reader.TryRead(out var item))
{
Console.WriteLine($"Consuming {item} on consumer {consumerNumber}");
}
}
}));
var producers = Enumerable
.Range(1, 5)
.Select(producerNumber =>
Task.Run(async () =>
{
var rnd = new Random();
for (var i = 0; i < 10; i++)
{
var t = $"Message {i}";
Console.WriteLine($"Producing {t} on producer {producerNumber}");
await channel.Writer.WriteAsync(t);
await Task.Delay(TimeSpan.FromSeconds(rnd.Next(3)));
}
}));
await Task.WhenAll(producers)
.ContinueWith(_ => channel.Writer.Complete());
await Task.WhenAll(consumers);
This works as it should, but I want it to consume at the same time as producing. However,
await Task.WhenAll(producers)
.ContinueWith(_ => channel.Writer.Complete());
blocks the consumers from running until it completes, and I can't think of a way of getting both to run at the same time.
There are a couple of issues with the code, including forgetting to enumerate the producers and consumers enumerables. IEnumerable is evaluated lazily, so until you actually enumerate it with eg foreach or ToList, nothing is generated.
There's nothing wrong with ContinueWith when used properly either. It's definitely better and cheaper than using exceptions as control flow.
The code can be improved a lot by using some common Channel coding patterns.
The producer owns and encapsulates the channel
The producer exposes only Reader(s)
Plus, ContinueWith is an excellent choice to signal a ChannelWriter's completion, as we don't care at all which thread will do that. If anything, we'd prefer to use one of the "worker" threads to avoid a thread switch.
Let's say the producer function is:
Task Produce(ChannelWriter<string> writer, int producerNumber)
{
    return Task.Run(async () =>
    {
        var rnd = new Random();
        for (var i = 0; i < 10; i++)
        {
            var t = $"Message {i}";
            Console.WriteLine($"Producing {t} on producer {producerNumber}");
            // Write through the writer that was passed in
            await writer.WriteAsync(t);
            await Task.Delay(TimeSpan.FromSeconds(rnd.Next(3)));
        }
    });
}
Producer
The producer can be :
ChannelReader<string> ProduceData(int dop)
{
    var channel=Channel.CreateUnbounded<string>();
    var writer=channel.Writer;
    var tasks=Enumerable.Range(0,dop)
        .Select(producerNumber => Produce(writer,producerNumber))
        .ToList();
    _ =Task.WhenAll(tasks).ContinueWith(t=>writer.TryComplete(t.Exception));
    return channel.Reader;
}
Completion and error propagation
Notice the line :
_ =Task.WhenAll(tasks).ContinueWith(t=>writer.TryComplete(t.Exception));
This says that as soon as the producers complete, the writer itself should complete with any exception that may be raised. It doesn't really matter what thread the continuation runs on as it doesn't do anything other than call TryComplete.
More importantly, t=>writer.TryComplete(t.Exception) propagates the worker exception(s) to downstream consumers. Otherwise the consumers would never know something went wrong. If you had a database consumer you'd want it to avoid finalizing any changes if the source aborted.
Consumer
The consumer method can be:
async Task Consume(ChannelReader<string> reader,int dop,CancellationToken token=default)
{
var tasks= Enumerable
.Range(1, dop)
.Select(consumerNumber =>
Task.Run(async () =>
{
await foreach(var item in reader.ReadAllAsync(token))
{
Console.WriteLine($"Consuming {item} on consumer {consumerNumber}");
}
}));
await Task.WhenAll(tasks);
}
In this case await Task.WhenAll(tasks); enumerates the worker tasks thus starting them.
Nothing else is needed to consume all generated messages. When all producers finish, the channel's writer is completed. When that happens, ReadAllAsync keeps offering the remaining messages to the consumers and then exits.
Composition
Combining both methods is as easy as:
var reader=ProduceData(10);
await Consume(reader,5);
General Pattern
This is a general pattern for pipeline stages using Channels - read the input from a ChannelReader, write it to an internal Channel and return only the owned channel's Reader. This way the stage owns the channel which makes completion and error handling a lot easier:
static ChannelReader<TOut> Crunch<Tin,TOut>(this ChannelReader<Tin> reader,int dop=1,CancellationToken token=default)
{
    var channel=Channel.CreateUnbounded<TOut>();
    var writer=channel.Writer;
    var tasks=Enumerable.Range(0,dop)
        .Select(i=>Task.Run(async ()=>
        {
            await foreach(var item in reader.ReadAllAsync(token))
            {
                try
                {
                    ...
                    await writer.WriteAsync(msg);
                }
                catch(Exception exc)
                {
                    //Handle the exception and keep processing messages
                }
            }
        },token));
    _ =Task.WhenAll(tasks)
        .ContinueWith(t=>writer.TryComplete(t.Exception));
    return channel.Reader;
}
This allows chaining multiple "stages" together to form a pipeline:
var finalReader=Producer(...)
.Crunch1()
.Crunch2(10)
.Crunch3();
await foreach(var result in finalReader.ReadAllAsync())
{
...
}
Producer and consumer methods can be written in the same way, allowing, eg the creation of a data import pipeline:
var importTask = ReadFiles<string>(somePath)
.ParseCsv<string,Record[]>(10)
.ImportToDb<Record>(connectionString);
await importTask;
With ReadFiles
static ChannelReader<string> ReadFiles(string folder)
{
var channel=Channel.CreateUnbounded<string>();
var writer=channel.Writer;
var task=Task.Run(async ()=>{
foreach(var path in Directory.EnumerateFiles(folder,"*.csv"))
{
await writer.WriteAsync(path);
}
});
task.ContinueWith(t=>writer.TryComplete(t.Exception));
return channel.Reader;
}
Update for .NET 6 Parallel.ForEachAsync
Now that .NET 6 is supported in production, one could use Parallel.ForEachAsync to simplify a concurrent consumer to :
static ChannelReader<TOut> Crunch<Tin,TOut>(this ChannelReader<Tin> reader,
    int dop=1,CancellationToken token=default)
{
    var channel=Channel.CreateUnbounded<TOut>();
    var writer=channel.Writer;
    // Named options so it doesn't clash with the dop parameter
    var options=new ParallelOptions {
        MaxDegreeOfParallelism = dop,
        CancellationToken = token
    };
    var task=Parallel.ForEachAsync(
        reader.ReadAllAsync(token),
        options,
        async (item,ct) =>{
            try
            {
                ...
                await writer.WriteAsync(msg,ct);
            }
            catch(Exception exc)
            {
                //Handle the exception and keep processing messages
            }
        });
    task.ContinueWith(t=>writer.TryComplete(t.Exception));
    return channel.Reader;
}
The consumers and producers variables are of type IEnumerable<Task>. This is a deferred enumerable that needs to be materialized in order for the tasks to be created. You can materialize the enumerable by chaining the ToArray operator on the LINQ queries. By doing so, the type of the two variables will become Task[], which means that your tasks are instantiated and up and running.
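The deferred behavior can be observed with a small counter. This is a sketch with made-up names (DeferredTasksDemo, RunAsync): the counter is read once before the query is materialized and once after all tasks have finished.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DeferredTasksDemo
{
    public static async Task<(int before, int after)> RunAsync()
    {
        int started = 0;

        // Select only describes how to create the tasks; nothing runs yet.
        var lazyTasks = Enumerable.Range(1, 5)
            .Select(n => Task.Run(() => Interlocked.Increment(ref started)));

        int before = Volatile.Read(ref started);

        // ToArray enumerates the query, which creates and starts all five tasks.
        var tasks = lazyTasks.ToArray();
        await Task.WhenAll(tasks);

        int after = Volatile.Read(ref started);
        return (before, after);
    }
}
```

Here before is 0 because no task exists until ToArray runs, and after is 5 once the materialized tasks have completed. Enumerating lazyTasks a second time would create five more tasks, which is another reason to materialize the query once.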
As a side note, the ContinueWith method requires passing explicitly the TaskScheduler.Default as an argument, otherwise you are at the mercy of whatever the TaskScheduler.Current may be (it might be the UI TaskScheduler for example). This is the correct usage of ContinueWith:
await Task.WhenAll(producers)
.ContinueWith(_ => channel.Writer.Complete(), TaskScheduler.Default);
Code analyzer CA2008: Do not create tasks without passing a TaskScheduler
"[...] This is why in production library code I write, I always explicitly specify the scheduler I want to use." (Stephen Toub)
Another problem is that any exceptions thrown by the producers will be swallowed, because the tasks are not awaited. Only the continuation is awaited, which is unlikely to fail. To solve this problem, you could just ditch the primitive ContinueWith, and instead use async-await composition (an async local function that awaits the producers and then completes the channel). In this case not even that is necessary. You could simply do this:
try { await Task.WhenAll(producers); }
finally { channel.Writer.Complete(); }
The channel will Complete after any outcome of the Task.WhenAll(producers) task, and so the consumers will not get stuck.
A third problem is that a failure of some of the producers will cause the immediate termination of the current method, before awaiting the consumers. These tasks will then become fire-and-forget tasks. I am leaving it to you to find how you can ensure that all tasks can be awaited, in all cases, before exiting the method either successfully or with an error.
I have 2 methods: the first sends an HTTP GET request to one address and the second calls it multiple times (so, it sends requests to many IPs). Both methods are async, so they don't block code execution while the requests are processing remotely. The problem is, due to my poor C# knowledge, I don't know how to send all the requests simultaneously, not one after another (which my code does). That's my code:
public static async Task<string> SendRequest(Uri uri)
{
using (var client = new HttpClient())
{
var resp = await client.GetStringAsync(uri).ConfigureAwait(false);
return resp;
}
}
public static async Task<string[]> SendToAllIps(string req)
{
string[] resp = new string[_allIps.Length];
for (int i = 0; i < _allIps.Length; i++)
{
resp[i] = await SendRequest(new Uri(_allIps[i] + req));
}
return resp;
}
How to make SendToAllIps send requests without awaiting for previous task result? Also SendToAllIps must return an array of responses when all the requests are finished. As far as I understand, this can be done with Task.WaitAll, but how to use it in this particular situation?
You can use Task.WhenAll to await a collection of tasks:
public static async Task<string[]> SendToAllIps(string req)
{
var tasks = _allIps.Select(ip => SendRequest(new Uri(ip + req)));
return await Task.WhenAll(tasks);
}
The answers above show the correct way of doing it but don't give the rationale, so let me explain what's wrong with your code.
The following line causes the issue:
resp[i] = await SendRequest(new Uri(_allIps[i] + req));
Why?
Because you await each individual request, processing of the remaining requests stops until it completes. That's the behavior of async-await, and the result is almost synchronous processing of each SendRequest when you wanted them to be concurrent.
Resolution:
SendRequest, being an async method, returns a Task<string>. You need to add each task to an IEnumerable<Task<string>>, and then you have two options:
Task.WaitAll or Task.WhenAll
Yours is a Web API (REST application), which runs with a synchronization context, so you need Task.WhenAll: it returns a Task that you can await and that gathers all the task results into an array. If you use Task.WaitAll instead, it blocks the calling thread and can lead to a deadlock, because the continuations cannot get back onto the synchronization context. See the following:
WaitAll vs WhenAll
Use Task.WaitAll only in a console application, not in a web application; a console app doesn't need a synchronization context or UI thread.
public static async Task<string[]> SendToAllIps(string req)
{
var tasks = new List<Task<string>>();
for (int i = 0; i < _allIps.Length; i++)
{
// Start task and assign the task itself to a collection.
var task = SendRequest(new Uri(_allIps[i] + req));
tasks.Add(task);
}
// await all the tasks.
string[] resp = await Task.WhenAll(tasks);
return resp;
}
The key here is to collect all the tasks in a collection and then await them all using await Task.WhenAll. Even though I think the solution from Lee is more elegant...
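A detail worth seeing in isolation: Task.WhenAll returns the results in the order of the input tasks, not in completion order, so each response still lines up with its IP. The sketch below replaces the HTTP call with a hypothetical FakeRequestAsync that finishes later inputs sooner; all names in it are made up for illustration.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllOrderDemo
{
    // Hypothetical stand-in for SendRequest: later inputs finish sooner.
    private static async Task<string> FakeRequestAsync(int index, int total)
    {
        await Task.Delay((total - index) * 20);
        return $"response {index}";
    }

    public static async Task<string[]> RunAsync(int total)
    {
        // Every call starts its request immediately; none is awaited here.
        var tasks = Enumerable.Range(0, total)
            .Select(i => FakeRequestAsync(i, total));

        // The result array follows the order of the input tasks.
        return await Task.WhenAll(tasks);
    }
}
```

With total set to 3, the task for index 2 completes first, yet the returned array still starts with "response 0".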
I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend.. what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4,4);
foreach (var service in RunData.Demand)
{
    await sem.WaitAsync();
    Task t = Task.Run(async () =>
    {
        var availabilityResponse = await client.QueryAvailability(service);
        // do your other stuff here with the result of QueryAvailability
    });
    // Release the slot once the task finishes, whether it succeeded or failed
    t.ContinueWith(done => sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync), which subtracts one from the count. Calling Release adds one to the count.
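To check that the semaphore really caps concurrency, the sketch below records the highest number of operations in flight at once. It is a variation on the loop above in which each operation acquires the semaphore itself; Task.Delay stands in for the HTTP call, and the names ThrottleDemo and RunAsync are made up for illustration.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Runs itemCount simulated calls and returns the peak number in flight.
    public static async Task<int> RunAsync(int itemCount, int maxConcurrency)
    {
        using var sem = new SemaphoreSlim(maxConcurrency, maxConcurrency);
        var gate = new object();
        int running = 0;
        int peak = 0;

        async Task WorkAsync(int item)
        {
            await sem.WaitAsync();
            try
            {
                lock (gate) { running++; if (running > peak) peak = running; }
                await Task.Delay(20); // stands in for the HTTP call
                lock (gate) { running--; }
            }
            finally
            {
                // Always give the slot back, even if the work throws.
                sem.Release();
            }
        }

        await Task.WhenAll(Enumerable.Range(0, itemCount).Select(i => WorkAsync(i)));
        return peak;
    }
}
```

Calling await ThrottleDemo.RunAsync(20, 4) should never report a peak above 4, because at most four operations can hold the semaphore at any moment.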
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}