Channel multiple producers and consumers - c#

I have the below code:
var channel = Channel.CreateUnbounded<string>();
var consumers = Enumerable
.Range(1, 5)
.Select(consumerNumber =>
Task.Run(async () =>
{
var rnd = new Random();
while (await channel.Reader.WaitToReadAsync())
{
if (channel.Reader.TryRead(out var item))
{
Console.WriteLine($"Consuming {item} on consumer {consumerNumber}");
}
}
}));
var producers = Enumerable
.Range(1, 5)
.Select(producerNumber =>
Task.Run(async () =>
{
var rnd = new Random();
for (var i = 0; i < 10; i++)
{
var t = $"Message {i}";
Console.WriteLine($"Producing {t} on producer {producerNumber}");
await channel.Writer.WriteAsync(t);
await Task.Delay(TimeSpan.FromSeconds(rnd.Next(3)));
}
}));
await Task.WhenAll(producers)
.ContinueWith(_ => channel.Writer.Complete());
await Task.WhenAll(consumers);
Which works as it should however im wanting it to consume at the same time as producing. However
await Task.WhenAll(producers)
.ContinueWith(_ => channel.Writer.Complete());
Blocks the consumer from running until its complete and I can't think of a way of getting them both to run?

There are a couple of issues with the code, including forgetting to enumate the producers and consumers enumerables. IEnumerable is evaluated lazily, so until you actually enumerate it with eg foreach or ToList, nothing is generated.
There's nothing wrong with ContinueWith when used properly either. It's definitely better and cheaper than using exceptions as control flow.
The code can be improved a lot by using some common Channel coding patterns.
The producer owns and encapsulates the channel
The producer exposes only Reader(s)
Plus, ContinueWith is an excellent choice to signal a ChannelWriter's completion, as we don't care at all which thread will do that. If anything, we'd prefer to use one of the "worker" threads to avoid a thread switch.
Let's say the producer function is:
async Task Produce(ChannelWriter<string> writer, int producerNumber)
{
return Task.Run(async () =>
{
var rnd = new Random();
for (var i = 0; i < 10; i++)
{
var t = $"Message {i}";
Console.WriteLine($"Producing {t} on producer {producerNumber}");
await channel.Writer.WriteAsync(t);
await Task.Delay(TimeSpan.FromSeconds(rnd.Next(3)));
}
}
}
Producer
The producer can be :
ChannelReader<string> ProduceData(int dop)
{
var channel=Channel.CreateUnbounded<string>();
var writer=channel.Writer;
var tasks=Enumerable.Range(0,dop)
.Select(producerNumber => Produce(producerNumber))
.ToList();
_ =Task.WhenAll(tasks).ContinueWith(t=>writer.TryComplete(t.Exception));
.
return channel.Reader;
}
Completion and error propagation
Notice the line :
_ =Task.WhenAll(tasks).ContinueWith(t=>writer.TryComplete(t.Exception));
This says that as soon as the producers complete, the writer itself should complete with any exception that may be raised. It doesn't really matter what thread the continuation runs on as it doesn't do anything other than call TryComplete.
More importantly, t=>writer.TryComplete(t.Exception) propagates the worker exception(s) to downstream consumers. Otherwise the consumers would never know something went wrong. If you had a database consumer you'd want it to avoid finalizing any changes if the source aborted.
Consumer
The consumer method can be:
async Task Consume(ChannelReader<string> reader,int dop,CancellationToken token=default)
{
var tasks= Enumerable
.Range(1, dop)
.Select(consumerNumber =>
Task.Run(async () =>
{
await foreach(var item in reader.ReadAllAsync(token))
{
Console.WriteLine($"Consuming {item} on consumer {consumerNumber}");
}
}));
await Task.WhenAll(tasks);
}
In this case await Task.WhenAll(tasks); enumerates the worker tasks thus starting them.
Nothing else is needed to produce all generated messages. When all producers finish, the Channel.Reader is completed. When that happens, ReadAllAsync will keep offering all remaining messages to the consumers and exit.
Composition
Combining both methods is as easy as:
var reader=Produce(10);
await Consume(reader);
General Pattern
This is a general pattern for pipeline stages using Channels - read the input from a ChannelReader, write it to an internal Channel and return only the owned channel's Reader. This way the stage owns the channel which makes completion and error handling a lot easier:
static ChannelReader<TOut> Crunch<Tin,TOut>(this ChannelReader<Tin>,int dop=1,CancellationToken token=default)
{
var channel=Channel.CreateUnbounded<TOut>();
var writer=channel.Writer;
var tasks=Enumerable.Range(0,dop)
.Select(async i=>Task.Run(async ()=>
{
await(var item in reader.ReadAllAsync(token))
{
try
{
...
await writer.WriteAsync(msg);
}
catch(Exception exc)
{
//Handle the exception and keep processing messages
}
}
},token));
_ =Task.WhenAll(tasks)
.ContinueWith(t=>writer.TryComplete(t.Exception));
return channel.Reader;
}
This allows chaining multiple "stages" together to form a pipeline:
var finalReader=Producer(...)
.Crunch1()
.Crunch2(10)
.Crunch3();
await foreach(var result in finalReader.ReadAllAsync())
{
...
}
Producer and consumer methods can be written in the same way, allowing, eg the creation of a data import pipeline:
var importTask = ReadFiles<string>(somePath)
.ParseCsv<string,Record[]>(10)
.ImportToDb<Record>(connectionString);
await importTask;
With ReadFiles
static ChannelReader<string> ReadFiles(string folder)
{
var channel=Channel.CreateUnbounded<string>();
var writer=channel.Writer;
var task=Task.Run(async ()=>{
foreach(var path in Directory.EnumerateFiles(folder,"*.csv"))
{
await writer.WriteAsync(path);
}
});
task.ContinueWith(t=>writer.TryComplete(t.Exception));
return channel.Reader;
}
Update for .NET 6 Parallel.ForEachAsync
Now that .NET 6 is supported in production, one could use Parallel.ForEachAsync to simplify a concurrent consumer to :
static ChannelReader<TOut> Crunch<Tin,TOut>(this ChannelReader<Tin>,
int dop=1,CancellationToken token=default)
{
var channel=Channel.CreateUnbounded<TOut>();
var writer=channel.Writer;
var dop=new ParallelOptions {
MaxDegreeOfParallelism = dop,
CancellationToken = token
};
var task=Parallel.ForEachAsync(
reader.ReadAllAsync(token),
dop,
async item =>{
try
{
...
await writer.WriteAsync(msg);
}
catch(Exception exc)
{
//Handle the exception and keep processing messages
}
});
task.ContinueWith(t=>writer.TryComplete(t.Exception));
return channel.Reader;
}

The consumers and producers variables are of type IEnumerable<Task>. This a deferred enumerable, that needs to be materialized in order for the tasks to be created. You can materialize the enumerable by chaining the ToArray operator on the LINQ queries. By doing so, the type of the two variables will become Task[], which means that your tasks are instantiated and up and running.
As a side note, the ContinueWith method requires passing explicitly the TaskScheduler.Default as an argument, otherwise you are at the mercy of whatever the TaskScheduler.Current may be (it might be the UI TaskScheduler for example). This is the correct usage of ContinueWith:
await Task.WhenAll(producers)
.ContinueWith(_ => channel.Writer.Complete(), TaskScheduler.Default);
Code analyzer CA2008: Do not create tasks without passing a TaskScheduler
"[...] This is why in production library code I write, I always explicitly specify the scheduler I want to use." (Stephen Toub)
Another problem is that any exceptions thrown by the producers will be swallowed, because the tasks are not awaited. Only the continuation is awaited, which is unlikely to fail. To solve this problem, you could just ditch the primitive ContinueWith, and instead use async-await composition (an async local function that awaits the producers and then completes the channel). In this case not even that is necessary. You could simply do this:
try { await Task.WhenAll(producers); }
finally { channel.Writer.Complete(); }
The channel will Complete after any outcome of the Task.WhenAll(producers) task, and so the consumers will not get stuck.
A third problem is that a failure of some of the producers will cause the immediate termination of the current method, before awaiting the consumers. These tasks will then become fire-and-forget tasks. I am leaving it to you to find how you can ensure that all tasks can be awaited, in all cases, before exiting the method either successfully or with an error.

Related

BlockCollection alternatives?

We are using BlockCollection to implement producer-consumer pattern in a real-time application, i.e.
BlockingCollection<T> collection = new BlockingCollection<T>();
CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();
// Starting up consumer
Task.Run(() => consumer(this.cancellationTokenSource.Token));
…
void Producer(T item)
{
collection.Add(item);
}
…
void consumer()
{
while (true)
{
var item = this.blockingCollection.Take(token);
process (item);
}
}
To be sure, this is a very simplified version of the actual production code.
Sometimes when the application is under heavy load, we observe that the consuming part is lagging behind the producing part. Since the application logic is very complex, it involves interaction with other applications over network, as well as with SQL databases. Delays could be occurring in many places; they could occur in the calls to process(), which might in principle explain why the consuming part can be slow.
All the above considerations aside, is there something inherent in using BlockingCollection, which could explain this phenomenon? Are there more efficient options in .Net to realise producer-consumer pattern?
First of all, BlockingCollection isn't the best choice for producer/consumer scenarios. There are at least two better options (Dataflow, Channels) and the choice depends on the actual application scenario - which is missing from the question.
It's also possible to create a producer/consumer pipeline without a buffer, by using async streams and IAsyncEnmerable.
Async Streams
In this case, the producer can be an async iterator. The consumer will receive the IAsyncEnumerable and iterate over it until it completes. It could also produce its own IAsyncEnumerable output, which can be passed to the next method in the pipeline:
The producer can be :
public static async IAsyncEnumerable<Message> ProducerAsync(CancellationToken token)
{
while(!token.IsCancellationRequested)
{
var msg=await Task.Run(()=>SomeHeavyWork());
yield return msg;
}
}
And the consumer :
async Task ConsumeAsync(IAsyncEnumerable<Message> source)
{
await foreach(var msg in source)
{
await consumeMessage(msg);
}
}
There's no buffering in this case, and the producer can't emit a new message until the consumer consumes the current one. The consumer can be parallelized with Parallel.ForEachAsync. Finally, the System.Linq.Async provides LINQ operations to async streams, allowing us to write eg :
List<OtherMsg> results=await ProducerAsync(cts.Token)
.Select(msg=>consumeAndReturn(msg))
.ToListAsync();
Dataflow - ActionBlock
Dataflow blocks can be used to construct entire processing pipelines, with each block receiving a message (data) from the previous one, processing it and passing it to the next block. Most blocks have input and where appropriate output buffers. Each block uses a single worker task but can be configured to use more. The application code doesn't have to handle the tasks though.
In the simplest case, a single ActionBlock can process messages posted to it by one or more producers, acting as a consumer:
async Task ConsumeAsync<Message>(Message message)
{
//Do something with the message
}
...
ExecutionDataflowBlockOptions _options= new () {
MaxDegreeOfParallelism=4,
BoundedCapacity=5
};
ActionBlock<Message> _block=new ActionBlock(ConsumeAsync,_options);
async Task ProduceAsync(CancellationToken token)
{
while(!token.IsCancellationRequested)
{
var msg=await produceNewMessageAsync();
await _block.SendAsync(msg);
}
_block.Complete();
await _block.Completion;
}
In this example the block uses 4 worker tasks and will block if more than 5 items are waiting in its input buffer, beyond those currently being processed.
BufferBlock as a producer/consumer queue
A BufferBlock is an inactive block that's used as a buffer by other blocks. It can be used as an asynchronous producer/consumer collection as shown in How to: Implement a producer-consumer dataflow pattern. In this case, the code needs to receive messages explicitly. Threading is up to the developer. :
static void Produce(ITargetBlock<byte[]> target)
{
var rand = new Random();
for (int i = 0; i < 100; ++ i)
{
var buffer = new byte[1024];
rand.NextBytes(buffer);
target.Post(buffer);
}
target.Complete();
}
static async Task<int> ConsumeAsync(ISourceBlock<byte[]> source)
{
int bytesProcessed = 0;
while (await source.OutputAvailableAsync())
{
byte[] data = await source.ReceiveAsync();
bytesProcessed += data.Length;
}
return bytesProcessed;
}
static async Task Main()
{
var buffer = new BufferBlock<byte[]>();
var consumerTask = ConsumeAsync(buffer);
Produce(buffer);
var bytesProcessed = await consumerTask;
Console.WriteLine($"Processed {bytesProcessed:#,#} bytes.");
}
Parallelized consumer
In .NET 6 the consumer can be simplified by using await foreach and ReceiveAllAsync :
static async Task<int> ConsumeAsync(IReceivableSourceBlock<byte[]> source)
{
int bytesProcessed = 0;
await foreach(var data in source.ReceiveAllAsync())
{
bytesProcessed += data.Length;
}
return bytesProcessed;
}
And processed concurrently using Parallel.ForEachAsync :
static async Task ConsumeAsync(IReceivableSourceBlock<byte[]> source)
{
var msgs=source.ReceiveAllAsync();
await Parallel.ForEachAsync(msgs,
new ParallelOptions { MaxDegreeOfParallelism = 4},
msg=>ConsumeMsgAsync(msg));
}
By default Parallel.ForeachAsync will use as many worker tasks as there are cores
Channels
Channels are similar to Go's channels. They are built specifically for producer/consumer scenarios and allow creating pipelines at a lower level than the Dataflow library. If the Dataflow library was built today, it would be built on top of Channels.
A channel can't be accessed directly, only through its Reader or Writer interfaces. This is intentional, and allows easy pipelining of methods. A very common pattern is for a producer method to create an channel it owns and return only a ChannelReader. Consuming methods accept that reader as input. This way, the producer can control the channel's lifetime without worrying whether other producers are writing to it.
With channels, a producer would look like this :
ChannelReader<Message> Producer(CancellationToken token)
{
var channel=Channel.CreateBounded(5);
var writer=channel.Writer;
_ = Task.Run(()=>{
while(!token.IsCancellationRequested)
{
...
await writer.SendAsync(msg);
}
},token)
.ContinueWith(t=>writer.TryComplete(t.Exception));
return channel.Reader;
}
The unusual .ContinueWith(t=>writer.TryComplete(t.Exception)); is used to signal completion to the writer. This will signal readers to complete as well. This way completion propagates from one method to the next. Any exceptions are propagated as well
writer.TryComplete(t.Exception)) doesn't block or perform any significant work so it doesn't matter what thread it executes on. This means there's no need to use await on the worker task, which would complicate the code by rethrowing any exceptions.
A consuming method only needs the ChannelReader as source.
async Task ConsumerAsync(ChannelReader<Message> source)
{
await Parallel.ForEachAsync(source.ReadAllAsync(),
new ParallelOptions { MaxDegreeOfParallelism = 4},
msg=>consumeMessageAsync(msg)
);
}
A method may read from one channel and publish new data to another using the producer pattern :
ChannelReader<OtherMessage> ConsumerAsync(ChannelReader<Message> source)
{
var channel=Channel.CreateBounded<OtherMessage>();
var writer=channel.Writer;
await Parallel.ForEachAsync(source.ReadAllAsync(),
new ParallelOptions { MaxDegreeOfParallelism = 4},
async msg=>{
var newMsg=await consumeMessageAsync(msg);
await writer.SendAsync(newMsg);
})
.ContinueWith(t=>writer.TryComplete(t.Exception));
}
You could look at using the Dataflow library. I'm not sure if it is more performant than a BlockingCollection. As others have said, there is no guarantee that you can consume faster than produce, so it is always possible to fall behind.

Correct pattern to check if an async Task completed synchronously before awaiting it

I have a bunch of requests to process, some of which may complete synchronously.
I'd like to gather all results that are immediately available and return them early, while waiting for the rest.
Roughly like this:
List<Task<Result>> tasks = new ();
List<Result> results = new ();
foreach (var request in myRequests) {
var task = request.ProcessAsync();
if (task.IsCompleted)
results.Add(task.Result); // or Add(await task) ?
else
tasks.Add(task);
}
// send results that are available "immediately" while waiting for the rest
if (results.Count > 0) SendResults(results);
results = await Task.WhenAll(tasks);
SendResults(results);
I'm not sure whether relying on IsCompleted might be a bad idea; could there be situations where its result cannot be trusted, or where it may change back to false again, etc.?
Similarly, could it be dangerous to use task.Result even after checking IsCompleted, should one always prefer await task? What if were using ValueTask instead of Task?
I'm not sure whether relying on IsCompleted might be a bad idea; could there be situations where its result cannot be trusted...
If you're in a multithreaded context, it's possible that IsCompleted could return false at the moment when you check on it, but it completes immediately thereafter. In cases like the code you're using, the cost of this happening would be very low, so I wouldn't worry about it.
or where it may change back to false again, etc.?
No, once a Task completes, it cannot uncomplete.
could it be dangerous to use task.Result even after checking IsCompleted.
Nope, that should always be safe.
should one always prefer await task?
await is a great default when you don't have a specific reason to do something else, but there are a variety of use cases where other patterns might be useful. The use case you've highlighted is a good example, where you want to return the results of finished tasks without awaiting all of them.
As Stephen Cleary mentioned in a comment below, it may still be worthwhile to use await to maintain expected exception behavior. You might consider doing something more like this:
var requestsByIsCompleted = myRequests.ToLookup(r => r.IsCompleted);
// send results that are available "immediately" while waiting for the rest
SendResults(await Task.WhenAll(requestsByIsCompleted[true]));
SendResults(await Task.WhenAll(requestsByIsCompleted[false]));
What if were using ValueTask instead of Task?
The answers above apply equally to both types.
You could use code like this to continually send the results of completed tasks while waiting on others to complete.
foreach (var request in myRequests)
{
tasks.Add(request.ProcessAsync());
}
// wait for at least one task to be complete, then send all available results
while (tasks.Count > 0)
{
// wait for at least one task to complete
Task.WaitAny(tasks.ToArray());
// send results for each completed task
var completedTasks = tasks.Where(t => t.IsCompleted);
var results = completedTasks.Where(t => t.IsCompletedSuccessfully).Select(t => t.Result).ToList();
SendResults(results);
// TODO: handle completed but failed tasks here
// remove completed tasks from the tasks list and keep waiting
tasks.RemoveAll(t => completedTasks.Contains(t));
}
Using only await you can achieve the desired behavior:
async Task ProcessAsync(MyRequest request, Sender sender)
{
var result = await request.ProcessAsync();
await sender.SendAsync(result);
}
...
async Task ProcessAll()
{
var tasks = new List<Task>();
foreach(var request in requests)
{
var task = ProcessAsync(request, sender);
// Dont await until all requests are queued up
tasks.Add(task);
}
// Await on all outstanding requests
await Task.WhenAll(tasks);
}
There are already good answers, but in addition of them here is my suggestion too, on how to handle multiple tasks and process each task differently, maybe it will suit your needs. My example is with events, but you can replace them with some kind of state management that fits your needs.
public interface IRequestHandler
{
event Func<object, Task> Ready;
Task ProcessAsync();
}
public class RequestHandler : IRequestHandler
{
// Hier where you wraps your request:
// private object request;
private readonly int value;
public RequestHandler(int value)
=> this.value = value;
public event Func<object, Task> Ready;
public async Task ProcessAsync()
{
await Task.Delay(1000 * this.value);
// Hier where you calls:
// var result = await request.ProcessAsync();
//... then do something over the result or wrap the call in try catch for example
var result = $"RequestHandler {this.value} - [{DateTime.Now.ToLongTimeString()}]";
if (this.Ready is not null)
{
// If result passes send the result to all subscribers
await this.Ready.Invoke($"RequestHandler {this.value} - [{DateTime.Now.ToLongTimeString()}]");
}
}
}
static void Main()
{
var a = new RequestHandler(1);
a.Ready += PrintAsync;
var b = new RequestHandler(2);
b.Ready += PrintAsync;
var c = new RequestHandler(3);
c.Ready += PrintAsync;
var d= new RequestHandler(4);
d.Ready += PrintAsync;
var e = new RequestHandler(5);
e.Ready += PrintAsync;
var f = new RequestHandler(6);
f.Ready += PrintAsync;
var requests = new List<IRequestHandler>()
{
a, b, c, d, e, f
};
var tasks = requests
.Select(x => Task.Run(x.ProcessAsync));
// Hier you must await all of the tasks
Task
.Run(async () => await Task.WhenAll(tasks))
.Wait();
}
static Task PrintAsync(object output)
{
Console.WriteLine(output);
return Task.CompletedTask;
}

Retry Failed Tasks in C#.Net -WhenAll Async [duplicate]

I have a solution that creates multiple I/O based tasks and I'm using Task.WhenAny() to manage these tasks. But often many of the tasks will fail due to network issue or request throttling etc.
I can't seem to find a solution that enables me to successfully retry failed tasks when using a Task.WhenAny() approach.
Here is what I'm doing:
var tasks = new List<Task<MyType>>();
foreach(var item in someCollection)
{
task.Add(GetSomethingAsync());
}
while (tasks.Count > 0)
{
var child = await Task.WhenAny(tasks);
tasks.Remove(child);
???
}
So the above structure works for completing tasks, but I haven't found a way to handle and retry failing tasks. The await Task.WhenAny throws an AggregateException rather than allowing me to inspect a task status. When In the exception handler I no longer have any way to retry the failed task.
I believe it would be easier to retry within the tasks, and then replace the Task.WhenAny-in-a-loop antipattern with Task.WhenAll
E.g., using Polly:
var tasks = new List<Task<MyType>>();
var policy = ...; // See Polly documentation
foreach(var item in someCollection)
tasks.Add(policy.ExecuteAsync(() => GetSomethingAsync()));
await Task.WhenAll(tasks);
or, more succinctly:
var policy = ...; // See Polly documentation
var tasks = someCollection.Select(item => policy.ExecuteAsync(() => GetSomethingAsync()));
await Task.WhenAll(tasks);
If you don't want to use the Polly library for some reason, you could use the Retry method bellow. It accepts a task factory, and keeps creating and then awaiting a task until it completes successfully, or the maxAttempts have been reached:
public static async Task<TResult> Retry<TResult>(Func<Task<TResult>> taskFactory,
int maxAttempts)
{
int failedAttempts = 0;
while (true)
{
try
{
var task = taskFactory();
return await task.ConfigureAwait(false);
}
catch
{
failedAttempts++;
if (failedAttempts >= maxAttempts) throw;
}
}
}
You could then use this method to download (for example) some web pages.
string[] urls =
{
"https://stackoverflow.com",
"https://superuser.com",
//"https://no-such.url",
};
var httpClient = new HttpClient();
var tasks = urls.Select(url => Retry(async () =>
{
return (Url: url, Html: await httpClient.GetStringAsync(url));
}, maxAttempts: 5));
var results = await Task.WhenAll(tasks);
foreach (var result in results)
{
Console.WriteLine($"Url: {result.Url}, {result.Html.Length:#,0} chars");
}
Output:
Url: https://stackoverflow.com, 112,276 chars
Url: https://superuser.com, 122,784 chars
If you uncomment the third url then instead of these results an HttpRequestException will be thrown, after five failed attempts.
The Task.WhenAll method will wait for the completion of all tasks before propagating the error. In case it is preferable to report the error as soon as possible, you can find solutions in this question.

How to correctly queue up tasks to run in C#

I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend.. what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4,4);
foreach (var service in RunData.Demand)
{
await sem.WaitAsync();
Task t = Task.Run(async () =>
{
var availabilityResponse = await client.QueryAvailability(serviceCopy));
// do your other stuff here with the result of QueryAvailability
}
t.ContinueWith(sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync) which subtracts one from the count. Calling release adds one to the count.
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}

Parallel.ForEach using Thread.Sleep equivalent

So here's the situation: I need to make a call to a web site that starts a search. This search continues for an unknown amount of time, and the only way I know if the search has finished is by periodically querying the website to see if there's a "Download Data" link somewhere on it (it uses some strange ajax call on a javascript timer to check the backend and update the page, I think).
So here's the trick: I have hundreds of items I need to search for, one at a time. So I have some code that looks a little bit like this:
var items = getItems();
Parallel.ForEach(items, item =>
{
startSearch(item);
var finished = isSearchFinished(item);
while(finished == false)
{
finished = isSearchFinished(item); //<--- How do I delay this action 30 Secs?
}
downloadData(item);
}
Now, obviously this isn't the real code, because there could be things that cause isSearchFinished to always be false.
Obvious infinite loop danger aside, how would I correctly keep isSearchFinished() from calling over and over and over, but instead call every, say, 30 seconds or 1 minute?
I know Thread.Sleep() isn't the right solution, and I think the solution might be accomplished by using Threading.Timer() but I'm not very familiar with it, and there are so many threading options that I'm just not sure which to use.
It's quite easy to implement with tasks and async/await, as noted by #KevinS in the comments:
async Task<ItemData> ProcessItemAsync(Item item)
{
while (true)
{
if (await isSearchFinishedAsync(item))
break;
await Task.Delay(30 * 1000);
}
return await downloadDataAsync(item);
}
// ...
var items = getItems();
var tasks = items.Select(i => ProcessItemAsync(i)).ToArray();
await Task.WhenAll(tasks);
var data = tasks.Select(t = > t.Result);
This way, you don't block ThreadPool threads in vain for what is mostly a bunch of I/O-bound network operations. If you're not familiar with async/await, the async-await tag wiki might be a good place to start.
I assume you can convert your synchronous methods isSearchFinished and downloadData to asynchronous versions using something like HttpClient for non-blocking HTTP request and returning a Task<>. If you are unable to do so, you still can simply wrap them with Task.Run, as await Task.Run(() => isSearchFinished(item)) and await Task.Run(() => downloadData(item)). Normally this is not recommended, but as you have hundreds of items, it sill would give you a much better level of concurrency than with Parallel.ForEach in this case, because you won't be blocking pool threads for 30s, thanks to asynchronous Task.Delay.
You can also write a generic function using TaskCompletionSource and Threading.Timer to return a Task that becomes complete once a specified retry function succeeds.
public static Task RetryAsync(Func<bool> retryFunc, TimeSpan retryInterval)
{
return RetryAsync(retryFunc, retryInterval, CancellationToken.None);
}
public static Task RetryAsync(Func<bool> retryFunc, TimeSpan retryInterval, CancellationToken cancellationToken)
{
var tcs = new TaskCompletionSource<object>();
cancellationToken.Register(() => tcs.TrySetCanceled());
var timer = new Timer((state) =>
{
var taskCompletionSource = (TaskCompletionSource<object>) state;
try
{
if (retryFunc())
{
taskCompletionSource.TrySetResult(null);
}
}
catch (Exception ex)
{
taskCompletionSource.TrySetException(ex);
}
}, tcs, TimeSpan.FromMilliseconds(0), retryInterval);
// Once the task is complete, dispose of the timer so it doesn't keep firing. Also captures the timer
// in a closure so it does not get disposed.
tcs.Task.ContinueWith(t => timer.Dispose(),
CancellationToken.None,
TaskContinuationOptions.ExecuteSynchronously,
TaskScheduler.Default);
return tcs.Task;
}
You can then use RetryAsync like this:
var searchTasks = new List<Task>();
searchTasks.AddRange(items.Select(
downloadItem => RetryAsync( () => isSearchFinished(downloadItem), TimeSpan.FromSeconds(2)) // retry timout
.ContinueWith(t => downloadData(downloadItem),
CancellationToken.None,
TaskContinuationOptions.OnlyOnRanToCompletion,
TaskScheduler.Default)));
await Task.WhenAll(searchTasks.ToArray());
The ContinueWith part specifies what you do once the task has completed successfully. In this case it will run your downloadData method on a thread pool thread because we specified TaskScheduler.Default and the continuation will only execute if the task ran to completion, i.e. it was not canceled and no exception was thrown.

Categories

Resources