The following code does not return the entire collection it is iterating. The returned array has an arbitrary length on every run. What's wrong?
public async Task<IHttpActionResult> GetClients()
{
var clientInfoCollection = new ConcurrentBag<ClientInfoModel>();
await _client.Iterate(async (client) =>
{
clientInfoCollection.Add(new ClientInfoModel
{
name = client.name,
userCount = await _user.Count(clientId)
});
});
return Ok(clientInfoCollection.ToArray());
}
The Iterate method below uses the new async MongoDB C# driver:
public async Task Iterate(Action<TDocument> processor)
{
await _collection.Find<TDocument>(_ => true).ForEachAsync(processor);
}
The reason you're seeing an arbitrary number of values is that Iterate receives a delegate of type Action<T>. An async lambda converted to Action<T> is equivalent to async void, which effectively makes this a "fire-and-forget" style of execution.
The inner method isn't aware that an async delegate has been passed to it, so it iterates the collection without asynchronously waiting for each item to complete.
What you need to do instead is make the method parameter a delegate of type Func<TDocument, Task> and use the proper overload of ForEachAsync:
public Task Iterate(Func<TDocument, Task> processor)
{
return _collection.Find<TDocument>(_ => true).ForEachAsync(processor);
}
You can see the source here:
public static async Task ForEachAsync<TDocument>(
this IAsyncCursor<TDocument> source,
Func<TDocument, int, Task> processor,
CancellationToken cancellationToken = default(CancellationToken))
{
Ensure.IsNotNull(source, "source");
Ensure.IsNotNull(processor, "processor");
// yes, we are taking ownership... assumption being that they've
// exhausted the thing and don't need it anymore.
using (source)
{
var index = 0;
while (await source.MoveNextAsync(cancellationToken).ConfigureAwait(false))
{
foreach (var document in source.Current)
{
await processor(document, index++).ConfigureAwait(false);
cancellationToken.ThrowIfCancellationRequested();
}
}
}
}
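The difference between the two delegate shapes can be sketched without MongoDB. This is only an illustration: IterateWithAction and IterateWithFunc are hypothetical stand-ins for the two kinds of iterator, not driver APIs, and the delays are arbitrary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class DelegateShapeDemo
{
    // Hypothetical stand-in for an iterator that only accepts Action<T>.
    public static void IterateWithAction(IEnumerable<int> items, Action<int> processor)
    {
        foreach (var item in items)
            processor(item); // nothing to await: an async lambda here becomes async void
    }

    // Hypothetical stand-in for an iterator that accepts Func<T, Task>.
    public static async Task IterateWithFunc(IEnumerable<int> items, Func<int, Task> processor)
    {
        foreach (var item in items)
            await processor(item); // each item finishes before the next one starts
    }

    public static async Task<(int viaAction, int viaFunc)> RunAsync()
    {
        var items = new[] { 1, 2, 3, 4, 5 };
        var viaAction = new ConcurrentBag<int>();
        var viaFunc = new ConcurrentBag<int>();

        // Fire-and-forget: the lambdas are still waiting on Task.Delay when this returns.
        IterateWithAction(items, async i => { await Task.Delay(500); viaAction.Add(i); });
        int countViaAction = viaAction.Count; // read before the delayed adds can run

        // Awaited: every item has been added by the time this completes.
        await IterateWithFunc(items, async i => { await Task.Delay(10); viaFunc.Add(i); });

        return (countViaAction, viaFunc.Count);
    }
}
```

Calling DelegateShapeDemo.RunAsync() should report fewer than five items for the Action<T> path and all five for the Func<T, Task> path, which mirrors the "arbitrary length" symptom in the question.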
You create the threads and set them off. From there you can't know what happens. Your code's next step is to return, so you are gambling that the threads will finish before your main thread gets there.
In normal threading scenarios, you would join the threads that are adding items to the bag. A join makes one thread wait for the others to finish, so the work still runs concurrently, but nothing is returned before everything has completed.
This is explained well here: http://www.dotnetperls.com/thread-join
I'm trying to learn to write my own asynchronous methods, but I'm having difficulty, because ALL of the millions of examples that I have seen online ALL use await Task.Delay inside the custom async method and I neither want to add a delay into my code, nor have any other async method to call in its place.
Let's use a simple example, where I want to create a new collection of objects, with only two properties, from a huge existing collection of objects, that each have a great many properties. Let's say this is my synchronous code:
public List<SomeLightType> ToLightCollection(List<SomeType> collection)
{
List<SomeLightType> lightCollection = new();
foreach (SomeType item in collection)
{
lightCollection.Add(new SomeLightType(item.Id, item.Name));
}
return lightCollection;
}
To make this method asynchronous, do I just need to wrap it in a Task.Run, add the async keyword and suffix on the method name, and change the return type, as follows?:
public Task<List<SomeLightType>> ToLightCollectionAsync(List<SomeType> collection)
{
List<SomeLightType> lightCollection = new();
Task.Run(() =>
{
foreach (SomeType item in collection)
{
lightCollection.Add(new SomeLightType(item.Id, item.Name));
}
});
return lightCollection;
}
Or do I also need to await the return of the Task inside the method? (The compiler gave me a warning until I added await.):
public async Task<List<SomeLightType>> ToLightCollectionAsync(List<SomeType> collection)
{
List<SomeLightType> lightCollection = new();
await Task.Run(() =>
{
foreach (SomeType item in collection)
{
lightCollection.Add(new SomeLightType(item.Id, item.Name));
}
});
return lightCollection;
}
EDIT:
Oh yes, I have just realised that I need to await this operation, otherwise the empty collection will be returned before it is populated. But still, is this the correct way to make this code run asynchronously?
ALL of the millions of examples that I have seen online ALL use await Task.Delay inside the custom async method and I neither want to add a delay into my code, nor have any other async method to call in its place.
Task.Delay is commonly used as a "placeholder" meaning "replace this with your actual asynchronous work".
I'm trying to learn to write my own asynchronous methods
Asynchronous code begins at the "other end". The most common example is with an I/O operation: you can make this asynchronous instead of blocking the calling thread. At the lowest level, this is commonly done using a TaskCompletionSource<T>, which creates a Task<T> you can return immediately, and then later when the operation completes, you can use the TaskCompletionSource<T> to complete the Task<T>.
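As a minimal sketch of that idea, the following wraps a callback-based operation in a Task<T>. BeginRead is a hypothetical API invented for the illustration; it simulates the wait by sleeping on a thread pool thread, which a real I/O operation would not need to do.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TcsDemo
{
    // Hypothetical callback-based API standing in for a real I/O operation.
    public static void BeginRead(Action<string> onCompleted)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            Thread.Sleep(50);        // simulate the operation taking a while
            onCompleted("payload");  // report the result through the callback
        });
    }

    // Task-returning wrapper built on TaskCompletionSource<T>.
    public static Task<string> ReadAsync()
    {
        var tcs = new TaskCompletionSource<string>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        BeginRead(result => tcs.SetResult(result)); // completes the task later
        return tcs.Task;                            // returned immediately, not yet complete
    }
}
```

A caller can then write var text = await TcsDemo.ReadAsync(); without knowing that a callback is involved.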
However, as you state in the comments:
I definitely want that method to run asynchronously, as it currently takes several minutes... this is a WPF application
What you really want is not asynchronous code; you want to run some code on a background thread so it doesn't block the UI thread. The code being run is CPU-bound and has no I/O to do, so it's just going to run on a thread pool thread instead of actually being asynchronous.
Let's use a simple example... To make this method asynchronous...
To run this code on a background thread, you would use Task.Run. However, I recommend that you do not implement this method using Task.Run. If you do, then you have a method that looks asynchronous but is not actually asynchronous; it's just running synchronously on a thread pool thread - what I call "fake asynchronous" (it has an asynchronous signature but is not actually asynchronous).
IMO, it's cleaner to keep your business logic synchronous, and in this case since you want to free up the UI thread, have the UI code call it using Task.Run:
// no change
public List<SomeLightType> ToLightCollection(List<SomeType> collection)
{
List<SomeLightType> lightCollection = new();
foreach (SomeType item in collection)
{
lightCollection.Add(new SomeLightType(item.Id, item.Name));
}
return lightCollection;
}
async void Button_Click(...)
{
var source = ...
var lights = await Task.Run(() => ToLightCollection(source));
... // Do something with lights
}
Task.Run is for CPU-bound work (see learn.microsoft.com - Async in depth).
You can avoid await Task.Run() scenarios if you return the created task directly:
public Task<List<SomeLightType>> ToLightCollectionAsync(List<SomeType> collection) => Task.Run(() =>
{
List<SomeLightType> lightCollection = new();
// Do CPU bound work
foreach (SomeType item in collection)
{
lightCollection.Add(new SomeLightType(item.Id, item.Name));
}
return lightCollection;
});
Now the caller can await the result in an async method to keep your UI responsive:
public async Task CallerMethod()
{
// ...
var result = await ToLightCollectionAsync(collection);
}
You also have the opportunity to perform some work during this computation.
public async Task CallerMethod()
{
var task = ToLightCollectionAsync(collection);
// Do some other work
var result = await task;
}
I have an async method like this:
private async Task SendAsync(string text) {
...
}
I also have to use this method one time for each item in a List:
List<string> textsToSend = new Service().GetMessages();
Currently my implementation is this:
List<string> textsToSend = new Service().GetMessages();
List<Task> tasks = new List<Task>(textsToSend.Count);
textsToSend.ForEach(t => tasks.Add(SendAsync(t)));
await Task.WhenAll(tasks);
With this code, I get a Task for each message, each running the sending method asynchronously.
However, I don't know if there is any difference between my implementation and this one:
List<string> textsToSend = new Service().GetMessages();
textsToSend.ForEach(async t => await SendAsync(t));
In the second one, I don't have the List<Task> allocation, but I think that the first one launches all the tasks in parallel, while the second one runs them one by one.
Could you help me clarify whether there is any difference between the first and second samples?
PS: I also know that C# 8 supports await foreach; however, I'm using C# 7.
You don't even need a list, much less ForEach, to execute multiple tasks and await all of them. In any case, List<T>.ForEach is just a convenience method that loops over the list and calls the delegate for each item.
To execute some async calls concurrently based on a list of inputs, all you need is Enumerable.Select. To await all of them, you only need Task.WhenAll:
var tasks=textsToSend.Select(text=>SendAsync(text));
await Task.WhenAll(tasks);
LINQ and IEnumerable in general use lazy evaluation, which means Select's code won't be executed until the returned IEnumerable is iterated. In this case it doesn't matter, because it's iterated in the very next line. If one wanted to force all tasks to start, a call to ToArray() would be enough, e.g.:
var tasks=textsToSend.Select(SendAsync).ToArray();
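One practical consequence of lazy evaluation is that enumerating the Select result twice starts every call twice. The sketch below counts invocations to show this; SendAsync here is a hypothetical stand-in that only increments a counter, not the method from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LazySelectDemo
{
    private static int _started;

    // Hypothetical stand-in for the real SendAsync: counts how often it is called.
    private static async Task SendAsync(string text)
    {
        Interlocked.Increment(ref _started);
        await Task.Delay(10);
    }

    public static async Task<(int lazy, int eager)> RunAsync()
    {
        var texts = new[] { "a", "b", "c" };

        _started = 0;
        IEnumerable<Task> lazyTasks = texts.Select(SendAsync);
        await Task.WhenAll(lazyTasks);
        await Task.WhenAll(lazyTasks);   // re-enumerates the query and calls SendAsync again
        int lazy = _started;

        _started = 0;
        Task[] eagerTasks = texts.Select(SendAsync).ToArray();
        await Task.WhenAll(eagerTasks);
        await Task.WhenAll(eagerTasks);  // same three tasks, nothing is restarted
        int eager = _started;

        return (lazy, eager);
    }
}
```

With three inputs, the lazy variant ends up making six calls and the ToArray variant three, so materializing the tasks is the safer default whenever the sequence is used more than once.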
If you wanted to execute those async calls sequentially, ie one after the other, you could use a simple foreach. There's no need for C# 8's await foreach :
foreach(var text in textsToSend)
{
await SendAsync(text);
}
The Bug
This line is simply a bug :
textsToSend.ForEach(async t => await SendAsync(t));
ForEach doesn't know anything about tasks, so it never awaits the generated tasks. In fact, the tasks can't be awaited at all. The async t => ... lambda is converted to Action<string>, which makes it an async void delegate. It's equivalent to:
async void MyMethod(string t)
{
await SendAsync(t);
}
textsToSend.ForEach(t=>MyMethod(t));
This brings all the problems of async void methods. Since the application knows nothing about those async void calls, it could easily terminate before those methods complete, resulting in NREs, ObjectDisposedExceptions and other weird problems.
For reference check David Fowler's Implicit async void delegates
C# 8 and await foreach
C# 8's IAsyncEnumerable would be useful in the sequential case, if we wanted to return the results of each async operation in an iterator, as soon as we got them.
Before C# 8 there would be no way to avoid awaiting for all results, even with sequential execution. We'd have to collect all of them in a list. Assuming each operation returned a string, we'd have to write :
async Task<List<string>> SendTexts(IEnumerable<string> textsToSend)
{
var results=new List<string>();
foreach(var text in textsToSend)
{
var result=await SendAsync(text);
results.Add(result);
}
// Only returned once every send has completed
return results;
}
And use it with :
var results=await SendTexts(texts);
In C# 8 we can return individual results and use them asynchronously. We don't need to cache the results before returning them either :
async IAsyncEnumerable<string> SendTexts(IEnumerable<string> textsToSend)
{
foreach(var text in textsToSend)
{
var result=await SendAsync(text);
// Hand each result to the consumer as soon as it is available
yield return result;
}
}
await foreach(var result in SendTexts(texts))
{
...
}
await foreach is only needed to consume the IAsyncEnumerable result, not to produce it.
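Put together as a compilable sketch, producer and consumer could look like this. It assumes C# 8 on .NET Core 3.0 or later, and SendAsync is a hypothetical stand-in that just upper-cases its input after a short delay.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncStreamDemo
{
    // Hypothetical stand-in for a send operation that returns a result.
    private static async Task<string> SendAsync(string text)
    {
        await Task.Delay(10);
        return text.ToUpperInvariant();
    }

    // Producer: an ordinary foreach with yield return inside an async iterator.
    public static async IAsyncEnumerable<string> SendTexts(IEnumerable<string> textsToSend)
    {
        foreach (var text in textsToSend)
            yield return await SendAsync(text);
    }

    // Consumer: await foreach receives each result as soon as it is produced.
    public static async Task<List<string>> CollectAsync()
    {
        var seen = new List<string>();
        await foreach (var result in SendTexts(new[] { "a", "b" }))
            seen.Add(result);
        return seen;
    }
}
```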
that the first one launch all Task in parallel
Correct. And await Task.WhenAll(tasks); waits until all messages are sent.
The second one also sends the messages in parallel, but it doesn't wait until all messages are sent, since you don't await any task.
In your case:
textsToSend.ForEach(async t => await SendAsync(t));
is equivalent to
textsToSend.ForEach(t => SendAsync(t));
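The async t => await SendAsync(t) lambda may return the task, just as SendAsync(t) does; it depends on the delegate type it is assigned to. When passed to ForEach, both async t => await SendAsync(t) and t => SendAsync(t) are converted to Action<string>, so no task reaches the caller.
Also, the first code will throw an exception if any SendAsync throws an exception. In the second code, the caller never sees the exception: with t => SendAsync(t) the faulted task is simply discarded, and with the async lambda the exception is rethrown on the captured synchronization context or the thread pool, where it can crash the process.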
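The exception behaviour of the first variant can be sketched as follows. SendAsync is a hypothetical stand-in that fails for one particular message; the point is only that the failure reaches an ordinary try/catch around await Task.WhenAll.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllExceptionDemo
{
    // Hypothetical SendAsync that fails for one particular message.
    private static async Task SendAsync(string text)
    {
        await Task.Delay(10);
        if (text == "bad")
            throw new InvalidOperationException("send failed: " + text);
    }

    public static async Task<string> RunAsync()
    {
        var texts = new[] { "ok", "bad", "fine" };
        try
        {
            await Task.WhenAll(texts.Select(SendAsync));
            return "no error";
        }
        catch (InvalidOperationException ex)
        {
            return ex.Message; // the failure reaches the caller's catch block
        }
    }
}
```

No comparable catch block is possible around textsToSend.ForEach(async t => await SendAsync(t)), because ForEach returns before any of the sends has finished.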
I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend, what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
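SemaphoreSlim sem = new SemaphoreSlim(4, 4);
var tasks = new List<Task>();
foreach (var service in RunData.Demand)
{
// Wait until one of the four slots is free before starting the next call
await sem.WaitAsync();
var serviceCopy = service; // per-iteration copy for the closure
tasks.Add(Task.Run(async () =>
{
try
{
var availabilityResponse = await client.QueryAvailability(serviceCopy);
// do your other stuff here with the result of QueryAvailability
}
finally
{
sem.Release(); // free the slot even if the call throws
}
}));
}
// Wait for the calls that are still in flight
await Task.WhenAll(tasks);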
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (or WaitAsync), which subtracts one from the count. Calling Release adds one to the count.
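To check that a semaphore really caps the number of calls in flight, one can record the peak concurrency. The sketch below is a variation on the approach above rather than the same code: it starts all work items up front and lets each one wait at the gate, and Task.Delay is a hypothetical stand-in for the HTTP call.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    public static async Task<(int processed, int maxConcurrent)> RunAsync(int itemCount, int limit)
    {
        var gate = new SemaphoreSlim(limit, limit);
        var sync = new object();
        int running = 0, maxConcurrent = 0, processed = 0;

        async Task WorkAsync(int item)
        {
            await gate.WaitAsync();      // take a slot, waiting if none is free
            try
            {
                lock (sync) { running++; maxConcurrent = Math.Max(maxConcurrent, running); }
                await Task.Delay(20);    // hypothetical stand-in for the HTTP call
                lock (sync) { running--; processed++; }
            }
            finally
            {
                gate.Release();          // give the slot back, even on failure
            }
        }

        await Task.WhenAll(Enumerable.Range(0, itemCount).Select(i => WorkAsync(i)));
        return (processed, maxConcurrent);
    }
}
```

For example, ThrottleDemo.RunAsync(20, 4) should process all 20 items while never reporting more than 4 running at once.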
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
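A usage sketch for the partition-based approach follows. The extension method is repeated in the same shape so the example compiles on its own; the item count, the degree of parallelism of 3, and the Task.Delay body are arbitrary assumptions made for the illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PartitionDemo
{
    // Same shape as the extension method above, repeated so the sketch compiles alone.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate
            {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }

    public static async Task<(int processed, int maxConcurrent)> RunAsync()
    {
        var sync = new object();
        int running = 0, maxConcurrent = 0, processed = 0;

        // At most three partitions are drained at once, each one sequentially.
        await Enumerable.Range(0, 12).ForEachAsync(3, async item =>
        {
            lock (sync) { running++; maxConcurrent = Math.Max(maxConcurrent, running); }
            await Task.Delay(20); // hypothetical stand-in for the HTTP call
            lock (sync) { running--; processed++; }
        });

        return (processed, maxConcurrent);
    }
}
```

Because each partition awaits its items one after another, the observed peak concurrency cannot exceed the dop argument.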
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}
Almost every SO answer on this topic states that:
LINQ doesn't work perfectly with async
Also :
I recommend that you not think of this as "using async within LINQ"
But in Stephen's book there is a sample for :
Problem: You have a collection of tasks to await, and you want to do some
processing on each task after it completes. However, you want to do
the processing for each one as soon as it completes, not waiting for
any of the other tasks.
One of the recommended solutions was :
static async Task<int> DelayAndReturnAsync(int val)
{
await Task.Delay(TimeSpan.FromSeconds(val));
return val;
}
// This method now prints "1", "2", and "3".
static async Task ProcessTasksAsync()
{
// Create a sequence of tasks.
Task<int> taskA = DelayAndReturnAsync(2);
Task<int> taskB = DelayAndReturnAsync(3);
Task<int> taskC = DelayAndReturnAsync(1);
var tasks = new[] { taskA, taskB, taskC };
var processingTasks = tasks.Select(async t =>
{
var result = await t;
Trace.WriteLine(result);
}).ToArray();
// Await all processing to complete
await Task.WhenAll(processingTasks);
}
Question #1:
I don't understand why async inside a LINQ statement works here. Didn't we just say "don't think about using async within LINQ"?
Question #2:
When control reaches the await t here, what actually happens? Does control leave the ProcessTasksAsync method, or does it leave the anonymous method and continue the iteration?
I don't understand why async inside a LINQ statement works here. Didn't we just say "don't think about using async within LINQ"?
async mostly doesn't work with LINQ-style helpers because they have no special understanding of the Task class, and methods that take an Action<T> (such as List<T>.ForEach) turn an async lambda into async void, which is bad. Enumerable.Select is different: it takes a Func<TSource, TResult>, so for an async lambda TResult is inferred as Task. The delegate is then equivalent to an async Task method, hence it works fine for async use-cases.
When control reaches the await t here, what actually happens? Does control leave the ProcessTasksAsync method?
No, it doesn't. Enumerable.Select projects every element in the sequence. For each element, await t (if t has not completed yet) yields control back to the iterator, which continues with the remaining elements. That's why you later have to await Task.WhenAll, to ensure the processing of all elements has finished.
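This behaviour can be observed directly. In the sketch below, which uses millisecond delays chosen far apart so the completion order is predictable, every projected task is still pending right after ToArray, and the results arrive in completion order rather than input order. The class and member names are made up for the illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class SelectAsyncDemo
{
    private static async Task<int> DelayAndReturnAsync(int milliseconds)
    {
        await Task.Delay(milliseconds);
        return milliseconds;
    }

    public static async Task<(bool allPending, int[] order)> RunAsync()
    {
        var tasks = new[]
        {
            DelayAndReturnAsync(400),
            DelayAndReturnAsync(600),
            DelayAndReturnAsync(200)
        };
        var order = new ConcurrentQueue<int>();

        // Select calls the async lambda once per element; each call returns at its first await.
        Task[] processing = tasks.Select(async t =>
        {
            var result = await t;
            order.Enqueue(result);
        }).ToArray();

        // The iteration is over, but none of the processing tasks has finished yet.
        bool allPending = processing.All(p => !p.IsCompleted);

        await Task.WhenAll(processing);
        return (allPending, order.ToArray());
    }
}
```

With these delays, the recorded order should be 200, 400, 600, while the input order was 400, 600, 200.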
Question 1:
The difference is that each task is continued with additional processing which is: Trace.WriteLine(result);. In the link you pointed to, that code does not change anything, just creates overhead of awaiting and wrapping with another task.
Question 2:
When control reaches the await t here, what actually happens?
It awaits the result of the task t that was created in ProcessTasksAsync, then continues with Trace.WriteLine(result);. Control stays within ProcessTasksAsync while the anonymous method is suspended; once the result is available, processing resumes inside the anonymous method.
At the end, we have await Task.WhenAll(processingTasks);, which waits for all tasks, including the additional processing (Trace.WriteLine(result);), to complete before continuing. Each task, however, does not wait for the others before executing its own Trace.WriteLine(result);.
It will be better this way:
static async Task<int> DelayAndReturnAsync(int val)
{
await Task.Delay(TimeSpan.FromSeconds(val));
return val;
}
static async Task AwaitAndProcessAsync(Task<int> task)
{
var result = await task;
Console.WriteLine(result);
}
// This method now prints "1", "2", and "3".
static async Task ProcessTasksAsync()
{
// Create a sequence of tasks.
Task<int> taskA = DelayAndReturnAsync(2);
Task<int> taskB = DelayAndReturnAsync(3);
Task<int> taskC = DelayAndReturnAsync(1);
var tasks = new[] { taskA, taskB, taskC };
var processingTasks = tasks.Select(AwaitAndProcessAsync).ToArray();
// Await all processing to complete
await Task.WhenAll(processingTasks);
}
processingTasks is an array of Task, because AwaitAndProcessAsync returns Task.
I'm needing to create a list of tasks to execute a routine that takes one parameter and then wait for those tasks to complete before continuing with the rest of the program code. Here is an example:
List<Task> tasks = new List<Task>();
foreach (string URL in LIST_URL_COLLECTION)
{
tasks[i] = Task.Factory.StartNew(
GoToURL(URL)
);
}
//wait for them to finish
Console.WriteLine("Done");
I have googled and searched this site, but I just keep hitting a dead end. I did this once but can't remember how.
The Task Parallel Library exposes a convenient way to asynchronously wait for the completion of all tasks: the Task.WhenAll method. The method itself returns a Task, which is awaitable and should be awaited:
public async Task QueryUrlsAsync()
{
var urlFetchingTasks = ListUrlCollection.Select(url => Task.Run(() => GoToURL(url)));
await Task.WhenAll(urlFetchingTasks);
Console.WriteLine("Done");
}
Note that in order to await, your method must be marked with the async modifier in the method signature and return either a Task (if it has no return value) or a Task<T> (if it returns a value of type T).
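When the individual tasks return values, Task.WhenAll also collects them: awaiting it yields an array with one result per task, in the order of the input tasks. A minimal sketch, where FetchLengthAsync is a hypothetical stand-in for the real work:

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllResultsDemo
{
    // Hypothetical stand-in for fetching a URL; returns the length of the input.
    private static async Task<int> FetchLengthAsync(string url)
    {
        await Task.Delay(10);
        return url.Length;
    }

    public static async Task<int[]> RunAsync()
    {
        var urls = new[] { "a", "bb", "ccc" };

        // Task.WhenAll over Task<int> items produces a Task<int[]>.
        int[] lengths = await Task.WhenAll(urls.Select(u => FetchLengthAsync(u)));
        return lengths;
    }
}
```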
As a side note, your method looks like it's fetching URLs, which I am assuming generates a web request to some endpoint. For that, there's no need to use extra threads via Task.Factory.StartNew or Task.Run, as these operations are naturally asynchronous. You should look into HttpClient as a starting point. For example, your method could look like this:
public async Task QueryUrlsAsync()
{
// Reuse a single HttpClient for all requests instead of creating one per URL.
var httpClient = new HttpClient();
var urlFetchingTasks = ListUrlCollection.Select(url => httpClient.GetAsync(url));
await Task.WhenAll(urlFetchingTasks);
Console.WriteLine("Done");
}