I have the following method which is returning me a list of stock prices.
var tickers = Ticker.Text.Split(',', ' ');
var service = new StockService();
var tickerLoadingTasks = new List<Task<IEnumerable<StockPrice>>>();
foreach(var ticker in tickers)
{
var loadTask = service.GetStockPriceFor(ticker, cancellationTokenSource.Token);
tickerLoadingTasks.Add(loadTask);
}
await Task.WhenAll(tickerLoadingTasks);
// code for updating UI
The method above returns a list of stock price data. My confusion is: for each value in tickers, is a new thread created, or what actually happens?
GetStockPriceFor returns a Task of the list of stock prices for each ticker.
Assuming that StockService is an HTTP client, GetStockPriceFor will do an HTTP request. If the code is implemented correctly, the returned Task can be awaited to get the result of the HTTP call. While awaiting the Task, no thread is used.
The idea of async/await and I/O calls is that you can continue using the thread for something else while the operating system waits for the I/O call to finish. Once it's finished, a thread will be used to process the result of the I/O call.
Now remember, this is based upon the assumption you're doing an HTTP call.
This is a huge if since you don't show the actual implementation of the code. It could be that the code fetches the results synchronously and uses Task.FromResult to give you a Task that will immediately complete and not even switch threads.
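As an illustration, here is a hedged sketch of the two possibilities; the service members, URL, and helper below are assumptions for illustration, not the real StockService API:
public class StockService
{
    private static readonly HttpClient Client = new HttpClient();

    // Truly asynchronous: no thread is consumed while the HTTP request is in flight.
    public async Task<IEnumerable<StockPrice>> GetStockPriceFor(
        string ticker, CancellationToken token)
    {
        var response = await Client.GetAsync("https://example.com/prices/" + ticker, token);
        var json = await response.Content.ReadAsStringAsync();
        return JsonConvert.DeserializeObject<List<StockPrice>>(json);
    }

    // Synchronous in disguise: blocks the calling thread doing the real work,
    // then hands back an already-completed Task via Task.FromResult.
    public Task<IEnumerable<StockPrice>> GetStockPriceForBlocking(
        string ticker, CancellationToken token)
    {
        IEnumerable<StockPrice> prices = LoadPricesBlocking(ticker); // hypothetical helper
        return Task.FromResult(prices);
    }
}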
Related
Say I'm using EF and I query the database for multiple objects:
var person = await Context.People.FindAsync(1);
person.Name = "foo";
var animal = await Context.Animals.FindAsync(2);
animal.Name = "bar";
var hybrid = await Context.Hybrids.FindAsync(3);
hybrid.Name = "godless abomination";
I can see two ways this could execute:
We get stuck at the first await, it gets a person and sets its name, then moves on to the next await (and so on)
The compiler knows that each awaited task can be executed at the same time because they don't depend on each other. So it tries to get person, animal, and hybrid at the same time and then sets the person, animal and hybrid names synchronously.
I assume number 2 is correct, but I'm not sure. Advice for working these sorts of questions out for myself is also very welcome :)
async and await work with thread pool worker threads to improve the responsiveness of code, not parallelism. In a thread-safe implementation you can parallelize multiple calls using Task.WhenAll instead of awaiting each one; however, the big caveat here is that the EF DbContext is not thread safe and will throw when it detects simultaneous access from multiple threads. (It doesn't matter whether the tables referenced are related or not.)
As a general breakdown of what sort-of happens with awaited async code:
var person = await Context.People.FindAsync(1);
person.Name = "foo";
var animal = await Context.Animals.FindAsync(2);
animal.Name = "bar";
Context.People.FindAsync(1) is set up to execute on a worker thread. Because the call is awaited, that thread is given a resumption point so that, once the call completes, execution continues with all of the following code.
So a request like this comes in on a web-request worker thread. When it reaches an awaited line, it requests a new worker thread to take over, handing it a resumption pointer to the remaining code. Then, since the call is awaited, the web-request worker thread knocks off back into the pool to serve more requests while that work runs in the background.
When the Context.People.FindAsync() call completes, that worker thread continues the execution and eventually hits the awaited Context.Animals.FindAsync(2) call, which does the same thing: it spawns the call off to a worker thread with a resumption point and knocks off back into the worker thread pool. The actual thread hand-off behaviour can vary depending on the configured synchronization context, which can be set up to have the resumption code return to the original thread; for example, the UI thread in a WinForms project.
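As a small, hedged illustration of that synchronization-context point (the WinForms handler, controls, and the two async methods here are hypothetical):
private async void LoadButton_Click(object sender, EventArgs e)
{
    // Default behaviour: the continuation is posted back to the UI thread
    // via the captured synchronization context.
    var person = await LoadPersonAsync();
    nameLabel.Text = person.Name; // safe: we are back on the UI thread

    // ConfigureAwait(false) opts out of capturing the context, so the
    // continuation may resume on a thread pool thread instead.
    var report = await BuildReportAsync().ConfigureAwait(false);
    // From here on we may be off the UI thread; touching controls would throw.
}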
The EF DbContext is fine with this since operations against it are only occurring from one thread at any given time. The accidental alternative would be something like this:
var person = Context.People.FindAsync(1);
var animal = Context.Animals.FindAsync(2);
person.Result.Name = "foo";
animal.Result.Name = "bar";
Context.People.FindAsync(1) will spawn off and run on a worker thread, but because it is not awaited, it isn't given a resumption point to call when it completes; it just returns the Task<Person> handle that represents its execution. This means the calling thread continues straight on to the Context.Animals.FindAsync(2) call, again handing off to a worker thread and getting a Task<Animal> back. Both of these worker threads will be running in parallel, and the EF DbContext is not going to like that at all. Each task will block at the .Result reference until it completes, but async calls are often used where we don't care about the result, which lets silent errors creep in. For instance, sometimes there are multiple un-awaited async calls with a bit of synchronous work in between that always seems to take long enough that the calls never seem to overlap... until one day they do, in production.
Where you would want parallel execution you would opt instead for something like a WhenAll():
using var contextOne = new AppDbContext();
using var contextTwo = new AppDbContext();
var personTask = contextOne.People.FindAsync(1);
var animalTask = contextTwo.Animals.FindAsync(2);
await Task.WhenAll(personTask, animalTask);
personTask.Result.Name = "foo";
animalTask.Result.Name = "bar";
contextOne.SaveChanges();
contextTwo.SaveChanges();
The key difference here is that we need to ensure the queries run against different DbContext instances. Something like this can be used to parallelize DB operations with other long-running tasks (file management, etc.). When it comes to handling exceptions, awaiting WhenAll bubbles up the first exception (if any was raised) from any of the awaited tasks. If you want access to the potential exceptions from all parallel tasks (an AggregateException), that involves a bit of work to play nicely with async/await without blocking.
see: Why doesn't await on Task.WhenAll throw an AggregateException? if you want to dive deeper down that rabbit hole. :)
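As a hedged sketch of one way to get at all of the exceptions without blocking: await rethrows only the first exception, but the WhenAll task itself still carries all of them.
var whenAllTask = Task.WhenAll(personTask, animalTask);
try
{
    await whenAllTask;
}
catch
{
    // The await above surfaces only the first failure; the task's
    // Exception property holds the full AggregateException.
    foreach (var inner in whenAllTask.Exception.InnerExceptions)
        Console.WriteLine(inner.Message);
}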
Answer
Your code is going to stop at each await until the awaited asynchronous method completes. This means your code is going to run from top to bottom (i.e. number 1 is correct).
Example
await longAsyncFunc();
shortFunc();
Above, shortFunc() must wait until the longAsyncFunc() returns because of the await keyword.
Below, longAsyncFunc() will start execution, then shortFunc() will start execution -- no need to wait for longAsyncFunc() to finish its computation and return.
longAsyncFunc();
shortFunc();
Suggestion
If you would rather have your code work like number 2, then I would wrap each pair of statements in its own async method.
async Task<Person> FuncA()
{
    var person = await Context.People.FindAsync(1);
    person.Name = "foo";
    return person;
}
async Task<Animal> FuncB()
{
    var animal = await Context.Animals.FindAsync(2);
    animal.Name = "bar";
    return animal;
}
async Task<Hybrid> FuncC()
{
    var hybrid = await Context.Hybrids.FindAsync(3);
    hybrid.Name = "godless abomination";
    return hybrid;
}
var personTask = FuncA();
var animalTask = FuncB();
var hybridTask = FuncC();
Notice that I didn't use await on the last three lines -- that would change the behavior back to number 1. Each call starts running as soon as it is made, so the three variables hold Tasks, which you still need to await (for example with Task.WhenAll) before using their results.
I tried reading many articles and questions on Stack Overflow about the real use of async/await (basically, asynchronous method calls), but somehow I am still not able to work out how it provides parallelism and non-blocking behavior. I referred to a few posts like these:
Is it OK to use async/await almost everywhere?
https://news.ycombinator.com/item?id=19010989
Benefits of using async and await keywords
So if I write a piece of code like this
var user = await GetUserFromDBAsync();
var destination = await GetDestinationFromDBAsync();
var address = await GetAddressFromDBAsync();
Even though all three methods are asynchronous, the code will not go to the second line to get the destination from the database until it has fully gotten the user from the database.
So where is the parallelism and non-blocking behavior of async/await here? It still waits for the first operation to complete before executing the next line.
Or is my overall understanding of async wrong?
EDIT
Any example would really help!
The point of async/await is not that methods are executed more quickly. Rather, it's about what a thread is doing while those methods are waiting for a response from a database, the file system, an HTTP request, or some other I/O.
Without asynchronous execution the thread just waits. It is, in a sense, wasted, because during that time it is doing nothing. We don't have an unlimited supply of threads, so having threads sit and wait is wasteful.
Async/await simply allows threads to do other things. Instead of waiting, the thread can serve some other request. And then, when the database query is finished or the HTTP request receives a response, the next available thread picks up execution of that code.
So yes, the individual lines in your example still execute in sequence. They just execute more efficiently. If your application is receiving many requests, it can process those requests sooner because more threads are available to do work instead of blocking, just waiting for a response from some I/O operation.
I highly recommend this blog post: There Is No Thread. There is some confusion that async/await is about executing something on another thread. It is not about that at all. It's about ensuring that no thread is sitting and waiting when it could be doing something else.
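To make the contrast concrete, here is a minimal sketch (the two handler methods are hypothetical):
// Blocking: the thread sits idle for the whole second and can serve
// nothing else in the meantime.
void HandleRequestBlocking()
{
    Thread.Sleep(1000); // stand-in for a synchronous DB or HTTP call
}

// Asynchronous: the thread is returned to the pool at the await, and the
// method resumes (possibly on a different thread) when the delay completes.
async Task HandleRequestAsync()
{
    await Task.Delay(1000); // stand-in for an awaited DB or HTTP call
}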
You can execute them in parallel/concurrently and still await them in a non-blocking manner with Task.WhenAll. You don't have to await each async method call individually.
So you have the performance gain and at the same time a responsive UI:
// start all 3 operations; an async method begins running as soon as it is called
var userTask = GetUserFromDBAsync();
var destinationTask = GetDestinationFromDBAsync();
var addressTask = GetAddressFromDBAsync();
// await all of them together in a non-blocking way
await Task.WhenAll(userTask, destinationTask, addressTask);
// read the already-completed results
var user = userTask.Result;
var destination = destinationTask.Result;
var address = addressTask.Result;
I have an ASP.NET 5 Web API application which contains a method that takes objects from a List<T> and makes HTTP requests to a server, 5 at a time, until all requests have completed. This is accomplished using a SemaphoreSlim, a List<Task>(), and awaiting on Task.WhenAll(), similar to the example snippet below:
public async Task<ResponseObj[]> DoStuff(List<Input> inputData)
{
const int maxDegreeOfParallelism = 5;
var tasks = new List<Task<ResponseObj>>();
using var throttler = new SemaphoreSlim(maxDegreeOfParallelism);
foreach (var input in inputData)
{
tasks.Add(ExecHttpRequestAsync(input, throttler));
}
ResponseObj[] responses = await Task.WhenAll(tasks).ConfigureAwait(false);
return responses;
}
private async Task<ResponseObj> ExecHttpRequestAsync(Input input, SemaphoreSlim throttler)
{
await throttler.WaitAsync().ConfigureAwait(false);
try
{
using var request = new HttpRequestMessage(HttpMethod.Post, "https://foo.bar/api");
request.Content = new StringContent(JsonConvert.SerializeObject(input), Encoding.UTF8, "application/json");
var response = await HttpClientWrapper.SendAsync(request).ConfigureAwait(false);
var responseBody = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
var responseObject = JsonConvert.DeserializeObject<ResponseObj>(responseBody);
return responseObject;
}
finally
{
throttler.Release();
}
}
This works well; however, I am looking to limit the total number of Tasks that are being executed in parallel globally throughout the application, so as to allow scaling up of this application. For example, if 50 requests to my API came in at the same time, this would start at most 250 tasks running in parallel. If I wanted to limit the total number of Tasks that are being executed at any given time to, say, 100, is it possible to accomplish this? Perhaps via a Queue<T>? Would the framework automatically prevent too many tasks from being executed? Or am I approaching this problem in the wrong way, and would I instead need to queue the incoming requests to my application?
I'm going to assume the code is fixed, i.e., Task.Run is removed and the WaitAsync / Release are adjusted to throttle the HTTP calls instead of List<T>.Add.
I am looking to limit the total number of Tasks that are being executed in parallel globally throughout the application, so as to allow scaling up of this application.
This does not make sense to me. Limiting your tasks limits your scaling up.
For example, if 50 requests to my API came in at the same time, this would start at most 250 tasks running parallel.
Concurrently, sure, but not in parallel. It's important to note that these aren't 250 threads, and that they're not 250 CPU-bound operations waiting for free thread pool threads to run on, either. These are Promise Tasks, not Delegate Tasks, so they don't "run" on a thread at all. It's just 250 objects in memory.
If I wanted to limit the total number of Tasks that are being executed at any given time to say 100, is it possible to accomplish this?
Since (these kinds of) tasks are just in-memory objects, there should be no need to limit them, any more than you would need to limit the number of strings or List<T>s. Apply throttling where you do need it; e.g., number of HTTP calls done simultaneously per request. Or per host.
Would the framework automatically prevent too many tasks from being executed?
The framework has nothing like this built-in.
Perhaps via a Queue? Or am I approaching this problem in the wrong way, and would I instead need to Queue the incoming requests to my application?
There's already a queue of requests. It's handled by IIS (or whatever your host is). If your server gets too busy (or gets busy very suddenly), the requests will queue up without you having to do anything.
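If you do want to throttle the HTTP calls per host specifically, one option (assuming .NET Core 2.1 or later, where SocketsHttpHandler is available) is to cap connections on the handler instead of limiting tasks:
var handler = new SocketsHttpHandler
{
    // Caps the number of simultaneous connections to any single server.
    MaxConnectionsPerServer = 10
};
var client = new HttpClient(handler);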
If I wanted to limit the total number of Tasks that are being executed at any given time to say 100, is it possible to accomplish this?
What you are looking for is to limit the MaximumConcurrencyLevel of what's called the TaskScheduler. You can create your own task scheduler that regulates the MaximumConcurrencyLevel of the tasks it manages. I would recommend implementing a queue-like object that tracks incoming requests and currently-working requests, and waits for the current requests to finish before consuming more. The information below may still be relevant.
The task scheduler is in charge of how Tasks are prioritized, and in charge of tracking the tasks and ensuring that their work is completed, at least eventually.
The way it does this is actually very similar to what you mentioned, in general the way the Task Scheduler handles tasks is in a FIFO (First in first out) model very similar to how a ConcurrentQueue<T> works (at least starting in .NET 4).
Would the framework automatically prevent too many tasks from being executed?
By default the TaskScheduler that is created with most applications appears to default to a MaximumConcurrencyLevel of int.MaxValue. So theoretically yes.
The fact that there is practically no limit to the number of tasks (at least with the default TaskScheduler) might not be that big of a deal for your scenario.
Tasks are separated into two types, at least when it comes to how they are assigned to the available thread pools. They're separated into Local and Global queues.
Without going too far into detail, the way it works is that if a task creates other tasks, those new tasks are part of the parent task's queue (a local queue). Tasks spawned by a parent task are limited to the parent's thread pool (unless the task scheduler takes it upon itself to move queues around).
If a task isn't created by another task, it's a top-level task and is placed into the global queue. These would normally be assigned their own thread (if available), and if one isn't available they are treated in a FIFO model, as mentioned above, until their work can be completed.
This is important because although you can limit the amount of concurrency that happens with the TaskScheduler, it may not necessarily be important -- if, say, you have a top-level task that's marked as long-running and is in charge of processing your incoming requests. This would be helpful, since all the tasks spawned by this top-level task will be part of that task's local queue and therefore won't spam all the available threads in your thread pool.
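If you do go down the scheduler route, here is a minimal, hedged sketch using the built-in ConcurrentExclusiveSchedulerPair rather than a hand-rolled TaskScheduler; note that it only caps delegate tasks queued to it, not awaited I/O (DoCpuWork is a hypothetical method):
// Limit this scheduler to at most 4 tasks executing at any one time.
var pair = new ConcurrentExclusiveSchedulerPair(
    TaskScheduler.Default, maxConcurrencyLevel: 4);
var factory = new TaskFactory(pair.ConcurrentScheduler);

var tasks = Enumerable.Range(0, 20)
    .Select(i => factory.StartNew(() => DoCpuWork(i)))
    .ToArray();
Task.WaitAll(tasks);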
When you have a bunch of items and you want to process them asynchronously and with limited concurrency, the SemaphoreSlim is a great tool for this job. There are two ways that it can be used. One way is to create all the tasks immediately and have each task acquire the semaphore before doing its main work, and the other is to throttle the creation of the tasks while the source is enumerated. The first technique is eager, and so it consumes more RAM, but it's more maintainable because it is easier to understand and implement. The second technique is lazy, and it's more efficient if you have millions of items to process.
The technique that you have used in your sample code is the first (eager) one.
Here is an example of using two SemaphoreSlims in order to impose two maximum concurrency policies, one per request and one globally. First the eager approach:
private const int maxConcurrencyGlobal = 100;
private static SemaphoreSlim globalThrottler
= new SemaphoreSlim(maxConcurrencyGlobal, maxConcurrencyGlobal);
public async Task<ResponseObj[]> DoStuffAsync(IEnumerable<Input> inputData)
{
const int maxConcurrencyPerRequest = 5;
var perRequestThrottler
= new SemaphoreSlim(maxConcurrencyPerRequest, maxConcurrencyPerRequest);
Task<ResponseObj>[] tasks = inputData.Select(async input =>
{
await perRequestThrottler.WaitAsync();
try
{
await globalThrottler.WaitAsync();
try
{
return await ExecHttpRequestAsync(input);
}
finally { globalThrottler.Release(); }
}
finally { perRequestThrottler.Release(); }
}).ToArray();
return await Task.WhenAll(tasks);
}
The Select LINQ operator provides an easy and intuitive way to project items to tasks.
And here is the lazy approach for doing exactly the same thing:
private const int maxConcurrencyGlobal = 100;
private static SemaphoreSlim globalThrottler
= new SemaphoreSlim(maxConcurrencyGlobal, maxConcurrencyGlobal);
public async Task<ResponseObj[]> DoStuffAsync(IEnumerable<Input> inputData)
{
const int maxConcurrencyPerRequest = 5;
var perRequestThrottler
= new SemaphoreSlim(maxConcurrencyPerRequest, maxConcurrencyPerRequest);
var tasks = new List<Task<ResponseObj>>();
foreach (var input in inputData)
{
await perRequestThrottler.WaitAsync();
await globalThrottler.WaitAsync();
Task<ResponseObj> task = Run(async () =>
{
try
{
return await ExecHttpRequestAsync(input);
}
finally
{
try { globalThrottler.Release(); }
finally { perRequestThrottler.Release(); }
}
});
tasks.Add(task);
}
return await Task.WhenAll(tasks);
static async Task<T> Run<T>(Func<Task<T>> action) => await action();
}
This implementation assumes that the await globalThrottler.WaitAsync() will never throw, which is a given according to the documentation. This will no longer be the case if you decide later to add support for cancellation, and you pass a CancellationToken to the method. In that case you would need one more try/finally wrapper around the task-creation logic. The first (eager) approach could be enhanced with cancellation support without such considerations. Its existing try/finally infrastructure is already sufficient.
It is also important that the internal helper Run method is implemented with async/await. Eliding the async/await would be an easy mistake to make, because in that case any exception thrown synchronously by the ExecHttpRequestAsync method would be rethrown immediately, and it would not be encapsulated in a Task<ResponseObj>. Then the task returned by the DoStuffAsync method would fail without releasing the acquired semaphores, and also without awaiting the completion of the already started operations. That's another argument for preferring the eager approach. The lazy approach has too many gotchas to watch for.
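For contrast, here is a hedged sketch of the mistake that paragraph warns about (RunElided is a hypothetical name):
// Elided version: if action() throws synchronously, the exception escapes
// here immediately instead of being captured inside the returned task, so
// the caller's finally blocks never release the two semaphores for that item.
static Task<T> RunElided<T>(Func<Task<T>> action) => action();

// The async/await version from the example captures the same synchronous
// throw inside the returned Task<T>, where Task.WhenAll can observe it.
static async Task<T> Run<T>(Func<Task<T>> action) => await action();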
I have a UrlList of only 4 URLs which I want to use to make 4 concurrent requests. Does the code below truly make 4 requests which start at the same time?
My testing appears to show that it does, but am I correct in thinking that there will actually be 4 requests retrieving data from the URL target at the same time or does it just appear that way?
static void Main(string[] args)
{
var t = Do_TaskWhenAll();
t.Wait();
}
public static async Task Do_TaskWhenAll()
{
var downloadTasksQuery = from url in UrlList select Run(url);
var downloadTasks = downloadTasksQuery.ToArray();
Results = await Task.WhenAll(downloadTasks);
}
public static async Task<string> Run(string url)
{
var client = new WebClient();
AddHeaders(client);
var content = await client.DownloadStringTaskAsync(new Uri(url));
return content;
}
Correct, when ToArray is called, the enumerable downloadTasksQuery will yield a task for every URL, running your web requests concurrently.
await Task.WhenAll ensures your task completes only when all web requests have completed.
You can rewrite your code to be less verbose, while doing effectively the same, like so:
public static async Task Do_TaskWhenAll()
{
var downloadTasks = from url in UrlList select Run(url);
Results = await Task.WhenAll(downloadTasks);
}
There's no need for ToArray because Task.WhenAll will enumerate your enumerable for you.
I advise you to use HttpClient instead of WebClient. With HttpClient, you won't have to create a new instance of the client for each concurrent request, as it allows you to reuse the same client for doing multiple requests concurrently.
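A minimal sketch of that suggestion (AddHeaders and UrlList are from the question; moving the headers to DefaultRequestHeaders is an assumption):
private static readonly HttpClient Client = new HttpClient();

public static async Task<string> Run(string url)
{
    // The shared client is safe to reuse across concurrent requests.
    // Common headers would be set once on Client.DefaultRequestHeaders.
    return await Client.GetStringAsync(url);
}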
The short answer is yes: if you generate multiple Tasks without awaiting each one individually, they can run simultaneously, as long as they are truly asynchronous.
When DownloadStringTaskAsync is awaited, a Task is returned from your Run method, allowing the next iteration to occur whilst waiting for the response.
So the next HTTP request is allowed to be sent without waiting for the first to complete.
As an aside, your method can be written more concisely:
public static async Task Do_TaskWhenAll()
{
Results = await Task.WhenAll(UrlList.Select(Run));
}
Task.WhenAll has an overload that accepts IEnumerable<Task<TResult>> which is returned from UrlList.Select(Run).
No, there is no guarantee that your requests will be executed in parallel, or immediately.
Starting a task merely queues it to the thread pool. If all of the pool's threads are occupied, that task will necessarily wait until a thread frees up.
In your case, since there are a relatively large number of threads available in the pool, and you are queueing only a small number of items, the pool has no problem servicing them as they come in. The more tasks you queue at once, the more likely this is to change.
If you truly need concurrency, you need to be aware of what the thread pool size is, and how busy it is. The ThreadPool class will help you to manage this.
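For example, a quick way to inspect the pool's current headroom (a diagnostic sketch, not something to gate requests on):
// How large the pool may grow, and how much of it is currently free.
ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);
ThreadPool.GetAvailableThreads(out int freeWorkers, out int freeIo);
Console.WriteLine($"Workers: {freeWorkers}/{maxWorkers}, IO: {freeIo}/{maxIo}");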
I want to know when some parallel tasks are completed.
I'm using this code to make between 1500 and 2000 small WebClient.DownloadString calls, with a 10-second HTTP request timeout, against a website:
Task.Factory.StartNew(() =>
    Parallel.ForEach<string>(myKeywords,
        new ParallelOptions { MaxDegreeOfParallelism = 5 },
        getKey));
Sometimes a query fails, so there are exceptions and the function never finishes. The UI refresh inside each getKey call sometimes seems to fire twice, so I cannot get an accurate idea of how many tasks are completed: calculating (number of UI refresh calls) / (total number of keywords) gives a result between 100% and 250%, and I never know when the tasks are completed. I searched a lot of SO discussions, but none offered a direct method or a workaround that suits my needs. So I guess Framework 4.0 doesn't provide any Tasks.AllCompleted event handler or similar?
Should I run my Parallel.ForEach on another thread instead of the UI thread, and then add myTasks.WaitAll?
[EDIT]
A temporary solution was to copy my list of strings into an ArrayList and then remove each item from the list at the beginning of each query. Whether or not the function worked, I know when all items have been processed.
Parallel.ForEach is no different from other loops when it comes to handling exceptions. If an exception is thrown, it is going to stop processing of the loop. This is probably why you're seeing variances in the percentages (I assume you might be updating the count as you're processing the loop).
Also, you don't really need Parallel.ForEach, because the calls that you're making on the WebClient class are I/O-bound: they block waiting on network responses, and they are not computationally bound (Parallel.ForEach is much better when you are computationally bound).
That said, you should first translate your calls to WebClient to use Task<TResult>. Translating the event-based asynchronous pattern to the task-based asynchronous pattern is simple with the use of the TaskCompletionSource<TResult> class.
Assuming that you have a sequence of Uri instances that are produced as a result of your calls to getKey, you can create a function to do this:
static Task<string> DownloadStringAsync(Uri uri)
{
    // Create a WebClient.
    var wc = new WebClient();

    // Set up your web client here (headers, timeout, etc.).

    // Create the TaskCompletionSource.
    var tcs = new TaskCompletionSource<string>();

    // Set the event handler on the web client.
    wc.DownloadStringCompleted += (s, e) => {
        // Dispose of the WebClient when done.
        using (wc)
        {
            // Set the task completion source based on the event.
            if (e.Cancelled)
            {
                // Set cancellation.
                tcs.SetCanceled();
                return;
            }

            // Exception?
            if (e.Error != null)
            {
                // Set exception.
                tcs.SetException(e.Error);
                return;
            }

            // Set result.
            tcs.SetResult(e.Result);
        }
    };

    // Start the download.
    wc.DownloadStringAsync(uri);

    // Return the task.
    return tcs.Task;
}
Note that the above can be optimized to use one WebClient; that is left as an exercise for you (assuming your tests show you need it).
From there, you can get a sequence of Task<string>:
// Gotten from myKeywords
IEnumerable<Uri> uris = ...;
// The tasks.
Task<string>[] tasks = uris.Select(DownloadStringAsync).ToArray();
Note that you must call the ToArray extension method in order for the tasks to start running. This is to get around deferred execution. You don't have to call ToArray, but you must call something which will enumerate through the entire list and cause the tasks to start running.
Once you have these Task<string> instances, you can wait on them all to complete by calling the ContinueWhenAll<TAntecedentResult> method on the TaskFactory class, like so:
Task.Factory.ContinueWhenAll(tasks, a => { }).Wait();
When this is done, you can cycle through the tasks array and look at the Exception and/or Result properties to check to see what the exception or result was.
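For instance, a small sketch of that inspection loop:
foreach (var task in tasks)
{
    if (task.IsFaulted)
    {
        // Exception is an AggregateException wrapping the original error.
        Console.WriteLine(task.Exception.InnerException);
    }
    else if (!task.IsCanceled)
    {
        Console.WriteLine(task.Result.Length); // length of the downloaded string
    }
}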
If you are updating a user interface, then you should look at intercepting the call to Enumerable.Select, namely, you should call the ContinueWith<TNewResult> method on the Task<TResult> to perform an operation when that download is complete, like so:
// The tasks.
Task<string>[] tasks = uris.
    Select(DownloadStringAsync).
    // Select receives a Task<string> here; continue from it.
    Select(t => t.ContinueWith(t2 => {
        // Do something here:
        // - increment a count
        // - fire an event
        // - update the UI
        // Note that you have to take care of synchronization here, so
        // make sure to synchronize access to a count, or serialize calls
        // to the UI thread appropriately with a SynchronizationContext.
        ...
        // t2 is already complete at this point, so returning t2.Result keeps
        // the element type Task<string>; returning t2 itself would give you
        // Task<Task<string>>. If t2 faulted, reading Result rethrows here and
        // the exception propagates to the continuation task.
        return t2.Result;
    })).
    ToArray();
This will allow you to update things as they happen. Note that in the above case, if you call Select again, you might want to check the state of t2 and fire some other events, depending on what you want your error handling mechanism to be.