Need a queue of jobs to be processed by threads - c#

I have some work (a job) in a queue (so there are several jobs queued), and I want each job to be processed by a thread.
I looked at Rx, but it was not what I wanted; then I came across the Task Parallel Library.
Since this work runs in a web application, I do not want the client to wait for each job to finish, so I have done the following:
public void FromWebClientRequest(int[] ids)
{
    // I will get the objects for the ids from a repository using a container (Unity)
    ThreadPool.QueueUserWorkItem(delegate
    {
        DoSomeWorkInParallel(ids, container);
    });
}
private static void DoSomeWorkInParallel(int[] ids, IUnityContainer container)
{
    Parallel.ForEach(ids, id =>
    {
        // Some work will be done here...
        var repository = container.Resolve...
    });
    // Here all the work will be done.
    container.Resolve<ILogger>().Log("finished all work");
}
I would call the above code on a web request and then the client will not have to wait.
Is this the correct way to do this?
TIA

From the MSDN docs I see that Unity's IUnityContainer.Resolve method is not thread-safe (or at least it is not documented as such). This means that you need to resolve outside the parallel loop. Edit: changed to Task.
public void FromWebClientRequest(int[] ids)
{
    IRepoType repoType = container.Resolve<IRepoType>();
    ILogger logger = container.Resolve<ILogger>();
    // Remove LongRunning if your operations are not blocking (i.e. reading or downloading files, long-running queries, etc.)
    // PreferFairness is here to try to complete first the requests that came first, so clients are more likely to be served "first come, first served" under high CPU use with lots of requests
    Task.Factory.StartNew(() => DoSomeWorkInParallel(ids, repoType, logger),
        TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
private static void DoSomeWorkInParallel(int[] ids, IRepoType repository, ILogger logger)
{
    // If there are blocking operations inside this loop, you ought to convert it to tasks with LongRunning.
    // Why? To force more threads than would usually run the loop and try to saturate CPU use, which would otherwise be idle most of the time.
    // Beware of doing this against a non-clustered database, since you can saturate it and create a bottleneck there; try it and see how it handles your workload.
    Parallel.ForEach(ids, id =>
    {
        // Some work will be done here...
        // use repository
    });
    logger.Log("finished all work");
}
Plus, as fiver stated, if you have .NET 4 then Tasks are the way to go.
Why go Task (question in comment):
If your FromWebClientRequest method were fired insanely often, you would fill the thread pool, and overall system performance would probably not be as good as with .NET 4's finer-grained scheduling. This is where Task enters the game. Each task is not its own thread; the new .NET 4 thread pool creates enough threads to maximize performance on a system, so you do not need to worry about how many CPUs there are or how many thread context switches would occur.
Some MSDN quotes for ThreadPool:
When all thread pool threads have been assigned to tasks, the thread pool does not immediately begin creating new idle threads. To avoid unnecessarily allocating stack space for threads, it creates new idle threads at intervals. The interval is currently half a second, although it could change in future versions of the .NET Framework.
The thread pool has a default size of 250 worker threads per available processor.
Unnecessarily increasing the number of idle threads can also cause performance problems. Stack space must be allocated for each thread. If too many tasks start at the same time, all of them might appear to be slow. Finding the right balance is a performance-tuning issue.
By using Tasks you discard those issues.
Another good thing is you can fine grain the type of operation to run. This is important if your tasks do run blocking operations. This is a case where more threads are to be allocated concurrently since they would mostly wait. ThreadPool cannot achieve this automagically:
Task.Factory.StartNew(() => DoSomeWork(), TaskCreationOptions.LongRunning);
And of course you are able to make it finish on demand without resorting to ManualResetEvent:
var task = Task.Factory.StartNew(() => DoSomeWork());
task.Wait();
Besides this, you don't have to change the Parallel.ForEach if you don't expect exceptions or blocking, since it is part of the .NET 4 Task Parallel Library and is well optimized on the .NET 4 pool, as Tasks are.
However, if you do go with Tasks instead of Parallel.ForEach, remove LongRunning from the caller Task, since Parallel.For is a blocking operation and starting tasks (as in fiver's loop) is not. But this way you lose the first-come-first-served optimization, or you have to apply it to many more Tasks (one spawned per id), which would probably behave less predictably. Another option is to wait on all tasks at the end of DoSomeWorkInParallel.
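A rough sketch of that last option, reusing the signature from above: spawn one child task per id and block on all of them before logging, so "finished all work" still runs last:
private static void DoSomeWorkInParallel(int[] ids, IRepoType repository, ILogger logger)
{
    var tasks = ids.Select(id => Task.Factory.StartNew(() =>
    {
        // Some work will be done here... use repository
    }, TaskCreationOptions.LongRunning)).ToArray();
    Task.WaitAll(tasks); // wait for every per-id task to finish
    logger.Log("finished all work");
}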

Another way is to use Tasks:
public static void FromWebClientRequest(int[] ids)
{
    foreach (var id in ids)
    {
        Task.Factory.StartNew(i =>
        {
            Wl(i); // Wl: placeholder for the per-id work
        }, id);
    }
}

I would call the above code on a web request and then the client will not have to wait.
This will work provided the client does not need an answer (like Ok/Fail).
Is this the correct way to do this?
Almost. You use Parallel.ForEach (TPL) for the jobs but run it from a 'plain' ThreadPool job. Better to use a Task for the outer job as well.
Also, handle all exceptions in that outer Task, and be careful about the thread-safety of the container, etc.
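A minimal sketch of that advice, reusing names from the snippets above (the faulted-only continuation is my own illustration):
var outer = Task.Factory.StartNew(() => DoSomeWorkInParallel(ids, repoType, logger));
outer.ContinueWith(
    t => logger.Log("job failed: " + t.Exception.GetBaseException().Message),
    TaskContinuationOptions.OnlyOnFaulted); // observe exceptions so they don't go unobserved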

Related

Async/await again. Example with network requests

I don't really understand the practical meaning of async/await.
I just started learning async/await and I know there are already a huge number of topics on it. If I understand correctly, async/await is not needed anywhere except for operations with a long wait in a thread, when it's not a long computation. For example, a database response, a network request, file handling. Many people write that async/await is also needed so as not to block the main thread. And here it is completely unclear to me why it should be blocked: you can avoid blocking without async/await by just creating a task. So I'm trying to create code that will wait a long time for a response from the network.
I created an example. Through the Windows Task Manager I can see with my own eyes that the while (i < int.MaxValue) operation is processed first, taking up the entire processor, even though I launched DownloadFile first. Only when the processor is freed do I see that the file downloads are in progress. On my machine, the example runs in ~54 seconds.
Question: how could I run DownloadFile asynchronously first, so that the threads do not idle uselessly but can still do while (i < int.MaxValue)?
using System.Net;
string PathProject = Directory.GetParent(Directory.GetCurrentDirectory()).Parent.Parent.Parent.FullName;
//Create folder 1 in the project folder
DirectoryInfo Path = new DirectoryInfo($"{PathProject}\\1");
int Iterations = Environment.ProcessorCount * 3;
string file = "https://s182vla.storage.yandex.net/rdisk/82b08d86b9920a5e889c6947e4221eb1350374db8d799ee9161395f7195b0b0e/62f75403/geIEA69cusBRNOpxmtup5BdJ7AbRoezTJE9GH4TIzcUe-Cp7uoav-lLks4AknK2SfU_yxi16QmxiuZOGFm-hLQ==?uid=0&filename=004%20-%2002%20Lesnik.mp3&disposition=attachment&hash=e0E3gNC19eqNvFi1rXJjnP1y8SAS38sn5%2ByGEWhnzE5cwAGsEnlbazlMDWSjXpyvq/J6bpmRyOJonT3VoXnDag%3D%3D&limit=0&content_type=audio%2Fmpeg&owner_uid=160716081&fsize=3862987&hid=98984d857027117759bc5ce6092eaa6a&media_type=audio&tknv=v2&rtoken=k9xogU6296eg&force_default=no&ycrid=na-2bc914314062204f1cbf810798018afd-downloader16e&ts=5e61a6daac6c0&s=eef8b08190dc7b22befd6bad89e1393b394869a1668d9b8af3730cce4774e8ad&pb=U2FsdGVkX1__q3AvjJzgzWG4wVR80Oh8XMl-0Dlfyu9FhqAYQVVkoBV0dtBmajpmOkCXKUXPbREOS-MZCxMNu2rkAkKq_n-AXcZ85svtSFs";
List<Task> tasks = new List<Task>();
void MyMethod1(int i)
{
    WebClient client = new WebClient();
    client.DownloadFile(file, $"{Path}\\{i}.mp3");
}
void MyMethod2()
{
    int i = 0;
    while (i < int.MaxValue)
    {
        i++;
    }
}
DateTime dateTimeStart = DateTime.Now;
for (int i = 0; i < Iterations; i++)
{
    int j = i;
    tasks.Add(Task.Run(() => MyMethod1(j)));
}
for (int i = 0; i < Iterations; i++)
{
    tasks.Add(Task.Run(() => { MyMethod2(); MyMethod2(); }));
}
Task.WaitAll(tasks.ToArray());
Console.WriteLine(DateTime.Now - dateTimeStart);
while (true)
{
    Thread.Sleep(100);
    if (Path.GetFiles().Length == Iterations)
    {
        Thread.Sleep(1000);
        foreach (FileInfo f in Path.GetFiles())
        {
            f.Delete();
        }
        return;
    }
}
If there are 2 web servers that talk to a database and they run on 2 machines with the same spec, the web server with async code will be able to handle more concurrent requests.
The following is from 2014's Async Programming : Introduction to Async/Await on ASP.NET
Why Not Increase the Thread Pool Size?
At this point, a question is always asked: Why not just increase the size of the thread pool? The answer is twofold: Asynchronous code scales both further and faster than blocking thread pool threads.
Asynchronous code can scale further than blocking threads because it uses much less memory; every thread pool thread on a modern OS has a 1MB stack, plus an unpageable kernel stack. That doesn’t sound like a lot until you start getting a whole lot of threads on your server. In contrast, the memory overhead for an asynchronous operation is much smaller. So, a request with an asynchronous operation has much less memory pressure than a request with a blocked thread. Asynchronous code allows you to use more of your memory for other things (caching, for example).
Asynchronous code can scale faster than blocking threads because the thread pool has a limited injection rate. As of this writing, the rate is one thread every two seconds. This injection rate limit is a good thing; it avoids constant thread construction and destruction. However, consider what happens when a sudden flood of requests comes in. Synchronous code can easily get bogged down as the requests use up all available threads and the remaining requests have to wait for the thread pool to inject new threads. On the other hand, asynchronous code doesn’t need a limit like this; it’s “always on,” so to speak. Asynchronous code is more responsive to sudden swings in request volume.
(These days, threads are added every 0.5 seconds.)
WebRequest.Create("https://192.168.1.1").GetResponse()
At some point the above code will probably hit the OS method recv(). The OS will suspend your thread until data becomes available. The state of your function, in CPU registers and the thread stack, will be preserved by the OS while the thread is suspended. In the meantime, this thread can't be used for anything else.
If you start that method via Task.Run(), then your method will consume a thread from a thread pool that has been prepared for you by the runtime. Since these threads aren't used for anything else, your program can continue handling other requests on other threads. However, creating a large number of OS threads has significant overheads.
Every OS thread must have some memory reserved for its stack, and the OS must use some memory to store the full state of the CPU for any suspended thread. Switching threads can have a significant performance cost. For maximum performance, you want to keep a small number of threads busy, rather than a large number of suspended threads which the OS must keep swapping in and out of each CPU core.
When you use async & await, the C# compiler transforms your method into a coroutine, ensuring that any state your program needs to remember is no longer stored in CPU registers or on the OS thread stack. Instead, all of that state is stored in heap memory while your task is suspended. When your task is suspended and resumed, only the data which you actually need will be loaded and stored, rather than the entire CPU state.
If you change your code to use .GetResponseAsync(), the runtime will call an OS method that supports overlapped I/O. While your task is suspended, no OS thread will be busy. When data is available, the runtime will continue to execute your task on a thread from the thread pool.
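For the snippet above, that change looks something like this minimal sketch (GetResponseAsync has been part of WebRequest since .NET 4.5):
// Assumes using System.Net; no thread is blocked while the server responds.
var request = WebRequest.Create("https://192.168.1.1");
using (var response = await request.GetResponseAsync())
{
    Console.WriteLine(((HttpWebResponse)response).StatusCode);
}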
Is this going to impact the program you are writing today? Will you be able to tell the difference? Not until the CPU starts to become the bottleneck. When you are attempting to scale your program to thousands of concurrent requests.
If you are writing new code, look for the Async version of any I/O method. Sprinkle async & await around. It doesn't cost you anything.
If I understand correctly, async/await is not needed anywhere except for operations with a long wait in a thread, when it's not a long computation.
It's kind of recursive, but async is best used whenever there's something asynchronous. In other words, anything where the CPU would be wasted if it had to just spin (or block) while waiting for the operation to complete. Operations that are naturally asynchronous are generally I/O-based (as you mention, DB and other network calls, as well as file I/O), but they can be more arbitrary events, too (e.g., timers). Anything where there isn't actual code to run to get the response.
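For example, a timer wait is naturally asynchronous: there is no code to run until it fires.
// Nothing executes and no thread is blocked during these 5 seconds.
await Task.Delay(TimeSpan.FromSeconds(5));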
Many people write that async/await is also needed so as not to block the main thread.
At a higher level, there are two primary benefits to async/await, depending on what kind of code you're talking about:
On the server side (e.g., web apps), async/await provides scalability by using fewer threads per request.
On the client side (e.g., UI apps), async/await provides responsiveness by keeping the UI thread free to respond to user input.
Developers tend to emphasize one or the other depending on the kind of work they normally do. So if you see an async article talking about "not blocking the main thread", they're talking about UI apps specifically.
And here it is completely unclear to me why it should be blocked: you can avoid blocking without async/await by just creating a task.
That works just fine for many situations. But it doesn't work well in others.
E.g., it would be a bad idea to just Task.Run onto a background thread in a web app. The primary benefit of async in a web app is to provide scalability by using fewer threads per request, so using Task.Run does not provide any benefits at all (in fact, scalability is reduced). So, the idea of "use Task.Run instead of async/await" cannot be adopted as a universal principle.
The other problem is in resource-constrained environments, such as mobile devices. You can only have so many threads there before you start running into other problems.
But if you're talking Desktop apps (e.g., WPF and friends), then sure, you can use async/await to free up the UI thread, or you can use Task.Run to free up the UI thread. They both achieve the same goal.
Question: how could I run DownloadFile asynchronously first, so that the threads do not idle uselessly but can still do while (i < int.MaxValue)?
There's nothing in your code that is asynchronous at all. So really, you're dealing with multithreading/parallelism. In general, I recommend using higher-level constructs such as Parallel for parallelism rather than Task.Run.
But regardless of the API used, the underlying problem is that you're kicking off Environment.ProcessorCount * 6 threads. You'll want to ensure that your thread pool is ready for that many threads by calling ThreadPool.SetMinThreads with the workerThreads set to a high enough number.
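A minimal sketch of that preparation (the * 6 mirrors the two batches of Environment.ProcessorCount * 3 tasks in the question; treat the exact number as illustrative):
// Let the pool start this many workers immediately instead of injecting
// them slowly; keep the I/O completion-port value unchanged.
ThreadPool.GetMinThreads(out _, out int completionPortThreads);
ThreadPool.SetMinThreads(Environment.ProcessorCount * 6, completionPortThreads);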
It's not web requests but here's a toy example:
Test:
n: 1 await: 00:00:00.1373839 sleep: 00:00:00.1195186
n: 10 await: 00:00:00.1290465 sleep: 00:00:00.1086578
n: 100 await: 00:00:00.1101379 sleep: 00:00:00.6517959
n: 300 await: 00:00:00.1207069 sleep: 00:00:02.0564836
n: 500 await: 00:00:00.1211736 sleep: 00:00:02.2742309
n: 1000 await: 00:00:00.1571661 sleep: 00:00:05.3987737
Code:
using System.Diagnostics;

foreach (var n in new[] { 1, 10, 100, 300, 500, 1000 })
{
    var sw = Stopwatch.StartNew();
    var tasks = Enumerable.Range(0, n)
        .Select(i => Task.Run(async () =>
        {
            await Task.Delay(TimeSpan.FromMilliseconds(100));
        }));
    await Task.WhenAll(tasks);
    var tAwait = sw.Elapsed;

    sw = Stopwatch.StartNew();
    var tasks2 = Enumerable.Range(0, n)
        .Select(i => Task.Run(() =>
        {
            Thread.Sleep(TimeSpan.FromMilliseconds(100));
        }));
    await Task.WhenAll(tasks2);
    var tSleep = sw.Elapsed;

    Console.WriteLine($"n: {n,4} await: {tAwait} sleep: {tSleep}");
}

Is there a way to limit the number of parallel Tasks globally in an ASP.NET Web API application?

I have an ASP.NET 5 Web API application which contains a method that takes objects from a List<T> and makes HTTP requests to a server, 5 at a time, until all requests have completed. This is accomplished using a SemaphoreSlim, a List<Task>(), and awaiting on Task.WhenAll(), similar to the example snippet below:
public async Task<ResponseObj[]> DoStuff(List<Input> inputData)
{
    const int maxDegreeOfParallelism = 5;
    var tasks = new List<Task<ResponseObj>>();
    using var throttler = new SemaphoreSlim(maxDegreeOfParallelism);
    foreach (var input in inputData)
    {
        tasks.Add(ExecHttpRequestAsync(input, throttler));
    }
    ResponseObj[] responses = await Task.WhenAll(tasks).ConfigureAwait(false);
    return responses;
}
private async Task<ResponseObj> ExecHttpRequestAsync(Input input, SemaphoreSlim throttler)
{
    await throttler.WaitAsync().ConfigureAwait(false);
    try
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, "https://foo.bar/api");
        request.Content = new StringContent(JsonConvert.SerializeObject(input), Encoding.UTF8, "application/json");
        var response = await HttpClientWrapper.SendAsync(request).ConfigureAwait(false);
        var responseBody = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        var responseObject = JsonConvert.DeserializeObject<ResponseObj>(responseBody);
        return responseObject;
    }
    finally
    {
        throttler.Release();
    }
}
This works well; however, I am looking to limit the total number of Tasks that are being executed in parallel globally throughout the application, so as to allow scaling up of this application. For example, if 50 requests to my API came in at the same time, this would start at most 250 tasks running in parallel. If I wanted to limit the total number of Tasks being executed at any given time to, say, 100, is it possible to accomplish this? Perhaps via a Queue<T>? Would the framework automatically prevent too many tasks from being executed? Or am I approaching this problem in the wrong way, and would I instead need to queue the incoming requests to my application?
I'm going to assume the code is fixed, i.e., Task.Run is removed and the WaitAsync / Release are adjusted to throttle the HTTP calls instead of List<T>.Add.
I am looking to limit the total number of Tasks that are being executed in parallel globally throughout the application, so as to allow scaling up of this application.
This does not make sense to me. Limiting your tasks limits your scaling up.
For example, if 50 requests to my API came in at the same time, this would start at most 250 tasks running parallel.
Concurrently, sure, but not in parallel. It's important to note that these aren't 250 threads, and that they're not 250 CPU-bound operations waiting for free thread pool threads to run on, either. These are Promise Tasks, not Delegate Tasks, so they don't "run" on a thread at all. It's just 250 objects in memory.
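A two-line illustration of the difference:
Task promiseTask = Task.Delay(1000);                    // a Promise Task: just an object; no thread while it waits
Task delegateTask = Task.Run(() => Thread.Sleep(1000)); // a Delegate Task: occupies a thread pool thread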
If I wanted to limit the total number of Tasks that are being executed at any given time to say 100, is it possible to accomplish this?
Since (these kinds of) tasks are just in-memory objects, there should be no need to limit them, any more than you would need to limit the number of strings or List<T>s. Apply throttling where you do need it; e.g., number of HTTP calls done simultaneously per request. Or per host.
Would the framework automatically prevent too many tasks from being executed?
The framework has nothing like this built-in.
Perhaps via a Queue? Or am I approaching this problem in the wrong way, and would I instead need to Queue the incoming requests to my application?
There's already a queue of requests. It's handled by IIS (or whatever your host is). If your server gets too busy (or gets busy very suddenly), the requests will queue up without you having to do anything.
If I wanted to limit the total number of Tasks that are being executed at any given time to say 100, is it possible to accomplish this?
What you are looking for is to limit the MaximumConcurrencyLevel of what's called the TaskScheduler. You can create your own task scheduler that regulates the MaximumConcurrencyLevel of the tasks it manages. I would recommend implementing a queue-like object that tracks incoming requests and currently working requests, and waits for the current requests to finish before consuming more. The information below may still be relevant.
The task scheduler is in charge of how Tasks are prioritized, and in charge of tracking the tasks and ensuring that their work is completed, at least eventually.
The way it does this is actually very similar to what you mentioned: in general the Task Scheduler handles tasks in a FIFO (first in, first out) model, very similar to how a ConcurrentQueue<T> works (at least starting in .NET 4).
Would the framework automatically prevent too many tasks from being executed?
By default the TaskScheduler that is created with most applications appears to default to a MaximumConcurrencyLevel of int.MaxValue. So theoretically yes.
The fact that there is practically no limit to the number of tasks (at least with the default TaskScheduler) might not be that big of a deal for your scenario.
Tasks are separated into two types, at least when it comes to how they are assigned to the available thread pools. They're separated into Local and Global queues.
Without going too far into detail: if a task creates other tasks, those new tasks are part of the parent task's queue (a local queue). Tasks spawned by a parent task are limited to the parent's thread pool (unless the task scheduler takes it upon itself to move queues around).
If a task isn't created by another task, it's a top-level task and is placed into the global queue. These are normally assigned their own thread (if available); if one isn't available, they are treated in a FIFO model, as mentioned above, until their work can be completed.
This is important because although you can limit the amount of concurrency that happens with the TaskScheduler, it may not necessarily matter: say you have a top-level task that's marked as long-running and is in charge of processing your incoming requests. All the tasks spawned by this top-level task will be part of that task's local queue and therefore won't spam all the available threads in your thread pool.
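If you do want a hard global cap without writing a scheduler from scratch, a minimal sketch is to use the built-in ConcurrentExclusiveSchedulerPair, which accepts a maximum concurrency level (100 here is just the number from the question, and DoWork is a placeholder):
var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, maxConcurrencyLevel: 100);
var factory = new TaskFactory(pair.ConcurrentScheduler);
// Tasks started through this factory never run more than 100 at a time.
Task t = factory.StartNew(() => DoWork());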
When you have a bunch of items and you want to process them asynchronously and with limited concurrency, the SemaphoreSlim is a great tool for this job. There are two ways that it can be used. One way is to create all the tasks immediately and have each task acquire the semaphore before doing its main work; the other is to throttle the creation of the tasks while the source is enumerated. The first technique is eager, and so it consumes more RAM, but it's more maintainable because it is easier to understand and implement. The second technique is lazy, and it's more efficient if you have millions of items to process.
The technique that you have used in your sample code is the first (eager) one.
Here is an example of using two SemaphoreSlims in order to impose two maximum concurrency policies, one per request and one globally. First the eager approach:
private const int maxConcurrencyGlobal = 100;
private static SemaphoreSlim globalThrottler
    = new SemaphoreSlim(maxConcurrencyGlobal, maxConcurrencyGlobal);

public async Task<ResponseObj[]> DoStuffAsync(IEnumerable<Input> inputData)
{
    const int maxConcurrencyPerRequest = 5;
    var perRequestThrottler
        = new SemaphoreSlim(maxConcurrencyPerRequest, maxConcurrencyPerRequest);
    Task<ResponseObj>[] tasks = inputData.Select(async input =>
    {
        await perRequestThrottler.WaitAsync();
        try
        {
            await globalThrottler.WaitAsync();
            try
            {
                return await ExecHttpRequestAsync(input);
            }
            finally { globalThrottler.Release(); }
        }
        finally { perRequestThrottler.Release(); }
    }).ToArray();
    return await Task.WhenAll(tasks);
}
The Select LINQ operator provides an easy and intuitive way to project items to tasks.
And here is the lazy approach for doing exactly the same thing:
private const int maxConcurrencyGlobal = 100;
private static SemaphoreSlim globalThrottler
    = new SemaphoreSlim(maxConcurrencyGlobal, maxConcurrencyGlobal);

public async Task<ResponseObj[]> DoStuffAsync(IEnumerable<Input> inputData)
{
    const int maxConcurrencyPerRequest = 5;
    var perRequestThrottler
        = new SemaphoreSlim(maxConcurrencyPerRequest, maxConcurrencyPerRequest);
    var tasks = new List<Task<ResponseObj>>();
    foreach (var input in inputData)
    {
        await perRequestThrottler.WaitAsync();
        await globalThrottler.WaitAsync();
        Task<ResponseObj> task = Run(async () =>
        {
            try
            {
                return await ExecHttpRequestAsync(input);
            }
            finally
            {
                try { globalThrottler.Release(); }
                finally { perRequestThrottler.Release(); }
            }
        });
        tasks.Add(task);
    }
    return await Task.WhenAll(tasks);

    static async Task<T> Run<T>(Func<Task<T>> action) => await action();
}
This implementation assumes that the await globalThrottler.WaitAsync() will never throw, which is a given according to the documentation. This will no longer be the case if you decide later to add support for cancellation, and you pass a CancellationToken to the method. In that case you would need one more try/finally wrapper around the task-creation logic. The first (eager) approach could be enhanced with cancellation support without such considerations; its existing try/finally infrastructure is already sufficient.
It is also important that the internal helper Run method is implemented with async/await. Eliding the async/await would be an easy mistake to make, because in that case any exception thrown synchronously by the ExecHttpRequestAsync method would be rethrown immediately, and it would not be encapsulated in a Task<ResponseObj>. Then the task returned by the DoStuffAsync method would fail without releasing the acquired semaphores, and also without awaiting the completion of the already started operations. That's another argument for preferring the eager approach. The lazy approach has too many gotchas to watch for.
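For contrast, this is the elided form the paragraph warns against; an exception thrown synchronously by the delegate escapes the caller directly instead of being packaged into the returned task:
// Buggy elided helper: a synchronous throw from action() propagates
// out of DoStuffAsync's loop instead of faulting the returned Task<T>.
static Task<T> Run<T>(Func<Task<T>> action) => action();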

How to scale an application with 50000 Simultaneous Tasks

I am working on a project which needs to be able to run (for example) 50,000 tasks simultaneously. Each task will run at some frequency (say 5 minutes) and will be either a url ping or an HTTP GET request. My initial plan was to create a thread for each task. I ran a basic test to see if this was possible given available system resources. I ran the following code as a console app:
public class Program
{
    public static void Test1()
    {
        Thread.Sleep(1000000);
    }
    public static void Main(string[] args)
    {
        for (int i = 0; i < 50000; i++)
        {
            Thread t = new Thread(new ThreadStart(Test1));
            t.Start();
            Console.WriteLine(i);
        }
    }
}
Unfortunately, though it started very fast, performance had degraded greatly by the 2000-thread mark. By 5000, I could count faster than the program could create threads. This makes getting to 50000 seem infeasible. Am I on the right track or should I try something else? Thanks
Many people have the idea that you need to spawn n threads if you want to handle n tasks in parallel. But most of the time a computer spends waiting, it is waiting on I/O such as network traffic, disk access, memory transfer for GPU compute, or a hardware device completing an operation.
Given this insight, we can see that a viable solution to handling as many tasks in parallel as possible for a given hardware platform is to pipeline work: place work in a queue and process it using as many threads as possible. Usually, this means 1-2 threads per virtual processor.
In C# we can accomplish this with the Task Parallel Library (TPL):
class Program
{
    static Task RunAsync(int x)
    {
        return Task.Delay(10000);
    }
    static async Task Main(string[] args)
    {
        var tasks = Enumerable.Range(0, 50000).Select(x => RunAsync(x));
        Console.WriteLine("Waiting for tasks to complete...");
        await Task.WhenAll(tasks);
        Console.WriteLine("Done");
    }
}
This queues 50000 work items and waits until all 50000 tasks are complete. These tasks execute on only as many threads as are needed. Behind the scenes, a task scheduler examines the pool of work and has threads steal work from the queue when they need a task to execute.
Additional Considerations
With a large upper bound (n=50000) you should be cognizant of memory pressure, garbage collector activity, and other task-related overhead. You should consider the following:
Consider using ValueTask<T> to minimize allocations, especially for synchronous operations
Use ConfigureAwait(false) where possible to reduce context switching
Use CancellationTokenSource and CancellationToken to cancel requests early (e.g. timeout); a sketch follows this list
Follow best practices
Avoid awaiting inside of a loop where possible
Avoid querying tasks too frequently for completion
Avoid accessing Task<T>.Result before a task is complete to prevent blocking
Avoid deadlocks by using synchronization primitives (mutex, semaphore, condition signal, synclock, etc) as appropriate
Avoid frequent use of Task.Run to create tasks to avoid exhausting the thread pool available to the default task scheduler (this method is usually reserved for compute-bound tasks)
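As promised above, a minimal sketch of the cancellation point, assuming the usual usings (the URL and the hard-coded 30-second timeout are illustrative only):
// Cancel every ping that has not completed within 30 seconds.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
using var http = new HttpClient();
var tasks = Enumerable.Range(0, 50000)
    .Select(i => http.GetAsync($"https://example.com/ping/{i}", cts.Token));
await Task.WhenAll(tasks);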

Does calling WaitOne using a Semaphore release the calling thread to perform other work?

My application needs to perform a number of tasks per tenant on a minute-to-minute basis. These are fire-and-forget operations, so I don't want to use Parallel.ForEach to handle this.
Instead I'm looping through the list of tenants and firing off a ThreadPool.QueueUserWorkItem to process each tenant's task.
foreach (Tenant tenant in tenants)
{
    ThreadPool.QueueUserWorkItem(new WaitCallback(ProcessTenant), tenant);
}
This code works perfectly in production, and can generally process over 100 tenants in under 5 seconds.
However, on application startup this causes 100% CPU utilization while things like EF warm up. To limit this I've implemented a semaphore as follows:
private static Semaphore _threadLimiter = new Semaphore(4, 4);
The idea is to limit this task processing to only be able to use half of the machine's logical processors. Inside the ProcessTenant method I call:
try
{
    _threadLimiter.WaitOne();
    // Perform all minute-to-minute tasks
}
finally
{
    _threadLimiter.Release();
}
In testing, this appears to work exactly as expected. CPU utilization on startup stays at around 50%, and this does not appear to affect how long initial startup takes.
So my question is mainly about what is actually happening when WaitOne is called. Does this release the thread to work on other tasks, similar to asynchronous calls? The MSDN documentation states that WaitOne "Blocks the current thread until the current WaitHandle receives a signal."
So I'm just wary that this won't actually allow my web app to continue to utilize this blocked thread while it's waiting, which would make the whole point of this exercise meaningless.
WaitOne does block the thread, and that thread will stop being scheduled on a CPU core until the semaphore is signaled. However, you're holding a large number of threads from the threadpool for possibly a long time ("long" as in "longer than ~500 ms"). This can be an issue because the threadpool grows very slowly, so you may be preventing other parts of your application from properly using it.
If you plan on waiting for a significant amount of time, you could use your own threads instead:
foreach (Tenant tenant in tenants)
{
    new Thread(ProcessTenant).Start(tenant);
}
However, you're still keeping one thread per item in memory. While they won't eat CPU as they're sleeping on the semaphore, they're still using RAM for nothing (about 1MB per thread). Instead, have a single dedicated thread wait on the semaphore and enqueue new items as needed:
// Run this on a dedicated thread
foreach (Tenant tenant in tenants)
{
    _threadLimiter.WaitOne();
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try
        {
            ProcessTenant(tenant);
        }
        finally
        {
            _threadLimiter.Release();
        }
    });
}

What do I do with async Tasks I don't want to wait for?

I am writing a multi-player game server and am looking at ways the new C# async/await features can help me. The core of the server is a loop which updates all the actors in the game as fast as it can:
while (!shutdown)
{
    foreach (var actor in actors)
        actor.Update();

    // Send and receive pending network messages
    // Various other system maintenance
}
This loop is required to handle thousands of actors and update multiple times per second to keep the game running smoothly. Some actors occasionally perform slow tasks in their update functions, such as fetching data from a database, which is where I'd like to use async. Once this data is retrieved the actor wants to update the game state, which must be done on the main thread.
As this is a console application, I plan to write a SynchronizationContext which can dispatch pending delegates to the main loop. This allows those tasks to update the game once they complete and lets unhandled exceptions be thrown into the main loop. My question is, how do I write the async update functions? This works very nicely, but breaks the recommendations not to use async void:
Thing foo;
public override void Update()
{
    foo.DoThings();
    if (someCondition) {
        UpdateAsync();
    }
}
async void UpdateAsync()
{
    // Get data, but let the server continue in the mean time
    var newFoo = await GetFooFromDatabase();
    // Now back on the main thread, update game state
    this.foo = newFoo;
}
I could make Update() async and propagate the tasks back to the main loop, but:
I don't want to add overhead to the thousands of updates that will never use it.
Even in the main loop I don't want to await the tasks and block the loop.
Awaiting the task would cause a deadlock anyway, as it needs to complete on the awaiting thread.
What do I do with all these tasks I can't await? The only time I might want to know they've all finished is when I'm shutting the server down, but I don't want to collect every task generated by potentially weeks' worth of updates.
My understanding is that the crux of it is that you want:
while (!shutdown)
{
    // This should happen immediately and completions occur on the main thread.
    foreach (var actor in actors)
        actor.Update(); // includes I/O-bound database operations

    // The subsequent code should not be delayed
    ...
}
Here the while loop runs on your main console thread; it is a tight single-threaded loop. You could run the foreach in parallel, but you would still be waiting for the longest-running instance (the I/O-bound operation fetching the data from the database).
async/await is not the best option within this loop; you need to run these I/O database tasks on a thread pool. On the thread pool, async/await would be useful to free up pool threads.
So, the next question is how to get these completions back to your main thread. It seems like you need something equivalent to a message pump on your main thread; see this post for information on how to do that, though it may be a bit heavy-handed. You could just have a completion queue of sorts that you check on the main thread in each pass through your while loop. You would use one of the concurrent data structures to make it all thread-safe, then set foo if it needs to be set.
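A rough sketch of that completion-queue idea (names are illustrative, not from the question):
var completions = new ConcurrentQueue<Action>();
// On a pool thread, when the slow database fetch finishes,
// enqueue the state change instead of applying it directly:
GetFooFromDatabase().ContinueWith(t => completions.Enqueue(() => foo = t.Result));
// In the main loop, once per pass, drain the queue on the main thread:
while (completions.TryDequeue(out var apply))
    apply();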
It seems that there is some room to rationalise this polling of actors and threading, but without knowing the details of the app it is hard to say.
A couple of points: -
If you do not have a Wait higher up on a task, your main console thread will exit and so will your application. See here for details.
As you have pointed out, async/await does not block the current thread, but it does mean that the code after the await will only execute on completion of the awaited operation.
The completion may or may not be completed on the calling thread. You have already mentioned Synchronization Context, so I won't go into the details.
Synchronization Context is null on a Console app. See here for information.
Async isn't really for fire-and-forget type operations.
For fire and forget you can use one of these options depending on your scenario:
Use Task.Run or Task.Factory.StartNew. See here for the differences.
Use a producer/consumer type pattern for the long running scenarios running under your own threadpool.
Be aware of the following: -
That you will need to handle the exceptions in your spawned tasks / threads. If there are any exceptions that you do not observe, you may want to handle these, even just to log their occurrence (a sketch follows these points). See the information on unobserved exceptions.
If your process dies while these long-running tasks are on the queue or starting, they will not be run, so you may want some kind of persistence mechanism (database, external queue, file) that keeps track of the state of these operations.
If you want to know about the state of these tasks, then you will need to keep track of them in some way, whether it is an in memory list, or by querying the queues for your own thread pool or by querying the persistence mechanism. The nice thing about the persistence mechanism is that it is resilient to crashes and during shutdown you could just close down immediately, then pick up where you ended up when you restart (this of course depends on how critical it is that the tasks are run within a certain timeframe).
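A hedged sketch combining those points (DoSlowActorWork is a placeholder): fire and forget via Task.Run, with a faulted-only continuation so exceptions are at least logged rather than left unobserved:
Task.Run(() => DoSlowActorWork())
    .ContinueWith(
        t => Console.Error.WriteLine("actor task failed: " + t.Exception.GetBaseException()),
        TaskContinuationOptions.OnlyOnFaulted);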
First, I recommend that you do not use your own SynchronizationContext; I have one available as part of my AsyncEx library that I commonly use for Console apps.
As far as your update methods go, they should return Task. My AsyncEx library has a number of "task constants" that are useful when you have a method that might be asynchronous:
public override Task Update() // Note: not "async"
{
    foo.DoThings();
    if (someCondition) {
        return UpdateAsync();
    }
    else {
        return TaskConstants.Completed;
    }
}
async Task UpdateAsync()
{
    // Get data, but let the server continue in the mean time
    var newFoo = await GetFooFromDatabase();
    // Now back on the main thread, update game state
    this.foo = newFoo;
}
Returning to your main loop, the solution there isn't quite as clear. If you want every actor to complete before continuing to the next actor, then you can do this:
AsyncContext.Run(async () =>
{
    while (!shutdown)
    {
        foreach (var actor in actors)
            await actor.Update();
        ...
    }
});
Alternatively, if you want to start all actors simultaneously and wait for them all to complete before moving to the next "tick", you can do this:
AsyncContext.Run(async () =>
{
    while (!shutdown)
    {
        await Task.WhenAll(actors.Select(actor => actor.Update()));
        ...
    }
});
When I say "simultaneously" above, it is actually starting each actor in order, and since they all execute on the main thread (including the async continuations), there's no actual simultaneous behavior; each "chunk of code" will execute on the same thread.
I highly recommend watching this video or just taking a look at the slides:
Three Essential Tips for Using Async in Microsoft Visual C# and Visual Basic
From my understanding, what you should probably be doing in this scenario is returning Task<Thing> from UpdateAsync and possibly even Update.
If you are performing some async operations with 'foo' outside the main loop, what happens when the async part completes during a future sequential update? I believe you really want to wait on all your update tasks to complete and then swap your internal state over in one go.
Ideally you would start all the slow (database) updates first and then do the other faster ones, so that the entire set is ready as soon as possible.
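A rough sketch of that ordering (slowActors, fastActors, and the Thing results are illustrative, not from the question):
// Kick off the slow, I/O-bound updates first so they overlap the fast ones.
Task<Thing>[] pending = slowActors.Select(a => a.UpdateAsync()).ToArray();
foreach (var actor in fastActors)
    actor.Update();
// Wait for everything, then swap the internal state over in one go.
Thing[] results = await Task.WhenAll(pending);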
