Strange Behavior When I Don't Use TaskCreationOptions.LongRunning - c#

I have an engine that has an arbitrary number of pollers which each do their "poll" every few seconds. I want the pollers to run in different threads, but each "poll" within a single poller should be sequential so that one happens after the next. Everything is working using this code to start the polling process:
public void StartPolling()
{
Stopwatch watch = new Stopwatch();
while (Engine.IsRunning)
{
Task task = Task.Factory.StartNew(() =>{
watch.Restart();
Poll();
watch.Stop();
},TaskCreationOptions.LongRunning);
task.Wait();
if(Frequency > watch.Elapsed) Thread.Sleep(Frequency - watch.Elapsed);
}
}
It took me awhile, however, to discover the TaskCreationOptions.LongRunning option which solved a strange problem I was having that I still don't understand.
Without that option, if I run a test that creates 1-3 of these pollers, everything worked fine. If I created 4+ then I ran into strange behavior. Three of the pollers would work, one would just perform one poll, and any remaining would not poll at all.
It makes total sense that my tasks are long running. They are after all running the entire length of my program. But I don't understand why I would get some bad behavior without this option set. Any help would be appreciated.

When you don't use the LongRunning flag, the task is scheduled on a threadpool thread, not its own (dedicated) thread. This is likely the cause of your behavioral change - when you're running without the LongRunning flag in place, you're probably getting threadpool starvation due to other threads in your process.
That being said, your above code doesn't really make a lot of sense. You're starting a dedicated thread (via Task....StartNew with LongRunning) to start a task, then immediately calling task.Wait(), which blocks the current thread. It would be better to just do this sequentially in the current thread:
public void StartPolling()
{
Stopwatch watch = new Stopwatch();
while (Engine.IsRunning)
{
watch.Restart();
Poll();
watch.Stop();
if(Frequency > watch.Elapsed) Thread.Sleep(Frequency - watch.Elapsed);
}
}

TPL (and the traditional ThreadPool) limits the number of threads in the pool (typically a small multiple of the number of CPU cores, usually 2x cores). If you mark a task as LongRunning, it knows that the task won't finish soon and may not subject this task to the threads limit.
Without LongRunning, it assumes that you task will finish quickly (which it doesn't) so it stays within the threads limit. Then if you create more tasks than the threads limit and the running tasks never end, TPL stops all other tasks from running waiting in vain for those running tasks to finish (which they will never do).

Related

Async\await again. Example with network requests

I completely don't understand the applied meaning of async\await.
I just started learning async\await and I know that there are already a huge number of topics. If I understand correctly, then async\await is not needed anywhere else except for operations with a long wait in a thread, if this is not related to a long calculation. For example, database response, network request, file handling. Many people write that async\await is also needed so as not to block the main thread. And here it is completely unclear to me why it should be blocked. Don't block without async\await, just create a task. So I'm trying to create a code that will wait a long time for a response from the network.
I created an example. I see with my own eyes through the windows task manager that the while (i < int.MaxValue) operation is processed first, taking up the entire processor resource, although I first launched the DownloadFile. And only then, when the processor is released, I see that the download files is in progress. On my machine, the example runs ~54 seconds.
Question: how could I first run the DownloadFile asynchronously so that the threads do not idle uselessly, but can do while (i < int.MaxValue)?
using System.Net;
string PathProject = Directory.GetParent(Directory.GetCurrentDirectory()).Parent.Parent.Parent.FullName;
//Create folder 1 in the project folder
DirectoryInfo Path = new DirectoryInfo($"{PathProject}\\1");
int Iterations = Environment.ProcessorCount * 3;
string file = "https://s182vla.storage.yandex.net/rdisk/82b08d86b9920a5e889c6947e4221eb1350374db8d799ee9161395f7195b0b0e/62f75403/geIEA69cusBRNOpxmtup5BdJ7AbRoezTJE9GH4TIzcUe-Cp7uoav-lLks4AknK2SfU_yxi16QmxiuZOGFm-hLQ==?uid=0&filename=004%20-%2002%20Lesnik.mp3&disposition=attachment&hash=e0E3gNC19eqNvFi1rXJjnP1y8SAS38sn5%2ByGEWhnzE5cwAGsEnlbazlMDWSjXpyvq/J6bpmRyOJonT3VoXnDag%3D%3D&limit=0&content_type=audio%2Fmpeg&owner_uid=160716081&fsize=3862987&hid=98984d857027117759bc5ce6092eaa6a&media_type=audio&tknv=v2&rtoken=k9xogU6296eg&force_default=no&ycrid=na-2bc914314062204f1cbf810798018afd-downloader16e&ts=5e61a6daac6c0&s=eef8b08190dc7b22befd6bad89e1393b394869a1668d9b8af3730cce4774e8ad&pb=U2FsdGVkX1__q3AvjJzgzWG4wVR80Oh8XMl-0Dlfyu9FhqAYQVVkoBV0dtBmajpmOkCXKUXPbREOS-MZCxMNu2rkAkKq_n-AXcZ85svtSFs";
List<Task> tasks = new List<Task>();
void MyMethod1(int i)
{
WebClient client = new WebClient();
client.DownloadFile(file, $"{Path}\\{i}.mp3");
}
void MyMethod2()
{
int i = 0;
while (i < int.MaxValue)
{
i++;
}
}
DateTime dateTimeStart = DateTime.Now;
for (int i = 0; i < Iterations; i++)
{
int j = i;
tasks.Add(Task.Run(() => MyMethod1(j)));
}
for (int i = 0; i < Iterations; i++)
{
tasks.Add(Task.Run(() => { MyMethod2(); MyMethod2(); }));
}
Task.WaitAll(tasks.ToArray());
Console.WriteLine(DateTime.Now - dateTimeStart);
while (true)
{
Thread.Sleep(100);
if (Path.GetFiles().Length == Iterations)
{
Thread.Sleep(1000);
foreach (FileInfo f in Path.GetFiles())
{
f.Delete();
}
return;
}
}
If there are 2 web servers that talk to a database and they run on 2 machines with the same spec the web server with async code will be able to handle more concurrent requests.
The following is from 2014's Async Programming : Introduction to Async/Await on ASP.NET
Why Not Increase the Thread Pool Size?
At this point, a question is always asked: Why not just increase the size of the thread pool? The answer is twofold: Asynchronous code scales both further and faster than blocking thread pool threads.
Asynchronous code can scale further than blocking threads because it uses much less memory; every thread pool thread on a modern OS has a 1MB stack, plus an unpageable kernel stack. That doesn’t sound like a lot until you start getting a whole lot of threads on your server. In contrast, the memory overhead for an asynchronous operation is much smaller. So, a request with an asynchronous operation has much less memory pressure than a request with a blocked thread. Asynchronous code allows you to use more of your memory for other things (caching, for example).
Asynchronous code can scale faster than blocking threads because the thread pool has a limited injection rate. As of this writing, the rate is one thread every two seconds. This injection rate limit is a good thing; it avoids constant thread construction and destruction. However, consider what happens when a sudden flood of requests comes in. Synchronous code can easily get bogged down as the requests use up all available threads and the remaining requests have to wait for the thread pool to inject new threads. On the other hand, asynchronous code doesn’t need a limit like this; it’s “always on,” so to speak. Asynchronous code is more responsive to sudden swings in request volume.
(These days threads are added added every 0.5 second)
WebRequest.Create("https://192.168.1.1").GetResponse()
At some point the above code will probably hit the OS method recv(). The OS will suspend your thread until data becomes available. The state of your function, in CPU registers and the thread stack, will be preserved by the OS while the thread is suspended. In the meantime, this thread can't be used for anything else.
If you start that method via Task.Run(), then your method will consume a thread from a thread pool that has been prepared for you by the runtime. Since these threads aren't used for anything else, your program can continue handling other requests on other threads. However, creating a large number of OS threads has significant overheads.
Every OS thread must have some memory reserved for its stack, and the OS must use some memory to store the full state of the CPU for any suspended thread. Switching threads can have a significant performance cost. For maximum performance, you want to keep a small number of threads busy. Rather than having a large number of suspended threads which the OS must keep swapping in and out of each CPU core.
When you use async & await, the C# compiler will transform your method into a coroutine. Ensuring that any state your program needs to remember is no longer stored in CPU registers or on the OS thread stack. Instead all of that state will be stored in heap memory while your task is suspended. When your task is suspended and resumed, only the data which you actually need will be loaded & stored, rather than the entire CPU state.
If you change your code to use .GetResponseAsync(), the runtime will call an OS method that supports overlapped I/O. While your task is suspended, no OS thread will be busy. When data is available, the runtime will continue to execute your task on a thread from the thread pool.
Is this going to impact the program you are writing today? Will you be able to tell the difference? Not until the CPU starts to become the bottleneck. When you are attempting to scale your program to thousands of concurrent requests.
If you are writing new code, look for the Async version of any I/O method. Sprinkle async & await around. It doesn't cost you anything.
If I understand correctly, then async\await is not needed anywhere else except for operations with a long wait in a thread, if this is not related to a long calculation.
It's kind of recursive, but async is best used whenever there's something asynchronous. In other words, anything where the CPU would be wasted if it had to just spin (or block) while waiting for the operation to complete. Operations that are naturally asynchronous are generally I/O-based (as you mention, DB and other network calls, as well as file I/O), but they can be more arbitrary events, too (e.g., timers). Anything where there isn't actual code to run to get the response.
Many people write that async\await is also needed so as not to block the main thread.
At a higher level, there are two primary benefits to async/await, depending on what kind of code you're talking about:
On the server side (e.g., web apps), async/await provides scalability by using fewer threads per request.
On the client side (e.g., UI apps), async/await provides responsiveness by keeping the UI thread free to respond to user input.
Developers tend to emphasize one or the other depending on the kind of work they normally do. So if you see an async article talking about "not blocking the main thread", they're talking about UI apps specifically.
And here it is completely unclear to me why it should be blocked. Don't block without async\await, just create a task.
That works just fine for many situations. But it doesn't work well in others.
E.g., it would be a bad idea to just Task.Run onto a background thread in a web app. The primary benefit of async in a web app is to provide scalability by using fewer threads per request, so using Task.Run does not provide any benefits at all (in fact, scalability is reduced). So, the idea of "use Task.Run instead of async/await" cannot be adopted as a universal principle.
The other problem is in resource-constrained environments, such as mobile devices. You can only have so many threads there before you start running into other problems.
But if you're talking Desktop apps (e.g., WPF and friends), then sure, you can use async/await to free up the UI thread, or you can use Task.Run to free up the UI thread. They both achieve the same goal.
Question: how could I first run the DownloadFile asynchronously so that the threads do not idle uselessly, but can do while (i < int.MaxValue)?
There's nothing in your code that is asynchronous at all. So really, you're dealing with multithreading/parallelism. In general, I recommend using higher-level constructs such as Parallel for parallelism rather than Task.Run.
But regardless of the API used, the underlying problem is that you're kicking off Environment.ProcessorCount * 6 threads. You'll want to ensure that your thread pool is ready for that many threads by calling ThreadPool.SetMinThreads with the workerThreads set to a high enough number.
It's not web requests but here's a toy example:
Test:
n: 1 await: 00:00:00.1373839 sleep: 00:00:00.1195186
n: 10 await: 00:00:00.1290465 sleep: 00:00:00.1086578
n: 100 await: 00:00:00.1101379 sleep: 00:00:00.6517959
n: 300 await: 00:00:00.1207069 sleep: 00:00:02.0564836
n: 500 await: 00:00:00.1211736 sleep: 00:00:02.2742309
n: 1000 await: 00:00:00.1571661 sleep: 00:00:05.3987737
Code:
using System.Diagnostics;
foreach( var n in new []{1, 10, 100, 300, 500, 1000})
{
var sw = Stopwatch.StartNew();
var tasks = Enumerable.Range(0,n)
.Select( i => Task.Run( async () =>
{
await Task.Delay(TimeSpan.FromMilliseconds(100));
}));
await Task.WhenAll(tasks);
var tAwait = sw.Elapsed;
sw = Stopwatch.StartNew();
var tasks2 = Enumerable.Range(0,n)
.Select( i => Task.Run( () =>
{
Thread.Sleep(TimeSpan.FromMilliseconds(100));
}));
await Task.WhenAll(tasks2);
var tSleep = sw.Elapsed;
Console.WriteLine($"n: {n,4} await: {tAwait} sleep: {tSleep}");
}

C# Can Barrier or something similar be used to synchonise tasks multiple times?

I understand a Barrier can be used to have several tasks synchronise their completion before a second phase runs.
I would like to have several tasks synchronise multiple steps like so:
state is 1;
Task1 runs and pauses waiting for state to become 2;
Task2 runs and pauses waiting for state to become 2;
Task2 is final Task and causes the state to progress to state 2;
Task1 runs and pauses waiting for state to become 3;
Task2 runs and pauses waiting for state to become 3;
Task2 is final Task and causes the state to progress to state 3;
state 3 is final state and so all tasks exit.
I know I can spin up new tasks at the end of each state, but since each task does not take too long, I want to avoid creating new tasks for each step.
I can run the above synchronously using for loops, but final state can be 100,000, and so I would like to make use of more than one thread to run the process faster as the process is CPU bound.
I have tried using a counter to keep track of the number of completed Tasks that is incremented by each Task on completion. If the Task is the final Task to complete then it will change the state to the next state. All completed Tasks then wait using while (iterationState == state) await Task.Yield but the performance is terrible and it seems to me a very crude way of doing it.
What is the most efficient way to get the above done? There must be an optimised tool to get this done?
I'm using Parallel.For, creating 300 tasks, and each task needs to run through up to 100,000 states. Each task running through one state completes in less than a second, and creating 300 * 100,000 tasks is a huge overhead that makes running the whole thing synchronously much faster, even if using a single thread.
So I'd like to create 300 Tasks and have these Tasks synchronise moving through the 100,000 states. Hopefully the overhead of creating only 300 tasks instead of 300 * 100,000 tasks, with the overhead of optimised synchronisation between the tasks, will run faster than when doing it synchronously on a single thread.
Each state must complete fully before the next state can be run.
So - what's the optimal synchronisation technique for this scenario? Thanks!
while (iterationState == state) await Task.Yield is indeed a terrible solution to synchronize across your 300 tasks (and no, 300 isn't necessarily super-expensive: you'll only get a reasonable number of threads allocated).
The key problem here isn't the Parallel.For, it's synchronizing across 300 tasks to wait efficiently until each of them have completed a given phase.
The simplest and cleanest solution here would probably be to have a for loop over the stages and a parallel.for over the bit you want parallelized:
for (int stage = 0; stage < 10000; stage++)
{
// the next line blocks until all 300 have completed
// will use thread pool threads as necessary
Parallel.For( ... 300 items to process this stage ... );
}
No extra synchronization primitives needed, no spin-waiting consuming CPU, no needless thrashing between threads trying to see if they are ready to progress.
I think I am understanding what you are trying to do, so here is a suggested way to handle it. Note - I am using Action as the type for the blocking collection, but you can change it to whatever would work best in your scenario.
// Shared variables
CountdownEvent workItemsCompleted = new CountdownEvent(300);
BlockingCollection<Action> workItems = new BlockingCollection<Action>();
CancellationTokenSource cancelSource = new CancellationTokenSource();
// Work Item Queue Thread
for(int i=1; i < stages; ++i)
{
workItemsCompleted.Reset(300);
for(int j=0; j < workItemsForStage[i].Count; ++j)
{
workItems.Add(() => {}) // Add your work item here
}
workItemsCompleted.Wait(token) // token should be passed in from cancelSource.Token
}
// Worker threads that are making use of the queue
// token should be passed to the threads from cancelSource.Token
while(!token.IsCancelled)
{
var item = workItems.Take(token); // Blocks until available item or token is cancelled
item();
workItemsCompleted.Signal();
}
You can use cancelSource from your main thread to cancel the running operations if you need to. In your worker threads you would then need to handle the OperationCancelledException. With this setup you can launch as many worker threads as you need and easily benchmark where you are getting your optimal performance (maybe it is with only using 10 worker threads, etc). Just launch as many workers as you want and then queue up the work items in the Work item queue thread. It's basically a producer-consumer type model except that the producer queues up one phase of the work, then blocks until that phase is done and then queues up the next round of work.

Effects of non-awaited Task

I have a Task which I do not await because I want it to continue its own logic in the background. Part of that logic is to delay 60 seconds and check back in to see if some minute work is to be done. The abbreviate code looks something like this:
public Dictionary<string, Task> taskQueue = new Dictionary<string, Task>();
// Entry point
public void DoMainWork(string workId, XmlDocument workInstructions){
// A work task (i.e. "workInstructions") is actually a plugin which might use its own tasks internally or any other logic it sees fit.
var workTask = Task.Factory.StartNew(() => {
// Main work code that interprets workInstructions
// .........
// .........
// etc.
}, TaskCreationOptions.LongRunning);
// Add the work task to the queue of currently running tasks
taskQueue.Add(workId, workTask);
// Delay a period of time and then see if we need to extend our timeout for doing main work code
this.QueueCheckinOnWorkTask(workId); // Note the non-awaited task
}
private async Task QueueCheckinOnWorkTask(string workId){
DateTime startTime = DateTime.Now;
// Delay 60 seconds
await Task.Delay(60 * 1000).ConfigureAwait(false);
// Find out how long Task.Delay delayed for.
TimeSpan duration = DateTime.Now - startTime; // THIS SOMETIMES DENOTES TIMES MUCH LARGER THAN EXPECTED, I.E. 80+ SECONDS VS. 60
if(!taskQueue.ContainsKey(workId)){
// Do something based on work being complete
}else{
// Work is not complete, inform outside source we're still working
QueueCheckinOnWorkTask(workId); // Note the non-awaited task
}
}
Keep in mind, this is example code just to show a extremely miniminal version of what is going on with my actual program.
My problem is that Task.Delay() is delaying for longer than the time specified. Something is blocking this from continuing in a reasonable timeframe.
Unfortunately I haven't been able to replicate the issue on my development machine and it only happens on the server every couple of days. Lastly, it seems related to the number of work tasks we have running at a time.
What would cause this to delay longer than expected? Additionally, how might one go about debugging this type of situation?
This is a follow up to my other question which did not receive an answer: await Task.Delay() delaying for longer that expected
Most often that happens because of thread pool saturation. You can clearly see its effect with this simple console application (I measure time the same way you are doing, doesn't matter in this case if we use stopwatch or not):
public class Program {
public static void Main() {
for (int j = 0; j < 10; j++)
for (int i = 1; i < 10; i++) {
TestDelay(i * 1000);
}
Console.ReadKey();
}
static async Task TestDelay(int expected) {
var startTime = DateTime.Now;
await Task.Delay(expected).ConfigureAwait(false);
var actual = (int) (DateTime.Now - startTime).TotalMilliseconds;
ThreadPool.GetAvailableThreads(out int aw, out _);
ThreadPool.GetMaxThreads(out int mw, out _);
Console.WriteLine("Thread: {3}, Total threads in pool: {4}, Expected: {0}, Actual: {1}, Diff: {2}", expected, actual, actual - expected, Thread.CurrentThread.ManagedThreadId, mw - aw);
Thread.Sleep(5000);
}
}
This program starts 100 tasks which await Task.Delay for 1-10 seconds, and then use Thread.Sleep for 5 seconds to simulate work on a thread on which continuation runs (this is thread pool thread). It will also output total number of threads in thread pool, so you will see how it increases over time.
If you run it you will see that in almost all cases (except first 8) - actual time after delay is much longer than expected, in some cases 5 times longer (you delayed for 3 seconds but 15 seconds has passed).
That's not because Task.Delay is so imprecise. The reason is continuation after await should be executed on a thread pool thread. Thread pool will not always give you a thread when you request. It can consider that instead of creating new thread - it's better to wait for one of the current busy threads to finish its work. It will wait for a certain time and if no thread became free - it will still create a new thread. If you request 10 thread pool threads at once and none is free, it will wait for Xms and create new one. Now you have 9 requests in queue. Now it will again wait for Xms and create another one. Now you have 8 in queue, and so on. This wait for a thread pool thread to become free is what causes increased delay in this console application (and most likely in your real program) - we keep thread pool threads busy with long Thread.Sleep, and thread pool is saturated.
Some parameters of heuristics used by thread pool are available for you to control. Most influential one is "minumum" number of threads in a pool. Thread pool is expected to always create new thread without delay until total number of threads in a pool reaches configurable "minimum". After that, if you request a thread, it might either still create new one or wait for existing to become free.
So the most straightforward way to remove this delay is to increase minimum number of threads in a pool. For example if you do this:
ThreadPool.GetMinThreads(out int wt, out int ct);
ThreadPool.SetMinThreads(100, ct); // increase min worker threads to 100
All tasks in the example above will complete at the expected time with no additional delay.
This is usually not recommended way to solve this problem though. It's better to avoid performing long running heavy operations on thread pool threads, because thread pool is a global resource and doing this affects your whole application. For example, if we remove Thread.Sleep(5000) in the example above - all tasks will delay for expected amount of time, because all what keeps thread pool thread busy now is Console.WriteLine statement which completes in no time, making this thread available for other work.
So to sum up: identify places where you perform heavy work on thread pool threads and avoid doing that (perform heavy work on separate, non-thread-pool threads instead). Alternatively, you might consider increasing minimum number of threads in a pool to a reasonable amount.

await Task.Delay takes longer than expected

I wrote a multithreaded app which uses async/await extensively. It is supposed to download some stuff at a scheduled time. To achieve that, it uses 'await Task.Delay'. Sometimes it sends thousands requests every minute.
It works as expected, but sometimes my program needs to log something big. When it does, it serializes many objects and saves them to a file. During that time, I noticed that my scheduled tasks are executed too late. I've put all the logging to a separate thread with the lowest priority and the problem doesn't occur that often anymore, but it still happens. The things is, I want to know when it happens and in order to know that I have to use something like that:
var delayTestDate = DateTime.Now;
await Task.Delay(5000);
if((DateTime.Now - delayTestDate).TotalMilliseconds > 6000/*delays up to 1 second are tolerated*/) Console.WriteLine("The task has been delayed!");
Moreover, I have found that 'Task.Run', which I also use, can also cause delays. To monitor that, I have to use even more ugly code:
var delayTestDate = DateTime.Now;
await Task.Run(() =>
{
if((DateTime.Now - delayTestDate).TotalMilliseconds > 1000/*delays up to 1 second are tolerated*/) Console.WriteLine("The task has been delayed!");
//do some stuff
delayTestDate = DateTime.Now;
});
if((DateTime.Now - delayTestDate).TotalMilliseconds > 1000/*delays up to 1 second are tolerated*/) Console.WriteLine("The task has been delayed!");
I have to use it before and after every await and Task.Run and inside every async function, which is ugly and inconvenient. I can't put it into a separate function, since it would have to be async and I would have to await it anyway. Does anybody have an idea of a more elegant solution?
EDIT:
Some information I provided in the comments:
As #YuvalItzchakov noticed, the problem may be caused by Thread Pool starvation. That's why I used System.Threading.Thread to take care of the logging outside of the Thread Pool, but as I said, the problem still sometimes occur.
I have a processor with four cores and by subtracting results of ThreadPool.GetAvailableThreads from ThreadPool.GetMaxThreads I get 0 busy worker threads and 1-2 busy completion port threads. Process.GetCurrentProcess().Threads.Count usually returns about 30. It's a Windows Forms app and although it only has a tray icon with a menu, it starts with 11 threads. When it gets to sending thousands requests per minute, it quickly gets up to 30.
As #Noseratio suggested, I tried to play with ThreadPool.SetMinThreads and ThreadPool.SetMaxThreads, but it didn't even change the numbers of busy threads mentioned above.
When you execute Task.Run it uses Thread Pool threads to execute those tasks. When you have long running tasks, you are causing starvation to the Thread Pool, since its resources are currently occupied with long running tasks.
2 Suggestions:
When running long running tasks, make sure to use Task.Factory.Startnew with TaskCreationOptions.LongRunning, which will trigger a new thread creation. You must be cautious here as well, as spinning too many new threads will cause excessive context switches which will cause your app to slow down
Use true async where you have to do IO Bound work, use apis that support the TAP such as HttpClient and Stream, which wont cause a new thread to execute blocking work.
There are overheads in async/await, as well as the tasks themselves being executed at a lower priority. If you need something to happen reliably at an accurate interval, async/await / TPL is not the interface to use.
Try creating an independent background thread that loops until it is scheduled to do work. This way you can control the priority and timing directly without going through TPL / async.
Thread backgroundThread = new Thread(BackgroundWork);
DateTime nextInterval = DateTime.Now;
public void BackgroundWork()
{
if(DateTime.Now > nextInterval){
DoWork();
nextInterval = nextInterval.Add(new TimeSpan(0,0,0,10)); // 10 seconds
}
Thread.Sleep(100);
}
Adjust the Sleep(..) and interval values as needed.
I think you're experiencing the situation described by Joe Duffy in his "CLR thread pool injection, stuttering problems" blog post:
One silly thing our thread pool currently does has to do with how it
creates new threads. Namely, it severely throttles creation of new
threads once you surpass the “minimum” number of threads, which, by
default, is the number of CPUs on the machine. We limit ourselves to
at most one new thread per 500ms once we reach or surpass this number.
One solution might be to explicitly increase the minimum number of thread pool threads before making any use of TPL, e.g.:
ThreadPool.SetMaxThreads(workerThreads: 200, completionPortThreads: 200);
ThreadPool.SetMinThreads(workerThreads: 100, completionPortThreads: 100);
Try playing with these numbers and see if the problem goes away.

Need a queue of jobs to be processed by threads

I have some work (a job) that is in a queue (so there a several of them) and I want each job to be processed by a thread.
I was looking at Rx but this is not what I wanted and then came across the parallel task library.
Since my work will be done in an web application I do not want client to be waiting for each job to be finished, so I have done the following:
public void FromWebClientRequest(int[] ids);
{
// I will get the objects for the ids from a repository using a container (UNITY)
ThreadPool.QueueUserWorkItem(delegate
{
DoSomeWorkInParallel(ids, container);
});
}
private static void DoSomeWorkInParallel(int[] ids, container)
{
Parallel.ForEach(ids, id=>
{
Some work will be done here...
var respository = container.Resolve...
});
// Here all the work will be done.
container.Resolve<ILogger>().Log("finished all work");
}
I would call the above code on a web request and then the client will not have to wait.
Is this the correct way to do this?
TIA
From the MSDN docs I see that Unitys IContainer Resolve method is not thread safe (or it is not written). This would mean that you need to do that out of the thread loop. Edit: changed to Task.
public void FromWebClientRequest(int[] ids);
{
IRepoType repoType = container.Resolve<IRepoType>();
ILogger logger = container.Resolve<ILogger>();
// remove LongRunning if your operations are not blocking (Ie. read file or download file long running queries etc)
// prefer fairness is here to try to complete first the requests that came first, so client are more likely to be able to be served "first come, first served" in case of high CPU use with lot of requests
Task.Factory.StartNew(() => DoSomeWorkInParallel(ids, repoType, logger), TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
private static void DoSomeWorkInParallel(int[] ids, IRepoType repository, ILogger logger)
{
// if there are blocking operations inside this loop you ought to convert it to tasks with LongRunning
// why this? to force more threads as usually would be used to run the loop, and try to saturate cpu use, which would be doing nothing most of the time
// beware of doing this if you work on a non clustered database, since you can saturate it and have a bottleneck there, you should try and see how it handles your workload
Parallel.ForEach(ids, id=>{
// Some work will be done here...
// use repository
});
logger.Log("finished all work");
}
Plus as fiver stated, if you have .Net 4 then Tasks is the way to go.
Why go Task (question in comment):
If your method fromClientRequest would be fired insanely often, you would fill the thread pool, and overall system performance would probably not be as good as with .Net 4 with fine graining. This is where Task enters the game. Each task is not its own thread but the new .Net 4 thread pool creates enough threads to maximize performance on a system, and you do not need to bother on how many cpus and how much thread context switches would there be.
Some MSDN quotes for ThreadPool:
When all thread pool threads have been
assigned to tasks, the thread pool
does not immediately begin creating
new idle threads. To avoid
unnecessarily allocating stack space
for threads, it creates new idle
threads at intervals. The interval is
currently half a second, although it
could change in future versions of the
.NET Framework.
The thread pool has a default size of
250 worker threads per available
processor
Unnecessarily increasing the number of
idle threads can also cause
performance problems. Stack space must
be allocated for each thread. If too
many tasks start at the same time, all
of them might appear to be slow.
Finding the right balance is a
performance-tuning issue.
By using Tasks you discard those issues.
Another good thing is you can fine grain the type of operation to run. This is important if your tasks do run blocking operations. This is a case where more threads are to be allocated concurrently since they would mostly wait. ThreadPool cannot achieve this automagically:
Task.Factory.StartNew(() => DoSomeWork(), TaskCreationOptions.LongRunning);
And of course you are able to make it finish on demand without resorting to ManualResetEvent:
var task = Task.Factory.StartNew(() => DoSomeWork());
task.Wait();
Beside this you don't have to change the Parallel.ForEach if you don't expect exceptions or blocking, since it is part of the .Net 4 Task Parallel Library, and (often) works well and optimized on the .Net 4 pool as Tasks do.
However if you do go to Tasks instead of parallel for, remove the LongRunning from the caller Task, since Parallel.For is a blocking operations and Starting tasks (with the fiver loop) is not. But this way you loose the kinda first-come-first-served optimization, or you have to do it on a lot more Tasks (all spawned through ids) which probably would give less correct behaviour. Another option is to wait on all tasks at the end of DoSomeWorkInParallel.
Another way is to use Tasks:
public static void FromWebClientRequest(int[] ids)
{
foreach (var id in ids)
{
Task.Factory.StartNew(i =>
{
Wl(i);
}
, id);
}
}
I would call the above code on a web
request and then the client will not
have to wait.
This will work provided the client does not need an answer (like Ok/Fail).
Is this the correct
way to do this?
Almost. You use Parallel.ForEach (TPL) for the jobs but run it from a 'plain' Threadpool job. Better to use a Task for the outer job as well.
Also, handle all exceptions in that outer Task. And be careful about the thread-safety of the container etc.

Categories

Resources