Async/await again. Example with network requests - C#

I don't understand the practical purpose of async/await at all.
I just started learning async/await, and I know there are already a huge number of topics on it. If I understand correctly, then async/await is not needed anywhere else except for operations with a long wait in a thread, if this is not related to a long calculation. For example: a database response, a network request, file handling. Many people write that async/await is also needed so as not to block the main thread. And here it is completely unclear to me why it should be blocked. Don't block without async/await, just create a task. So I tried to write code that waits a long time for a response from the network.
I created an example. In Windows Task Manager I can see that the while (i < int.MaxValue) loops run first and take up the entire CPU, even though I started the DownloadFile calls first. Only when the CPU is released do I see the files being downloaded. On my machine, the example takes about 54 seconds.
Question: how could I first run the DownloadFile asynchronously so that the threads do not idle uselessly, but can do while (i < int.MaxValue)?
using System.Net;
string PathProject = Directory.GetParent(Directory.GetCurrentDirectory()).Parent.Parent.Parent.FullName;
// Folder "1" in the project folder (created if it does not exist)
DirectoryInfo Path = new DirectoryInfo($"{PathProject}\\1");
Path.Create();
int Iterations = Environment.ProcessorCount * 3;
string file = "https://s182vla.storage.yandex.net/rdisk/82b08d86b9920a5e889c6947e4221eb1350374db8d799ee9161395f7195b0b0e/62f75403/geIEA69cusBRNOpxmtup5BdJ7AbRoezTJE9GH4TIzcUe-Cp7uoav-lLks4AknK2SfU_yxi16QmxiuZOGFm-hLQ==?uid=0&filename=004%20-%2002%20Lesnik.mp3&disposition=attachment&hash=e0E3gNC19eqNvFi1rXJjnP1y8SAS38sn5%2ByGEWhnzE5cwAGsEnlbazlMDWSjXpyvq/J6bpmRyOJonT3VoXnDag%3D%3D&limit=0&content_type=audio%2Fmpeg&owner_uid=160716081&fsize=3862987&hid=98984d857027117759bc5ce6092eaa6a&media_type=audio&tknv=v2&rtoken=k9xogU6296eg&force_default=no&ycrid=na-2bc914314062204f1cbf810798018afd-downloader16e&ts=5e61a6daac6c0&s=eef8b08190dc7b22befd6bad89e1393b394869a1668d9b8af3730cce4774e8ad&pb=U2FsdGVkX1__q3AvjJzgzWG4wVR80Oh8XMl-0Dlfyu9FhqAYQVVkoBV0dtBmajpmOkCXKUXPbREOS-MZCxMNu2rkAkKq_n-AXcZ85svtSFs";
List<Task> tasks = new List<Task>();
// Blocking download: the calling thread waits until the whole file has arrived
void MyMethod1(int i)
{
    using WebClient client = new WebClient();
    client.DownloadFile(file, $"{Path.FullName}\\{i}.mp3");
}
// CPU-bound busy loop
void MyMethod2()
{
    int i = 0;
    while (i < int.MaxValue)
    {
        i++;
    }
}
DateTime dateTimeStart = DateTime.Now;
// Start the downloads first...
for (int i = 0; i < Iterations; i++)
{
    int j = i;
    tasks.Add(Task.Run(() => MyMethod1(j)));
}
// ...then the CPU-bound loops
for (int i = 0; i < Iterations; i++)
{
    tasks.Add(Task.Run(() => { MyMethod2(); MyMethod2(); }));
}
Task.WaitAll(tasks.ToArray());
Console.WriteLine(DateTime.Now - dateTimeStart);
// Wait until all files are present, then delete them
while (true)
{
    Thread.Sleep(100);
    if (Path.GetFiles().Length == Iterations)
    {
        Thread.Sleep(1000);
        foreach (FileInfo f in Path.GetFiles())
        {
            f.Delete();
        }
        return;
    }
}

If there are 2 web servers that talk to a database and they run on 2 machines with the same spec, the web server with async code will be able to handle more concurrent requests.
The following is from the 2014 article "Async Programming: Introduction to Async/Await on ASP.NET":
Why Not Increase the Thread Pool Size?
At this point, a question is always asked: Why not just increase the size of the thread pool? The answer is twofold: Asynchronous code scales both further and faster than blocking thread pool threads.
Asynchronous code can scale further than blocking threads because it uses much less memory; every thread pool thread on a modern OS has a 1MB stack, plus an unpageable kernel stack. That doesn’t sound like a lot until you start getting a whole lot of threads on your server. In contrast, the memory overhead for an asynchronous operation is much smaller. So, a request with an asynchronous operation has much less memory pressure than a request with a blocked thread. Asynchronous code allows you to use more of your memory for other things (caching, for example).
Asynchronous code can scale faster than blocking threads because the thread pool has a limited injection rate. As of this writing, the rate is one thread every two seconds. This injection rate limit is a good thing; it avoids constant thread construction and destruction. However, consider what happens when a sudden flood of requests comes in. Synchronous code can easily get bogged down as the requests use up all available threads and the remaining requests have to wait for the thread pool to inject new threads. On the other hand, asynchronous code doesn’t need a limit like this; it’s “always on,” so to speak. Asynchronous code is more responsive to sudden swings in request volume.
(These days, threads are added every 0.5 seconds.)

WebRequest.Create("https://192.168.1.1").GetResponse()
At some point the above code will probably hit the OS method recv(). The OS will suspend your thread until data becomes available. The state of your function, in CPU registers and the thread stack, will be preserved by the OS while the thread is suspended. In the meantime, this thread can't be used for anything else.
If you start that method via Task.Run(), then your method will consume a thread from a thread pool that the runtime has prepared for you. While that thread is blocked, your program can continue handling other requests on other threads. However, creating a large number of OS threads has significant overheads.
Every OS thread must have some memory reserved for its stack, and the OS must use some memory to store the full CPU state of any suspended thread. Switching threads can have a significant performance cost. For maximum performance, you want to keep a small number of threads busy, rather than having a large number of suspended threads which the OS must keep swapping in and out of each CPU core.
When you use async & await, the C# compiler transforms your method into a coroutine (a state machine), ensuring that any state your program needs to remember is no longer stored in CPU registers or on the OS thread stack. Instead, all of that state is stored in heap memory while your task is suspended. When your task is suspended and resumed, only the data you actually need is loaded and stored, rather than the entire CPU state.
If you change your code to use .GetResponseAsync(), the runtime will call an OS method that supports overlapped I/O. While your task is suspended, no OS thread will be busy. When data is available, the runtime will continue to execute your task on a thread from the thread pool.
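A small console sketch can make the "no OS thread will be busy" point visible. It assumes .NET Core 3.0 or later (for ThreadPool.ThreadCount) and uses Task.Delay as a stand-in for an awaited network call such as GetResponseAsync, so it illustrates the idea rather than measuring real I/O:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Start 1,000 pending "I/O-like" operations. Task.Delay is timer-based,
// so none of them occupies a thread while it waits.
Task[] pending = Enumerable.Range(0, 1000)
    .Select(_ => Task.Delay(TimeSpan.FromSeconds(1)))
    .ToArray();

// While they are pending, the pool has far fewer threads than operations.
int threadsWhilePending = ThreadPool.ThreadCount;
Console.WriteLine($"Pending operations: {pending.Length}, pool threads: {threadsWhilePending}");

await Task.WhenAll(pending);
Console.WriteLine("All operations completed.");
```

With blocking calls (for example Thread.Sleep inside Task.Run) the pool would instead need one thread per pending operation.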
Is this going to impact the program you are writing today? Will you be able to tell the difference? Not until the CPU starts to become the bottleneck, when you are attempting to scale your program to thousands of concurrent requests.
If you are writing new code, look for the Async version of any I/O method and use async & await throughout. It costs you very little.

If I understand correctly, then async/await is not needed anywhere else except for operations with a long wait in a thread, if this is not related to a long calculation.
It's kind of recursive, but async is best used whenever there's something asynchronous. In other words, anything where the CPU would be wasted if it had to just spin (or block) while waiting for the operation to complete. Operations that are naturally asynchronous are generally I/O-based (as you mention, DB and other network calls, as well as file I/O), but they can be more arbitrary events, too (e.g., timers). Anything where there isn't actual code to run to get the response.
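For the timer case, here is a hedged sketch of how a callback-based operation becomes something you can await. WaitForTimerAsync is a hypothetical helper (Task.Delay already does essentially this for you); it wraps a System.Threading.Timer in a TaskCompletionSource, so no code runs and no thread is held while waiting:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: exposes a one-shot timer callback as an awaitable Task.
// Between the call and the callback, no thread is blocked or spinning.
static Task WaitForTimerAsync(TimeSpan dueTime)
{
    var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
    // Fire once after dueTime; the callback only completes the task.
    var timer = new Timer(_ => tcs.TrySetResult(true), null, dueTime, Timeout.InfiniteTimeSpan);
    // Dispose the timer after it fires (the continuation also keeps it reachable until then).
    tcs.Task.ContinueWith(_ => timer.Dispose(), TaskScheduler.Default);
    return tcs.Task;
}

var sw = Stopwatch.StartNew();
await WaitForTimerAsync(TimeSpan.FromMilliseconds(200));
TimeSpan elapsed = sw.Elapsed;
Console.WriteLine($"Timer completed after {elapsed.TotalMilliseconds:F0} ms");
```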
Many people write that async/await is also needed so as not to block the main thread.
At a higher level, there are two primary benefits to async/await, depending on what kind of code you're talking about:
On the server side (e.g., web apps), async/await provides scalability by using fewer threads per request.
On the client side (e.g., UI apps), async/await provides responsiveness by keeping the UI thread free to respond to user input.
Developers tend to emphasize one or the other depending on the kind of work they normally do. So if you see an async article talking about "not blocking the main thread", they're talking about UI apps specifically.
And here it is completely unclear to me why it should be blocked. Don't block without async/await, just create a task.
That works just fine for many situations. But it doesn't work well in others.
E.g., it would be a bad idea to just Task.Run onto a background thread in a web app. The primary benefit of async in a web app is to provide scalability by using fewer threads per request, so using Task.Run does not provide any benefits at all (in fact, scalability is reduced). So, the idea of "use Task.Run instead of async/await" cannot be adopted as a universal principle.
The other problem is in resource-constrained environments, such as mobile devices. You can only have so many threads there before you start running into other problems.
But if you're talking Desktop apps (e.g., WPF and friends), then sure, you can use async/await to free up the UI thread, or you can use Task.Run to free up the UI thread. They both achieve the same goal.
Question: how could I first run the DownloadFile asynchronously so that the threads do not idle uselessly, but can do while (i < int.MaxValue)?
There's nothing in your code that is asynchronous at all. So really, you're dealing with multithreading/parallelism. In general, I recommend using higher-level constructs such as Parallel for parallelism rather than Task.Run.
But regardless of the API used, the underlying problem is that you're kicking off Environment.ProcessorCount * 6 threads. You'll want to ensure that your thread pool is ready for that many threads by calling ThreadPool.SetMinThreads with the workerThreads set to a high enough number.
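Here is a minimal sketch of the question's scenario reworked along these lines, under some assumptions: the download is replaced by a hypothetical SimulatedDownloadAsync (built on Task.Delay) so it runs offline, and the busy loop is shortened. In real code you would await an async API such as HttpClient.GetByteArrayAsync instead of calling WebClient.DownloadFile. The downloads are started first, the CPU-bound loops run under Parallel.For, and the downloads are awaited at the end:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an async download such as HttpClient.GetByteArrayAsync.
static async Task<int> SimulatedDownloadAsync(int id)
{
    await Task.Delay(500);   // waiting for the "network" holds no thread
    return id;
}

// CPU-bound work, shortened from the question's int.MaxValue loop.
static long Spin(int count)
{
    long sum = 0;
    for (int i = 0; i < count; i++) sum += i;
    return sum;
}

int iterations = Environment.ProcessorCount * 3;

// Optional: raise the worker-thread minimum so that blocking work (such as the
// question's WebClient.DownloadFile) does not wait for gradual thread injection.
ThreadPool.GetMinThreads(out int minWorkers, out int minIo);
ThreadPool.SetMinThreads(Math.Max(minWorkers, iterations * 2), minIo);

var sw = Stopwatch.StartNew();

// 1. Start every download first; they are in flight but occupy no thread.
var downloads = new Task<int>[iterations];
for (int i = 0; i < iterations; i++) downloads[i] = SimulatedDownloadAsync(i);

// 2. Run the CPU-bound loops in parallel while the downloads are pending.
var sums = new long[iterations];
Parallel.For(0, iterations, i => sums[i] = Spin(50_000_000));

// 3. Collect the download results.
int[] downloaded = await Task.WhenAll(downloads);
Console.WriteLine($"Finished {downloaded.Length} downloads and {sums.Length} loops in {sw.Elapsed}");
```

Because the downloads are awaited rather than blocking, they do not compete with the loops for threads; the SetMinThreads call only matters if blocking calls remain.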

It doesn't use web requests, but here's a toy example:
Test:
n: 1 await: 00:00:00.1373839 sleep: 00:00:00.1195186
n: 10 await: 00:00:00.1290465 sleep: 00:00:00.1086578
n: 100 await: 00:00:00.1101379 sleep: 00:00:00.6517959
n: 300 await: 00:00:00.1207069 sleep: 00:00:02.0564836
n: 500 await: 00:00:00.1211736 sleep: 00:00:02.2742309
n: 1000 await: 00:00:00.1571661 sleep: 00:00:05.3987737
Code:
using System.Diagnostics;
foreach (var n in new[] { 1, 10, 100, 300, 500, 1000 })
{
    var sw = Stopwatch.StartNew();
    var tasks = Enumerable.Range(0, n)
        .Select(i => Task.Run(async () =>
        {
            await Task.Delay(TimeSpan.FromMilliseconds(100));
        }));
    await Task.WhenAll(tasks);
    var tAwait = sw.Elapsed;
    sw = Stopwatch.StartNew();
    var tasks2 = Enumerable.Range(0, n)
        .Select(i => Task.Run(() =>
        {
            Thread.Sleep(TimeSpan.FromMilliseconds(100));
        }));
    await Task.WhenAll(tasks2);
    var tSleep = sw.Elapsed;
    Console.WriteLine($"n: {n,4} await: {tAwait} sleep: {tSleep}");
}

Related

How to scale an application with 50000 Simultaneous Tasks

I am working on a project which needs to be able to run (for example) 50,000 tasks simultaneously. Each task will run at some frequency (say, every 5 minutes) and will be either a URL ping or an HTTP GET request. My initial plan was to create a thread for each task. I ran a basic test to see if this was possible given the available system resources. I ran the following code as a console app:
public class Program
{
    public static void Test1()
    {
        Thread.Sleep(1000000);
    }
    public static void Main(string[] args)
    {
        for (int i = 0; i < 50000; i++)
        {
            Thread t = new Thread(new ThreadStart(Test1));
            t.Start();
            Console.WriteLine(i);
        }
    }
}
Unfortunately, though it started very fast, at the 2,000-thread mark performance decreased greatly. By 5,000, I could count faster than the program could create threads. This makes getting to 50,000 seem practically impossible. Am I on the right track, or should I try something else? Thanks
Many people have the idea that you need to spawn n threads if you want to handle n tasks in parallel. Most of the time when a computer is waiting, it is waiting on I/O such as network traffic, disk access, memory transfer for GPU compute, a hardware device completing an operation, etc.
Given this insight, we can see that a viable solution for handling as many tasks in parallel as possible on a given hardware platform is to pipeline work: place work in a queue and process it using a small, fixed number of threads. Usually, this means 1-2 threads per virtual processor.
In C# we can accomplish this with the Task Parallel Library (TPL):
using System;
using System.Linq;
using System.Threading.Tasks;
class Program
{
    // Placeholder for a ping or HTTP GET; here it just waits 10 seconds
    static Task RunAsync(int x)
    {
        return Task.Delay(10000);
    }
    static async Task Main(string[] args)
    {
        var tasks = Enumerable.Range(0, 50000).Select(x => RunAsync(x));
        Console.WriteLine("Waiting for tasks to complete...");
        await Task.WhenAll(tasks);
        Console.WriteLine("Done");
    }
}
This starts 50,000 tasks and waits until all of them are complete. The tasks only use as many threads as are needed: Task.Delay is timer-based, so no thread is blocked while a delay is pending. For work that does run on the thread pool, the task scheduler keeps a queue of work items and has threads take (or steal) work from it when they need a task to execute.
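The "queue plus a few workers" pipeline described above can also be written out explicitly. This sketch assumes .NET Core 3.0 or later for System.Threading.Channels; ProcessAsync is a hypothetical placeholder for the ping or HTTP GET, and the item count of 500 is arbitrary. A fixed set of asynchronous workers (two per processor, following the rule of thumb above) drains the queue:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical placeholder for one unit of I/O-bound work (a ping or HTTP GET).
static async Task ProcessAsync(int item)
{
    await Task.Delay(5);
}

var queue = Channel.CreateUnbounded<int>();
int workerCount = Environment.ProcessorCount * 2;   // "1-2 threads per virtual processor"
int processed = 0;

// A fixed number of workers pull items from the queue until it is marked complete.
Task[] workers = Enumerable.Range(0, workerCount)
    .Select(_ => Task.Run(async () =>
    {
        await foreach (int item in queue.Reader.ReadAllAsync())
        {
            await ProcessAsync(item);
            Interlocked.Increment(ref processed);
        }
    }))
    .ToArray();

// The producer enqueues all the work, then signals that no more is coming.
for (int i = 0; i < 500; i++)
{
    await queue.Writer.WriteAsync(i);
}
queue.Writer.Complete();

await Task.WhenAll(workers);
Console.WriteLine($"Processed {processed} items with {workerCount} workers");
```

Bounding the channel (Channel.CreateBounded) would additionally apply back-pressure to the producer if the queue grows too large.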
Additional Considerations
With a large upper bound (n=50000) you should be cognizant of memory pressure, garbage collector activity, and other task-related overhead. You should consider the following:
Consider using ValueTask<T> to minimize allocations, especially for operations that often complete synchronously
Use ConfigureAwait(false) where possible to reduce context switching
Use CancellationTokenSource and CancellationToken to cancel requests early (e.g. timeout)
Follow best practices
Avoid awaiting inside of a loop where possible
Avoid querying tasks too frequently for completion
Avoid accessing Task<T>.Result before a task is complete to prevent blocking
Avoid deadlocks by using synchronization primitives (mutex, semaphore, condition signal, SyncLock, etc.) as appropriate
Avoid frequent use of Task.Run to create tasks to avoid exhausting the thread pool available to the default task scheduler (this method is usually reserved for compute-bound tasks)
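Several of these points (a cancellation timeout, ConfigureAwait(false), and not flooding the thread pool with Task.Run) can be combined by capping concurrency with SemaphoreSlim. In this sketch CheckAsync is a hypothetical stand-in for the ping or HTTP GET, and the limit of 100 concurrent checks and the 5-second timeout are arbitrary assumptions:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a ping or HTTP GET that honours cancellation.
static async Task<bool> CheckAsync(int id, CancellationToken token)
{
    await Task.Delay(20, token).ConfigureAwait(false);
    return true;
}

using var throttle = new SemaphoreSlim(100);                            // at most 100 checks in flight
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));   // overall timeout

async Task<bool> ThrottledCheckAsync(int id)
{
    try
    {
        // Wait for a free slot without blocking a thread.
        await throttle.WaitAsync(cts.Token).ConfigureAwait(false);
    }
    catch (OperationCanceledException)
    {
        return false;   // timed out before a slot became free
    }
    try
    {
        return await CheckAsync(id, cts.Token).ConfigureAwait(false);
    }
    catch (OperationCanceledException)
    {
        return false;   // timed out while the check was running
    }
    finally
    {
        throttle.Release();
    }
}

bool[] results = await Task.WhenAll(Enumerable.Range(0, 1_000).Select(id => ThrottledCheckAsync(id)));
Console.WriteLine($"{results.Count(ok => ok)} of {results.Length} checks succeeded");
```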

Concurrency without multithreading Async/Await

Most tutorials strongly emphasize that async/await is unrelated to multithreading: a single thread can dispatch multiple I/O operations and then handle the results as they complete, without creating new threads. The concept makes sense, but I've never seen that actual behavior in practice.
Take the below example:
static void Main(string[] args)
{
    // No Delay
    // var tasks = new List<int> { 3, 2, 1 }.Select(x => DelayedResult(x, 0));
    // Staggered delay
    // var tasks = new List<int> { 3, 2, 1 }.Select(x => DelayedResult(x, x));
    // Simultaneous Delay
    // var tasks = new List<int> { 3, 2, 1 }.Select(x => DelayedResult(x, 1));
    var allTasks = Task.WhenAll(tasks);
    allTasks.Wait();
    Console.ReadLine();
}
static async Task<T> DelayedResult<T>(T result, int seconds = 0)
{
    ThreadPrint("Yield:" + result);
    await Task.Delay(TimeSpan.FromSeconds(seconds));
    ThreadPrint("Continuation:" + result);
    return result;
}
static void ThreadPrint(string message)
{
    int threadId = Thread.CurrentThread.ManagedThreadId;
    Console.WriteLine("Thread:" + threadId + "|" + message);
}
"No Delay" uses only one thread and executes the continuation immediately as though it were synchronous code. Looks good.
Thread:1|Yield:3
Thread:1|Continuation:3
Thread:1|Yield:2
Thread:1|Continuation:2
Thread:1|Yield:1
Thread:1|Continuation:1
"Staggered Delay" uses two threads. We have left the single-threaded world behind and there are absolutely new threads being created in the thread pool. At least the thread used for processing the continuations is reused and processing occurs in the order completed rather than the order invoked.
Thread:1|Yield:3
Thread:1|Yield:2
Thread:1|Yield:1
Thread:4|Continuation:1
Thread:4|Continuation:2
Thread:4|Continuation:3
"Simultaneous Delay" uses... 4 threads! This is no better than regular old multithreading; in fact, it's worse, since there is an ugly state machine hiding under the covers in the IL.
Thread:1|Yield:3
Thread:1|Yield:2
Thread:1|Yield:1
Thread:4|Continuation:1
Thread:7|Continuation:3
Thread:5|Continuation:2
Please provide a code example for the "Simultaneous Delay" case that uses only one thread. I suspect there isn't one... which raises the question of why the async/await pattern is advertised as unrelated to multithreading when it clearly either a) uses the ThreadPool and dispatches new threads as necessary, or b) in a UI or ASP.NET context, simply deadlocks on a single thread unless you await "all the way up", which just means that the magic additional thread is handled by the framework (not that it does not exist).
IMHO, async/await is an awesome abstraction for using continuations everywhere for high availability without getting mired in callback hell...but let's not pretend we are somehow dodging multi-threading. What am I missing?
You are forcing the multithreading in the code you posted.
When you await Task.Delay, the current thread is freed to accomplish other tasks (if the operation has to complete asynchronously). In this case, after the thread is released by the three tasks, you block it with Task.WhenAll(...).Wait(), which is a synchronous call.
Also, because a non-zero Task.Delay cannot complete immediately, each method is suspended and its continuation runs asynchronously later, not synchronously as in the No Delay case. (Yes, you also await Task.Delay in the No Delay case, but with a delay of 0 seconds Task.Delay returns an already-completed task, so the await simply continues synchronously.)
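The zero-delay case can be checked directly. In current .NET versions Task.Delay(0) returns a task that is already complete, and awaiting a completed task does not suspend the method, so execution stays on the same thread. This sketch contrasts it with a real 100 ms delay:

```csharp
using System;
using System.Threading.Tasks;

// Task.Delay(0) returns a task that has already completed.
Task zeroDelay = Task.Delay(0);
Console.WriteLine($"Task.Delay(0) already completed: {zeroDelay.IsCompleted}");

int before = Environment.CurrentManagedThreadId;
await zeroDelay;   // already complete: no suspension, execution continues on the same thread
int afterZero = Environment.CurrentManagedThreadId;

await Task.Delay(100);   // pending: the method suspends and later resumes via a continuation
int afterReal = Environment.CurrentManagedThreadId;   // in a console app, usually a pool thread

Console.WriteLine($"before: {before}, after 0 ms: {afterZero}, after 100 ms: {afterReal}");
```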
As all the tasks resume simultaneously, the task scheduler finds the first thread occupied, so it uses another pool thread for the first resumed task; the next task then sees both threads occupied, and so on.
Basically, you are asking the async mechanism for something impossible: you want the methods to be executed in parallel while being executed on one thread.
Also, async is not advertised as unrelated to multithreading; if someone says that, they don't understand what async is. In fact, asynchronous implies multithreading, but the async mechanism in .NET is smart enough to complete some tasks synchronously to ensure maximum efficiency.
It can be advertised as thread-efficient: if a thread is waiting for an I/O operation, for example, it can be used for other tasks instead of sitting blocked doing nothing. Take a TcpClient, which uses a Socket: at the OS level the socket uses I/O completion ports, so keeping your thread blocked is totally inefficient. Or, at an even lower level, take a disk read/write, which uses DMA to transfer data without using the processor; in that case no other thread is needed at all, and holding the thread is a waste of resources.
For reference, here is Microsoft's description from when they introduced async:
Visual Studio 2012 introduces a simplified approach, async programming, that leverages asynchronous support in the .NET Framework 4.5 and the Windows Runtime. The compiler does the difficult work that the developer used to do, and your application retains a logical structure that resembles synchronous code. As a result, you get all the advantages of asynchronous programming with a fraction of the effort.
Also, using async on a UI thread does not block the thread; that's the benefit. The UI thread is freed and keeps the UI responsive while it waits for long tasks, and instead of manually programming the multithreading and synchronization, you let the async mechanism take care of everything for you.

await Task.Delay takes longer than expected

I wrote a multithreaded app which uses async/await extensively. It is supposed to download some stuff at a scheduled time. To achieve that, it uses 'await Task.Delay'. Sometimes it sends thousands of requests every minute.
It works as expected, but sometimes my program needs to log something big. When it does, it serializes many objects and saves them to a file. During that time, I noticed that my scheduled tasks are executed too late. I've moved all the logging to a separate thread with the lowest priority, and the problem doesn't occur as often anymore, but it still happens. The thing is, I want to know when it happens, and to know that I have to use something like this:
var delayTestDate = DateTime.Now;
await Task.Delay(5000);
if((DateTime.Now - delayTestDate).TotalMilliseconds > 6000/*delays up to 1 second are tolerated*/) Console.WriteLine("The task has been delayed!");
Moreover, I have found that 'Task.Run', which I also use, can also cause delays. To monitor that, I have to use even more ugly code:
var delayTestDate = DateTime.Now;
await Task.Run(() =>
{
    if ((DateTime.Now - delayTestDate).TotalMilliseconds > 1000/*delays up to 1 second are tolerated*/) Console.WriteLine("The task has been delayed!");
    //do some stuff
    delayTestDate = DateTime.Now;
});
if ((DateTime.Now - delayTestDate).TotalMilliseconds > 1000/*delays up to 1 second are tolerated*/) Console.WriteLine("The task has been delayed!");
I have to use it before and after every await and Task.Run and inside every async function, which is ugly and inconvenient. I can't put it into a separate function, since it would have to be async and I would have to await it anyway. Does anybody have an idea of a more elegant solution?
EDIT:
Some information I provided in the comments:
As @YuvalItzchakov noticed, the problem may be caused by thread-pool starvation. That's why I used System.Threading.Thread to do the logging outside of the thread pool, but as I said, the problem still occurs sometimes.
I have a processor with four cores, and by subtracting the results of ThreadPool.GetAvailableThreads from ThreadPool.GetMaxThreads I get 0 busy worker threads and 1-2 busy completion port threads. Process.GetCurrentProcess().Threads.Count usually returns about 30. It's a Windows Forms app, and although it only has a tray icon with a menu, it starts with 11 threads. When it gets to sending thousands of requests per minute, it quickly gets up to 30.
As @Noseratio suggested, I tried playing with ThreadPool.SetMinThreads and ThreadPool.SetMaxThreads, but it didn't even change the numbers of busy threads mentioned above.
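One way to learn when delays happen without instrumenting every await (a technique not mentioned in the question, offered as a sketch) is to measure thread-pool scheduling latency directly. A dedicated, non-pool thread periodically queues a tiny work item and records how long it waited before starting; if that latency spikes, await continuations and Task.Run work are being delayed too. The 1-second threshold mirrors the question's tolerance; the five samples and 200 ms interval are assumptions for the demo:

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Measures how long a tiny work item waits in the thread-pool queue before it starts.
static TimeSpan MeasurePoolLatency()
{
    var sw = Stopwatch.StartNew();
    var started = new TaskCompletionSource<TimeSpan>();
    ThreadPool.QueueUserWorkItem(_ => started.SetResult(sw.Elapsed));
    return started.Task.Result;   // blocks only the dedicated monitor thread
}

var threshold = TimeSpan.FromSeconds(1);   // "delays up to 1 second are tolerated"
var samples = new ConcurrentQueue<TimeSpan>();

// A dedicated (non-pool) thread, so the monitor keeps running even when the pool is starved.
var monitor = new Thread(() =>
{
    for (int i = 0; i < 5; i++)   // a real monitor would loop until the app shuts down
    {
        TimeSpan latency = MeasurePoolLatency();
        samples.Enqueue(latency);
        if (latency > threshold)
            Console.WriteLine($"Thread pool is delayed: {latency.TotalMilliseconds:F0} ms");
        Thread.Sleep(200);
    }
}) { IsBackground = true, Name = "PoolLatencyMonitor" };

monitor.Start();
monitor.Join();
Console.WriteLine($"Largest observed latency: {samples.Max().TotalMilliseconds:F1} ms");
```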
When you execute Task.Run it uses Thread Pool threads to execute those tasks. When you have long running tasks, you are causing starvation to the Thread Pool, since its resources are currently occupied with long running tasks.
2 Suggestions:
When running long-running tasks, make sure to use Task.Factory.StartNew with TaskCreationOptions.LongRunning, which will trigger the creation of a new thread. Be cautious here as well: spinning up too many new threads will cause excessive context switches, which will slow your app down.
Use true async where you have to do I/O-bound work: use APIs that support the TAP, such as HttpClient and Stream, which won't cause a new thread to execute blocking work.
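The effect of the first suggestion can be observed from inside the task. This sketch checks Thread.CurrentThread.IsThreadPoolThread: with the default scheduler, a task started with TaskCreationOptions.LongRunning currently gets a dedicated thread, while Task.Run uses a pool thread. Note that this reflects the current implementation; the documentation only describes LongRunning as a hint to the scheduler:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Blocking work started with LongRunning: the default scheduler currently runs it
// on a dedicated thread, so it does not tie up a thread-pool thread.
bool longRunningOnPool = await Task.Factory.StartNew(
    () => Thread.CurrentThread.IsThreadPoolThread,
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);

// Ordinary Task.Run work executes on a thread-pool thread.
bool taskRunOnPool = await Task.Run(() => Thread.CurrentThread.IsThreadPoolThread);

Console.WriteLine($"LongRunning task on a pool thread: {longRunningOnPool}");
Console.WriteLine($"Task.Run task on a pool thread:    {taskRunOnPool}");
```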
There are overheads in async/await, as well as the tasks themselves being executed at a lower priority. If you need something to happen reliably at an accurate interval, async/await / TPL is not the interface to use.
Try creating an independent background thread that loops until it is scheduled to do work. This way you can control the priority and timing directly without going through TPL / async.
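Thread backgroundThread;
DateTime nextInterval = DateTime.Now;
public void Start()
{
    // A dedicated thread outside the thread pool; set its Priority here if needed
    backgroundThread = new Thread(BackgroundWork) { IsBackground = true };
    backgroundThread.Start();
}
public void BackgroundWork()
{
    while (true)
    {
        if (DateTime.Now >= nextInterval)
        {
            DoWork(); // your scheduled work
            nextInterval = nextInterval.AddSeconds(10); // 10 seconds
        }
        Thread.Sleep(100);
    }
}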
Adjust the Sleep(..) and interval values as needed.
I think you're experiencing the situation described by Joe Duffy in his "CLR thread pool injection, stuttering problems" blog post:
One silly thing our thread pool currently does has to do with how it creates new threads. Namely, it severely throttles creation of new threads once you surpass the “minimum” number of threads, which, by default, is the number of CPUs on the machine. We limit ourselves to at most one new thread per 500ms once we reach or surpass this number.
One solution might be to explicitly increase the minimum number of thread pool threads before making any use of TPL, e.g.:
ThreadPool.SetMaxThreads(workerThreads: 200, completionPortThreads: 200);
ThreadPool.SetMinThreads(workerThreads: 100, completionPortThreads: 100);
Try playing with these numbers and see if the problem goes away.

Need a queue of jobs to be processed by threads

I have some work (jobs) in a queue (so there are several of them), and I want each job to be processed by a thread.
I was looking at Rx, but it was not what I wanted, and then I came across the Task Parallel Library.
Since my work will be done in a web application, I do not want the client to be waiting for each job to finish, so I have done the following:
public void FromWebClientRequest(int[] ids)
{
    // I will get the objects for the ids from a repository using a container (UNITY)
    ThreadPool.QueueUserWorkItem(delegate
    {
        DoSomeWorkInParallel(ids, container);
    });
}
private static void DoSomeWorkInParallel(int[] ids, IUnityContainer container)
{
    Parallel.ForEach(ids, id =>
    {
        // Some work will be done here...
        // var repository = container.Resolve...
    });
    // At this point all the work is done.
    container.Resolve<ILogger>().Log("finished all work");
}
I would call the above code on a web request and then the client will not have to wait.
Is this the correct way to do this?
TIA
From the MSDN docs, I see that Unity's container Resolve method is not documented as thread-safe. This means you need to resolve dependencies outside of the parallel loop. Edit: changed to Task.
public void FromWebClientRequest(int[] ids)
{
    IRepoType repoType = container.Resolve<IRepoType>();
    ILogger logger = container.Resolve<ILogger>();
    // Remove LongRunning if your operations are not blocking (i.e. no file reads, downloads, long-running queries, etc.)
    // PreferFairness tries to complete the earliest requests first, so clients are more likely to be served
    // "first come, first served" under high CPU load with many requests
    Task.Factory.StartNew(() => DoSomeWorkInParallel(ids, repoType, logger), TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
private static void DoSomeWorkInParallel(int[] ids, IRepoType repository, ILogger logger)
{
    // If there are blocking operations inside this loop, consider converting them to tasks with LongRunning.
    // Why? To force more threads than would normally be used to run the loop, and so try to saturate
    // a CPU that would otherwise be idle most of the time.
    // Beware of doing this against a non-clustered database: you can saturate it and create a bottleneck
    // there, so test how it handles your workload.
    Parallel.ForEach(ids, id =>
    {
        // Some work will be done here...
        // use repository
    });
    logger.Log("finished all work");
}
Plus, as fiver stated, if you have .NET 4, then Tasks are the way to go.
Why go Task (question in comment):
If your FromWebClientRequest method were fired very often, you would fill the thread pool, and overall system performance would probably not be as good as with .NET 4's finer-grained approach. This is where Task enters the game. Each task is not its own thread; the new .NET 4 thread pool creates enough threads to maximize performance on a system, and you do not need to worry about how many CPUs there are or how many thread context switches there will be.
Some MSDN quotes for ThreadPool:
When all thread pool threads have been assigned to tasks, the thread pool does not immediately begin creating new idle threads. To avoid unnecessarily allocating stack space for threads, it creates new idle threads at intervals. The interval is currently half a second, although it could change in future versions of the .NET Framework.
The thread pool has a default size of 250 worker threads per available processor.
Unnecessarily increasing the number of idle threads can also cause performance problems. Stack space must be allocated for each thread. If too many tasks start at the same time, all of them might appear to be slow. Finding the right balance is a performance-tuning issue.
By using Tasks you discard those issues.
Another good thing is you can fine grain the type of operation to run. This is important if your tasks do run blocking operations. This is a case where more threads are to be allocated concurrently since they would mostly wait. ThreadPool cannot achieve this automagically:
Task.Factory.StartNew(() => DoSomeWork(), TaskCreationOptions.LongRunning);
And of course you are able to make it finish on demand without resorting to ManualResetEvent:
var task = Task.Factory.StartNew(() => DoSomeWork());
task.Wait();
Besides this, you don't have to change the Parallel.ForEach if you don't expect exceptions or blocking, since it is part of the .NET 4 Task Parallel Library and (often) works well and is optimized for the .NET 4 pool, as Tasks are.
However, if you do switch to Tasks instead of Parallel.For, remove LongRunning from the caller Task, since Parallel.For is a blocking operation and starting tasks (with fiver's loop) is not. But this way you lose the first-come-first-served optimization, or you have to apply it to many more Tasks (one spawned per id), which would probably give less correct behaviour. Another option is to wait on all tasks at the end of DoSomeWorkInParallel.
Another way is to use Tasks:
public static void FromWebClientRequest(int[] ids)
{
    foreach (var id in ids)
    {
        Task.Factory.StartNew(i =>
        {
            Wl(i);
        }, id);
    }
}
I would call the above code on a web request and then the client will not have to wait.
This will work provided the client does not need an answer (like Ok/Fail).
Is this the correct way to do this?
Almost. You use Parallel.ForEach (TPL) for the jobs but run it from a 'plain' Threadpool job. Better to use a Task for the outer job as well.
Also, handle all exceptions in that outer Task. And be careful about the thread-safety of the container etc.

In .NET is there a thread scheduler for long running threads?

Our scenario is a network scanner.
It connects to a set of hosts and scans them in parallel for a while using low priority background threads.
I want to be able to schedule lots of work but only have a given number of hosts (say ten) scanned in parallel. Even if I create my own threads, the many callbacks and other asynchronous goodness use the ThreadPool, and I end up running out of resources. I should look at MonoTorrent...
If I use THE ThreadPool, can I limit my application to some number of threads that will leave enough for the rest of the application to run smoothly?
Is there a threadpool that I can initialize to n long lived threads?
[Edit]
No one seems to have noticed that I made some comments on some responses so I will add a couple things here.
Threads should be cancellable both gracefully and forcefully.
Threads should have low priority leaving the GUI responsive.
Threads are long running but in Order(minutes) and not Order(days).
Work for a given target host is basically:
For each test
Probe target (work is done mostly on the target end of an SSH connection)
Compare probe result to expected result (work is done on engine machine)
Prepare results for host
Can someone explain why using SmartThreadPool is marked with a negative usefulness?
In .NET 4 you have the integrated Task Parallel Library. When you create a new Task (the new thread abstraction) you can specify a Task to be long running. We have made good experiences with that (long being days rather than minutes or hours).
You can use it in .NET 2 as well, but there it's actually an extension; check here.
In VS2010, debugging of parallel applications based on Tasks (not threads) has been radically improved. It's advised to use Tasks rather than raw threads whenever possible, since they let you handle parallelism in a more object-oriented way.
UPDATE
Tasks that are NOT specified as long-running are queued to the thread pool (or to whatever other scheduler is in use).
But if a task is specified as long-running, the default scheduler just creates a standalone thread; no thread pool is involved.
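A minimal sketch of the LongRunning option, assuming .NET 4 or later for the TPL and C# 8 for the `using var` syntax. `LongRunningDemo` and `ScanHost` are hypothetical names, and the loop only simulates a scan with `Thread.Sleep`. It also shows one way to cover the question's cancellation and priority requirements: a `CancellationToken` for a graceful stop, and a lowered priority on the dedicated thread (a scheduling hint that some platforms may ignore).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LongRunningDemo
{
    // Hypothetical stand-in for scanning one host: runs "tests" until cancelled,
    // checking the token between tests so the stop is graceful.
    static int ScanHost(string host, CancellationToken token)
    {
        int testsRun = 0;
        while (!token.IsCancellationRequested)
        {
            Thread.Sleep(50); // placeholder for one probe/compare step
            testsRun++;
        }
        return testsRun;
    }

    // Starts one long-running scan, lets it run briefly, then cancels it.
    // Returns whether the scan ran on a thread-pool thread.
    public static bool RanOnThreadPool()
    {
        using var cts = new CancellationTokenSource();
        bool onPool = true;

        // Cancellation is purely cooperative here (CancellationToken.None), so the
        // delegate always runs and Result never throws TaskCanceledException.
        Task<int> scan = Task.Factory.StartNew(() =>
        {
            onPool = Thread.CurrentThread.IsThreadPoolThread;
            // With LongRunning on the default scheduler this is a dedicated
            // thread, so lowering its priority does not affect pool threads.
            Thread.CurrentThread.Priority = ThreadPriority.BelowNormal;
            return ScanHost("host-1", cts.Token);
        }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);

        Thread.Sleep(200);
        cts.Cancel();            // request a graceful stop
        int tests = scan.Result; // waits for the loop to notice the token
        Console.WriteLine($"Tests run before cancellation: {tests}");
        return onPool;
    }

    static void Main() =>
        Console.WriteLine($"Ran on a thread-pool thread: {RanOnThreadPool()}");
}
```

Because each LongRunning task gets its own thread, this alone does not limit how many hosts are scanned at once; the caller still decides how many such tasks to start.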
The CLR ThreadPool isn't appropriate for executing long-running tasks: it's for performing short tasks where the cost of creating a thread would be nearly as high as executing the method itself (or at least a significant percentage of it). As you've seen, .NET itself consumes thread-pool threads, so you can't reserve a block of them for yourself without risking starving the runtime.
Scheduling, throttling, and cancelling work is a different matter. There's no other built-in .NET worker-queue thread pool, so you'll have to roll your own (managing the threads or BackgroundWorkers yourself) or find a preexisting one (Ami Bar's SmartThreadPool looks promising, though I haven't used it myself).
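Rolling your own can be fairly small. What follows is a sketch, not SmartThreadPool's API: a hypothetical `FixedThreadPool` class with a fixed number of dedicated, low-priority background threads that pull work items from a `BlockingCollection<Action>` (available from .NET 4). With `threadCount` set to ten, at most ten hosts are scanned at once, and the runtime's own thread pool is left untouched. `FixedPoolDemo` is likewise a hypothetical usage example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical fixed-size pool of long-lived worker threads (not a .NET type).
sealed class FixedThreadPool : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly List<Thread> _threads = new List<Thread>();

    public FixedThreadPool(int threadCount)
    {
        for (int i = 0; i < threadCount; i++)
        {
            var t = new Thread(Worker)
            {
                IsBackground = true,                   // don't keep the process alive
                Priority = ThreadPriority.BelowNormal, // leave the GUI responsive
                Name = "Scanner worker " + i
            };
            _threads.Add(t);
            t.Start();
        }
    }

    public void Enqueue(Action work) => _queue.Add(work);

    private void Worker()
    {
        // Blocks while the queue is empty; ends once CompleteAdding has been
        // called and every queued item has been taken.
        foreach (Action work in _queue.GetConsumingEnumerable())
        {
            try { work(); }
            catch (Exception ex) { Console.Error.WriteLine(ex); } // keep the worker alive
        }
    }

    // Graceful shutdown: stop accepting work, let queued items finish, wait for threads.
    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread t in _threads) t.Join();
        _queue.Dispose();
    }
}

static class FixedPoolDemo
{
    public static int Run()
    {
        int completed = 0;
        using (var pool = new FixedThreadPool(3))
        {
            for (int i = 0; i < 10; i++)
                pool.Enqueue(() => { Thread.Sleep(20); Interlocked.Increment(ref completed); });
        } // Dispose waits for every queued item to finish
        return completed;
    }

    static void Main() => Console.WriteLine($"Completed: {Run()}"); // prints "Completed: 10"
}
```

Cancelling an item that is already running would still need a token checked inside the work itself; disposing the pool only stops it from taking new items.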
In your particular case, the best option would be neither threads, the thread pool, nor BackgroundWorker, but the asynchronous programming model (the BeginXXX/EndXXX methods) provided by the framework.
The advantage of the asynchronous model is that the TCP/IP stack invokes a callback whenever there is data to read, and the callback is automatically run on a thread-pool thread, so no thread sits blocked waiting for the network.
With the asynchronous model you can control how many requests are initiated per time interval. If you want, you can also initiate all the requests from a lower-priority thread while processing the responses on a normal-priority thread, which means the packets stay as briefly as possible in the networking stack's internal TCP queue.
Asynchronous Client Socket Example - MSDN
P.S. For multiple concurrent, long-running jobs that don't do a lot of computation but mostly wait on I/O (network, disk, etc.), the better option is always to use a callback mechanism rather than threads.
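The BeginXXX/EndXXX pattern has since been superseded, as Microsoft's recommended approach, by the Task-based Asynchronous Pattern (async/await), which builds on the same callback idea. The following sketch uses that newer pattern rather than APM and assumes .NET Core or .NET 5+ (C# 8). `ThrottledScan` and `ProbeHostAsync` are hypothetical; the probe uses `Task.Delay` in place of a real network call. A `SemaphoreSlim` caps how many probes are in flight, and a `CancellationToken` allows a graceful stop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledScan
{
    // Hypothetical probe: Task.Delay stands in for awaiting a reply from the
    // host, so no thread is tied up while the request is outstanding.
    static async Task<string> ProbeHostAsync(string host, CancellationToken token)
    {
        await Task.Delay(100, token);
        return host + ": ok";
    }

    // Scans every host but keeps at most maxParallel probes in flight.
    // Also reports the highest concurrency observed, for demonstration.
    public static async Task<(string[] Results, int Peak)> ScanAllAsync(
        IEnumerable<string> hosts, int maxParallel, CancellationToken token = default)
    {
        using var gate = new SemaphoreSlim(maxParallel);
        var sync = new object();
        int inFlight = 0, peak = 0;

        async Task<string> ScanOneAsync(string host)
        {
            await gate.WaitAsync(token); // waits asynchronously; no thread blocked
            lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); }
            try
            {
                return await ProbeHostAsync(host, token);
            }
            finally
            {
                lock (sync) { inFlight--; }
                gate.Release();
            }
        }

        // Task.WhenAll returns results in the same order as the input hosts.
        string[] results = await Task.WhenAll(hosts.Select(h => ScanOneAsync(h)));
        return (results, peak);
    }

    static async Task Main()
    {
        var hosts = Enumerable.Range(1, 25).Select(i => "host" + i);
        var (results, peak) = await ScanAllAsync(hosts, maxParallel: 10);
        Console.WriteLine($"{results.Length} hosts scanned, peak concurrency {peak}");
    }
}
```

While a probe is waiting, no thread is blocked, so the limit here throttles load on the network and the target hosts rather than the number of threads.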
I'd write a custom thread manager. In the following simple example, a Queue holds waiting threads and a Dictionary holds active threads, keyed by ManagedThreadId. When a thread finishes, it removes itself from the active dictionary and launches another thread via a callback.
You can change the maximum number of running threads from your UI, and you can pass extra info to the ThreadDone callback for monitoring performance, etc. If a thread fails because of, say, a network timeout, you can queue a new thread for the same work to retry it (a Thread object cannot be restarted once it has run). Add extra control methods to Supervisor for pausing, stopping, etc.
```csharp
using System;
using System.Collections.Generic;
using System.Threading;

namespace ConsoleApplication1
{
    public delegate void CallbackDelegate(int idArg);

    class Program
    {
        static void Main(string[] args)
        {
            new Supervisor().Run();
            Console.WriteLine("Done");
            Console.ReadKey();
        }
    }

    class Supervisor
    {
        Queue<System.Threading.Thread> waitingThreads = new Queue<System.Threading.Thread>();
        Dictionary<int, System.Threading.Thread> activeThreads = new Dictionary<int, System.Threading.Thread>();
        int maxRunningThreads = 10;
        object locker = new object();
        volatile bool done;

        public void Run()
        {
            // queue up some threads
            for (int i = 0; i < 50; i++)
            {
                Thread newThread = new Thread(new Worker(ThreadDone).DoWork);
                newThread.IsBackground = true;
                waitingThreads.Enqueue(newThread);
            }
            LaunchWaitingThreads();
            while (!done) Thread.Sleep(200);
        }

        // keep starting waiting threads until we max out
        void LaunchWaitingThreads()
        {
            lock (locker)
            {
                while ((activeThreads.Count < maxRunningThreads) && (waitingThreads.Count > 0))
                {
                    Thread nextThread = waitingThreads.Dequeue();
                    activeThreads.Add(nextThread.ManagedThreadId, nextThread);
                    nextThread.Start();
                    Console.WriteLine("Thread " + nextThread.ManagedThreadId.ToString() + " launched");
                }
                done = (activeThreads.Count == 0) && (waitingThreads.Count == 0);
            }
        }

        // this is called by each thread when it's done
        void ThreadDone(int threadIdArg)
        {
            lock (locker)
            {
                // remove thread from active pool
                activeThreads.Remove(threadIdArg);
            }
            Console.WriteLine("Thread " + threadIdArg.ToString() + " finished");
            LaunchWaitingThreads(); // this could instead be put in the wait loop at the end of Run()
        }
    }

    class Worker
    {
        CallbackDelegate callback;

        public Worker(CallbackDelegate callbackArg)
        {
            callback = callbackArg;
        }

        public void DoWork()
        {
            System.Threading.Thread.Sleep(new Random().Next(100, 1000));
            callback(System.Threading.Thread.CurrentThread.ManagedThreadId);
        }
    }
}
```
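The Supervisor above has no way to stop its workers. One way to add a graceful stop, sketched here as a standalone, simplified variant (the `CancellableWorker` and `StopDemo` names are hypothetical), is to give every worker the same `CancellationToken` and have it check the token between units of work.

```csharp
using System;
using System.Threading;

// Hypothetical, simplified variant of the Worker class: instead of one fixed
// Sleep it performs units of work until a shared token is cancelled.
class CancellableWorker
{
    private readonly CancellationToken _token;

    public CancellableWorker(CancellationToken token) => _token = token;

    public void DoWork()
    {
        while (!_token.IsCancellationRequested)
        {
            // Placeholder for one unit of work (e.g. one probe). WaitOne returns
            // true as soon as cancellation is requested, so the stop is prompt.
            if (_token.WaitHandle.WaitOne(50)) break;
        }
    }
}

static class StopDemo
{
    // Starts a few workers, asks them all to stop, and reports whether every
    // thread exited within a bounded wait.
    public static bool Run()
    {
        using var cts = new CancellationTokenSource();
        var threads = new Thread[5];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(new CancellableWorker(cts.Token).DoWork) { IsBackground = true };
            threads[i].Start();
        }

        Thread.Sleep(200);
        cts.Cancel(); // graceful stop: each worker finishes its current unit and returns

        bool allStopped = true;
        foreach (Thread t in threads)
            allStopped &= t.Join(TimeSpan.FromSeconds(2));
        return allStopped;
    }

    static void Main() =>
        Console.WriteLine(Run() ? "All workers stopped" : "Some workers did not stop in time");
}
```

Forceful cancellation has no safe managed equivalent: `Thread.Abort` throws `PlatformNotSupportedException` on .NET Core and .NET 5+, and was risky even on .NET Framework. For a probe stuck in a network call, one option is to close the underlying connection (for example the SSH session) so the blocked call fails; another is to run the scan in a separate process that can be killed.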
Use the built-in thread pool; it has good capabilities.
Alternatively, you can look at the Smart Thread Pool implementation here, or at Extended Thread Pool, if you need a limit on the maximum number of worker threads.
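`ThreadPool.SetMaxThreads` does exist, but it caps the pool for the entire process, including the runtime's own work. A sketch of an alternative, assuming the jobs are short: leave the process-wide limits as they are and throttle only your own items with a `SemaphoreSlim`, so no more than `limit` of them are queued or running at once. `PoolThrottleDemo` and its `Thread.Sleep` placeholder job are hypothetical.

```csharp
using System;
using System.Threading;

static class PoolThrottleDemo
{
    // Runs `jobs` short work items on the built-in ThreadPool while allowing at
    // most `limit` of them to be queued or running at once. Returns the highest
    // concurrency observed. ThreadPool.SetMaxThreads is not touched.
    public static int Run(int jobs, int limit)
    {
        using var gate = new SemaphoreSlim(limit);
        using var allDone = new CountdownEvent(jobs);
        var sync = new object();
        int running = 0, peak = 0;

        for (int i = 0; i < jobs; i++)
        {
            gate.Wait(); // the producer blocks here, not a pool thread
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try
                {
                    lock (sync) { running++; peak = Math.Max(peak, running); }
                    Thread.Sleep(20); // placeholder for one short job
                }
                finally
                {
                    lock (sync) { running--; }
                    gate.Release();   // let the producer queue the next item
                    allDone.Signal(); // last action of each item
                }
            });
        }

        allDone.Wait(); // every item has released its slot and signalled
        return peak;
    }

    static void Main() =>
        Console.WriteLine($"Peak concurrency: {Run(jobs: 30, limit: 4)}");
}
```

This suits short jobs; a multi-minute scan would still hold a pool thread for its whole duration.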
