How to scale an application with 50000 Simultaneous Tasks

How to scale an application with 50000 Simultaneous Tasks - c#

I am working on a project which needs to be able to run (for example) 50,000 tasks simultaneously. Each task will run at some frequency (say 5 minutes) and will be either a url ping or an HTTP GET request. My initial plan was to create thread for each task. I ran a basic test to see if this was possible given available system resources. I ran the following code as a console app:
public class Program
{
public static void Test1()
{
Thread.Sleep(1000000);
}
public static void Main(string[] args)
{
for(int i = 0; i < 50000; i++)
{
Thread t = new Thread(new ThreadStart(Test1));
t.Start();
Console.WriteLine(i);
}
}
}
Unfortunately, though it started very fast, at the 2000 thread mark, the performance was greatly decreased. By 5000, I could count faster than the program could create threads. This makes getting to 50000 seem like it wouldn't be exactly possible. Am I on the right track or should I try something else? Thanks

Many people have the idea that you need to spawn n threads if you want to handle n tasks in parallel. Most of the time a computer is waiting, it is waiting on I/O such as network traffic, disk access, memory transfer for GPU compute, hardware device to complete an operation, etc.
Given this insight, we can see that a viable solution to handling as many tasks in parallel as possible for a given hardware platform is to pipeline work: place work in a queue and process it using as many threads as possible. Usually, this means 1-2 threads per virtual processor.
In C# we can accomplish this with the Task Parallel Library (TPL):
class Program
{
static Task RunAsync(int x)
{
return Task.Delay(10000);
}
static async Task Main(string[] args)
{
var tasks = Enumerable.Range(0, 50000).Select(x => RunAsync());
Console.WriteLine("Waiting for tasks to complete...");
await Task.WhenAll(tasks);
Console.WriteLine("Done");
}
}
This queues 50000 work items, and waits until all 50000 tasks are complete. These tasks only execute on as many threads that are needed. Behind the scenes, a task scheduler examines the pool of work and has threads steal work from the queue when they need a task to execute.
Additional Considerations
With a large upper bound (n=50000) you should be cognizant of memory pressure, garbage collector activity, and other task-related overhead. You should consider the following:
Consider using ValueTask<T> to minimize allocations, especially for synchronous operations
Use ConfigureAwait(false) where possible to reduce context switching
Use CancellationTokenSource and CancellationToken to cancel requests early (e.g. timeout)
Follow best practices
Avoid awaiting inside of a loop where possible
Avoid querying tasks too frequently for completion
Avoid accessing Task<T>.Result before a task is complete to prevent blocking
Avoid deadlocks by using synchronization primitives (mutex, semaphore, condition signal, synclock, etc) as appropriate
Avoid frequent use of Task.Run to create tasks to avoid exhausting the thread pool available to the default task scheduler (this method is usually reserved for compute-bound tasks)

Related

Async\await again. Example with network requests

I completely don't understand the applied meaning of async\await.
I just started learning async\await and I know that there are already a huge number of topics. If I understand correctly, then async\await is not needed anywhere else except for operations with a long wait in a thread, if this is not related to a long calculation. For example, database response, network request, file handling. Many people write that async\await is also needed so as not to block the main thread. And here it is completely unclear to me why it should be blocked. Don't block without async\await, just create a task. So I'm trying to create a code that will wait a long time for a response from the network.
I created an example. I see with my own eyes through the windows task manager that the while (i < int.MaxValue) operation is processed first, taking up the entire processor resource, although I first launched the DownloadFile. And only then, when the processor is released, I see that the download files is in progress. On my machine, the example runs ~54 seconds.
Question: how could I first run the DownloadFile asynchronously so that the threads do not idle uselessly, but can do while (i < int.MaxValue)?
using System.Net;
string PathProject = Directory.GetParent(Directory.GetCurrentDirectory()).Parent.Parent.Parent.FullName;
//Create folder 1 in the project folder
DirectoryInfo Path = new DirectoryInfo($"{PathProject}\\1");
int Iterations = Environment.ProcessorCount * 3;
string file = "https://s182vla.storage.yandex.net/rdisk/82b08d86b9920a5e889c6947e4221eb1350374db8d799ee9161395f7195b0b0e/62f75403/geIEA69cusBRNOpxmtup5BdJ7AbRoezTJE9GH4TIzcUe-Cp7uoav-lLks4AknK2SfU_yxi16QmxiuZOGFm-hLQ==?uid=0&filename=004%20-%2002%20Lesnik.mp3&disposition=attachment&hash=e0E3gNC19eqNvFi1rXJjnP1y8SAS38sn5%2ByGEWhnzE5cwAGsEnlbazlMDWSjXpyvq/J6bpmRyOJonT3VoXnDag%3D%3D&limit=0&content_type=audio%2Fmpeg&owner_uid=160716081&fsize=3862987&hid=98984d857027117759bc5ce6092eaa6a&media_type=audio&tknv=v2&rtoken=k9xogU6296eg&force_default=no&ycrid=na-2bc914314062204f1cbf810798018afd-downloader16e&ts=5e61a6daac6c0&s=eef8b08190dc7b22befd6bad89e1393b394869a1668d9b8af3730cce4774e8ad&pb=U2FsdGVkX1__q3AvjJzgzWG4wVR80Oh8XMl-0Dlfyu9FhqAYQVVkoBV0dtBmajpmOkCXKUXPbREOS-MZCxMNu2rkAkKq_n-AXcZ85svtSFs";
List<Task> tasks = new List<Task>();
void MyMethod1(int i)
{
WebClient client = new WebClient();
client.DownloadFile(file, $"{Path}\\{i}.mp3");
}
void MyMethod2()
{
int i = 0;
while (i < int.MaxValue)
{
i++;
}
}
DateTime dateTimeStart = DateTime.Now;
for (int i = 0; i < Iterations; i++)
{
int j = i;
tasks.Add(Task.Run(() => MyMethod1(j)));
}
for (int i = 0; i < Iterations; i++)
{
tasks.Add(Task.Run(() => { MyMethod2(); MyMethod2(); }));
}
Task.WaitAll(tasks.ToArray());
Console.WriteLine(DateTime.Now - dateTimeStart);
while (true)
{
Thread.Sleep(100);
if (Path.GetFiles().Length == Iterations)
{
Thread.Sleep(1000);
foreach (FileInfo f in Path.GetFiles())
{
f.Delete();
}
return;
}
}

If there are 2 web servers that talk to a database and they run on 2 machines with the same spec the web server with async code will be able to handle more concurrent requests.
The following is from 2014's Async Programming : Introduction to Async/Await on ASP.NET
Why Not Increase the Thread Pool Size?
At this point, a question is always asked: Why not just increase the size of the thread pool? The answer is twofold: Asynchronous code scales both further and faster than blocking thread pool threads.
Asynchronous code can scale further than blocking threads because it uses much less memory; every thread pool thread on a modern OS has a 1MB stack, plus an unpageable kernel stack. That doesn’t sound like a lot until you start getting a whole lot of threads on your server. In contrast, the memory overhead for an asynchronous operation is much smaller. So, a request with an asynchronous operation has much less memory pressure than a request with a blocked thread. Asynchronous code allows you to use more of your memory for other things (caching, for example).
Asynchronous code can scale faster than blocking threads because the thread pool has a limited injection rate. As of this writing, the rate is one thread every two seconds. This injection rate limit is a good thing; it avoids constant thread construction and destruction. However, consider what happens when a sudden flood of requests comes in. Synchronous code can easily get bogged down as the requests use up all available threads and the remaining requests have to wait for the thread pool to inject new threads. On the other hand, asynchronous code doesn’t need a limit like this; it’s “always on,” so to speak. Asynchronous code is more responsive to sudden swings in request volume.
(These days threads are added added every 0.5 second)

WebRequest.Create("https://192.168.1.1").GetResponse()
At some point the above code will probably hit the OS method recv(). The OS will suspend your thread until data becomes available. The state of your function, in CPU registers and the thread stack, will be preserved by the OS while the thread is suspended. In the meantime, this thread can't be used for anything else.
If you start that method via Task.Run(), then your method will consume a thread from a thread pool that has been prepared for you by the runtime. Since these threads aren't used for anything else, your program can continue handling other requests on other threads. However, creating a large number of OS threads has significant overheads.
Every OS thread must have some memory reserved for its stack, and the OS must use some memory to store the full state of the CPU for any suspended thread. Switching threads can have a significant performance cost. For maximum performance, you want to keep a small number of threads busy. Rather than having a large number of suspended threads which the OS must keep swapping in and out of each CPU core.
When you use async & await, the C# compiler will transform your method into a coroutine. Ensuring that any state your program needs to remember is no longer stored in CPU registers or on the OS thread stack. Instead all of that state will be stored in heap memory while your task is suspended. When your task is suspended and resumed, only the data which you actually need will be loaded & stored, rather than the entire CPU state.
If you change your code to use .GetResponseAsync(), the runtime will call an OS method that supports overlapped I/O. While your task is suspended, no OS thread will be busy. When data is available, the runtime will continue to execute your task on a thread from the thread pool.
Is this going to impact the program you are writing today? Will you be able to tell the difference? Not until the CPU starts to become the bottleneck. When you are attempting to scale your program to thousands of concurrent requests.
If you are writing new code, look for the Async version of any I/O method. Sprinkle async & await around. It doesn't cost you anything.

If I understand correctly, then async\await is not needed anywhere else except for operations with a long wait in a thread, if this is not related to a long calculation.
It's kind of recursive, but async is best used whenever there's something asynchronous. In other words, anything where the CPU would be wasted if it had to just spin (or block) while waiting for the operation to complete. Operations that are naturally asynchronous are generally I/O-based (as you mention, DB and other network calls, as well as file I/O), but they can be more arbitrary events, too (e.g., timers). Anything where there isn't actual code to run to get the response.
Many people write that async\await is also needed so as not to block the main thread.
At a higher level, there are two primary benefits to async/await, depending on what kind of code you're talking about:
On the server side (e.g., web apps), async/await provides scalability by using fewer threads per request.
On the client side (e.g., UI apps), async/await provides responsiveness by keeping the UI thread free to respond to user input.
Developers tend to emphasize one or the other depending on the kind of work they normally do. So if you see an async article talking about "not blocking the main thread", they're talking about UI apps specifically.
And here it is completely unclear to me why it should be blocked. Don't block without async\await, just create a task.
That works just fine for many situations. But it doesn't work well in others.
E.g., it would be a bad idea to just Task.Run onto a background thread in a web app. The primary benefit of async in a web app is to provide scalability by using fewer threads per request, so using Task.Run does not provide any benefits at all (in fact, scalability is reduced). So, the idea of "use Task.Run instead of async/await" cannot be adopted as a universal principle.
The other problem is in resource-constrained environments, such as mobile devices. You can only have so many threads there before you start running into other problems.
But if you're talking Desktop apps (e.g., WPF and friends), then sure, you can use async/await to free up the UI thread, or you can use Task.Run to free up the UI thread. They both achieve the same goal.
Question: how could I first run the DownloadFile asynchronously so that the threads do not idle uselessly, but can do while (i < int.MaxValue)?
There's nothing in your code that is asynchronous at all. So really, you're dealing with multithreading/parallelism. In general, I recommend using higher-level constructs such as Parallel for parallelism rather than Task.Run.
But regardless of the API used, the underlying problem is that you're kicking off Environment.ProcessorCount * 6 threads. You'll want to ensure that your thread pool is ready for that many threads by calling ThreadPool.SetMinThreads with the workerThreads set to a high enough number.

It's not web requests but here's a toy example:
Test:
n: 1 await: 00:00:00.1373839 sleep: 00:00:00.1195186
n: 10 await: 00:00:00.1290465 sleep: 00:00:00.1086578
n: 100 await: 00:00:00.1101379 sleep: 00:00:00.6517959
n: 300 await: 00:00:00.1207069 sleep: 00:00:02.0564836
n: 500 await: 00:00:00.1211736 sleep: 00:00:02.2742309
n: 1000 await: 00:00:00.1571661 sleep: 00:00:05.3987737
Code:
using System.Diagnostics;
foreach( var n in new []{1, 10, 100, 300, 500, 1000})
{
var sw = Stopwatch.StartNew();
var tasks = Enumerable.Range(0,n)
.Select( i => Task.Run( async () =>
{
await Task.Delay(TimeSpan.FromMilliseconds(100));
}));
await Task.WhenAll(tasks);
var tAwait = sw.Elapsed;
sw = Stopwatch.StartNew();
var tasks2 = Enumerable.Range(0,n)
.Select( i => Task.Run( () =>
{
Thread.Sleep(TimeSpan.FromMilliseconds(100));
}));
await Task.WhenAll(tasks2);
var tSleep = sw.Elapsed;
Console.WriteLine($"n: {n,4} await: {tAwait} sleep: {tSleep}");
}

How is Task.Run limited by CPU cores?

Why is it that the following program will only run a limited number of blocked tasks. The limiting number seems to be the number of cores on the machine.
Initially when I wrote this I expected to see the following:
Job complete output of Jobs 1 - 24
A 2 second gap
Output of Jobs 25 - 48
However the output was:
Job complete output of Jobs 1 - 4
Then randomly completing jobs every couple of 100ms.
When running on server with 32 cores, the program did run as I had expected.
class Program
{
private static object _lock = new object();
static void Main(string[] args)
{
int completeJobs = 1;
var limiter = new MyThreadLimiter();
for (int iii = 1; iii < 100000000; iii++)
{
var jobId = iii;
limiter.Schedule()
.ContinueWith(t =>
{
lock (_lock)
{
completeJobs++;
Console.WriteLine("Job: " + completeJobs + " scheduled");
}
});
}
Console.ReadLine();
}
}
class MyThreadLimiter
{
readonly SemaphoreSlim _semaphore = new SemaphoreSlim(24);
public async Task Schedule()
{
await _semaphore.WaitAsync();
Task.Run(() => Thread.Sleep(2000))
.ContinueWith(t => _semaphore.Release());
}
}
However replacing the Thread.Sleep with Task.Delay gives my expected results.
public async Task Schedule()
{
await _semaphore.WaitAsync();
Task.Delay(2000)
.ContinueWith(t => _semaphore.Release());
}
And using a Thread gives my expected results
public async Task Schedule()
{
await _semaphore.WaitAsync();
var thread = new Thread(() =>
{
Thread.Sleep(2000);
_semaphore.Release();
});
thread.Start();
}
How does Task.Run() work? Is it the case it is limited to the number of cores?

Task.Run schedules the work to run in the thread pool. The thread pool is given wide latitude to schedule the work as best as it can in order to maximize throughput. It will create additional threads when it feels they will be helpful, and remove threads from the pool when it doesn't think it will be able to have enough work for them.
Creating more threads than your processor is able to run at the same time isn't going to be productive when you have CPU bound work. Adding more threads will just result in dramatically more context switches, increasing overhead, and reducing throughput.

Yes for compute bound operations Task.Run() internally uses CLR's thread pool which will throttle the number of new threads to avoid CPU over-subscription. Initially it will run the number of threads that equals to the number of cpu cores concurrently. Then it continually optimises the number of threads using a hill-climbing algorithm based on factors like the number of requests thread pool receives and overall computer resources to either create more threads or fewer threads.
In fact, this is one of the main benefits of using pooled thread over raw thread e.g. (new Thread(() => {}).Start()) as it not only recycles threads but also optimises performance internally for you. As mentioned in the other answer, it's generally a bad idea to block pooled threads because it will "mislead" thread pool's optimisation, simiarly using many pooled thread to do very long-running computation can also lead to thread pool creating more threads and consequently increase the overheads of context switch and later destory extra threads in the pool.

The Task.Run() is running based on CLR Thread pool.
There is a concept called 'OverSubscription', means there
are being more active thread than CPU Cores and they must be time-sliced.
In Thread-Pool when the number of threads that must be scheduled on CPU Cores
increase, Context-Switch rise and at the result, the performance will hurt.
The CLR that manages the Thread-Pool avoid OverSubscription by queuing
and throttling the thread startup and always try to compensate workload.

Measuring performance gains with async-await [duplicate]

I have a Windows Service that reads from multiple MessageQueue instances. Those messagequeues all run their own Task for reading messages. Normally, after reading a message, the work of an I/O database is done. I've found articles claiming it's a good idea to use async on I/O operations, because it would free up threads. I'm trying to simulate the performance boost of using async I/O opertations in a Console application.
The Console application
In my test environment, I have 10 queues. GetQueues() returns 10 different MessageQueue instances.
static void Main(string[] args)
{
var isAsync = Console.ReadLine() == "Y";
foreach (var queue in queueManager.GetQueues())
{
var temp = queue;
Task.Run(() => ReceiveMessagesForQueue(temp, isAsync));
}
while (true)
{
FillAllQueuesWithMessages();
ResetAndStartStopWatch();
while(!AllMessagesRead())
{
Thread.Sleep(10);
}
Console.WriteLine("All messages read in {0}ms", stopWatch.ElapsedMilliseconds);
}
}
static async Task ReceiveMessagesForQueue(MessageQueue queue, bool isAsync)
{
while (true)
{
var message = await Task.Factory.FromAsync<Message>(queue.BeginReceive(), queue.EndReceive);
if (isAsync)
await ProcessMessageAsync(message);
else
ProcessMessage(message);
}
}
Async message processing
Uses await on Task.Delay(), so should release current Thread
static async Task ProcessMessageAsync(Message message)
{
await Task.Delay(1000);
BurnCpu();
}
Sync message processing
waits on Task.Delay(), so shouldn't release current Thread
static void ProcessMessage(Message message)
{
Task.Delay(1000).Wait();
BurnCpu();
}
In the end, results are equal. Am I missing something here?
Edit 1
I'm measuring overall time using stopWatch.ElapsedMilliseconds. I Fill all queues using FillAllQueuesWithMessages() with 10, 100, 10000 or more messages.
Edit 2
ReceiveMessagesForQueue() returns Task instead of void now.
Edit 3 (fix)
This test does show me performance improvement now. I had to make BurnCpu() take more time. While Task.Delay() is being awaited, BurnCPU() can use the released thread to process.

Using async-await doesn't speed up the time it takes to execute a single operation, it just means that you don't have a thread waiting doing nothing.
In your case Task.Delay will take a second no matter what but here:
Task.Delay(1000).Wait();
You have a thread that sits and waits for the second to end while here:
await Task.Delay(1000);
You don't. You are still asynchronously waiting (hence, await) but no thread is being used which means better scalability.
In async-await you get the performance boost because your app can do the same with less threads, or do more with the same threads. To measure that you need to have a lot of async operations concurrently. Only then will you notice that the async option utilizes CPU resources better than the synchronous one.
More info about freeing threads here There Is No Thread

You're still running each task in its own thread from the thread pool - as you're using the default task scheduler. If you want to see performance imporvement, you'll need to make sure several tasks are performed on the same thread.
Also, with 20 parallel tasks, you're probably not going to see any difference. Try it with 2,000 tasks.

Need a queue of jobs to be processed by threads

I have some work (a job) that is in a queue (so there a several of them) and I want each job to be processed by a thread.
I was looking at Rx but this is not what I wanted and then came across the parallel task library.
Since my work will be done in an web application I do not want client to be waiting for each job to be finished, so I have done the following:
public void FromWebClientRequest(int[] ids);
{
// I will get the objects for the ids from a repository using a container (UNITY)
ThreadPool.QueueUserWorkItem(delegate
{
DoSomeWorkInParallel(ids, container);
});
}
private static void DoSomeWorkInParallel(int[] ids, container)
{
Parallel.ForEach(ids, id=>
{
Some work will be done here...
var respository = container.Resolve...
});
// Here all the work will be done.
container.Resolve<ILogger>().Log("finished all work");
}
I would call the above code on a web request and then the client will not have to wait.
Is this the correct way to do this?
TIA

From the MSDN docs I see that Unitys IContainer Resolve method is not thread safe (or it is not written). This would mean that you need to do that out of the thread loop. Edit: changed to Task.
public void FromWebClientRequest(int[] ids);
{
IRepoType repoType = container.Resolve<IRepoType>();
ILogger logger = container.Resolve<ILogger>();
// remove LongRunning if your operations are not blocking (Ie. read file or download file long running queries etc)
// prefer fairness is here to try to complete first the requests that came first, so client are more likely to be able to be served "first come, first served" in case of high CPU use with lot of requests
Task.Factory.StartNew(() => DoSomeWorkInParallel(ids, repoType, logger), TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
private static void DoSomeWorkInParallel(int[] ids, IRepoType repository, ILogger logger)
{
// if there are blocking operations inside this loop you ought to convert it to tasks with LongRunning
// why this? to force more threads as usually would be used to run the loop, and try to saturate cpu use, which would be doing nothing most of the time
// beware of doing this if you work on a non clustered database, since you can saturate it and have a bottleneck there, you should try and see how it handles your workload
Parallel.ForEach(ids, id=>{
// Some work will be done here...
// use repository
});
logger.Log("finished all work");
}
Plus as fiver stated, if you have .Net 4 then Tasks is the way to go.
Why go Task (question in comment):
If your method fromClientRequest would be fired insanely often, you would fill the thread pool, and overall system performance would probably not be as good as with .Net 4 with fine graining. This is where Task enters the game. Each task is not its own thread but the new .Net 4 thread pool creates enough threads to maximize performance on a system, and you do not need to bother on how many cpus and how much thread context switches would there be.
Some MSDN quotes for ThreadPool:
When all thread pool threads have been
assigned to tasks, the thread pool
does not immediately begin creating
new idle threads. To avoid
unnecessarily allocating stack space
for threads, it creates new idle
threads at intervals. The interval is
currently half a second, although it
could change in future versions of the
.NET Framework.
The thread pool has a default size of
250 worker threads per available
processor
Unnecessarily increasing the number of
idle threads can also cause
performance problems. Stack space must
be allocated for each thread. If too
many tasks start at the same time, all
of them might appear to be slow.
Finding the right balance is a
performance-tuning issue.
By using Tasks you discard those issues.
Another good thing is you can fine grain the type of operation to run. This is important if your tasks do run blocking operations. This is a case where more threads are to be allocated concurrently since they would mostly wait. ThreadPool cannot achieve this automagically:
Task.Factory.StartNew(() => DoSomeWork(), TaskCreationOptions.LongRunning);
And of course you are able to make it finish on demand without resorting to ManualResetEvent:
var task = Task.Factory.StartNew(() => DoSomeWork());
task.Wait();
Beside this you don't have to change the Parallel.ForEach if you don't expect exceptions or blocking, since it is part of the .Net 4 Task Parallel Library, and (often) works well and optimized on the .Net 4 pool as Tasks do.
However if you do go to Tasks instead of parallel for, remove the LongRunning from the caller Task, since Parallel.For is a blocking operations and Starting tasks (with the fiver loop) is not. But this way you loose the kinda first-come-first-served optimization, or you have to do it on a lot more Tasks (all spawned through ids) which probably would give less correct behaviour. Another option is to wait on all tasks at the end of DoSomeWorkInParallel.

Another way is to use Tasks:
public static void FromWebClientRequest(int[] ids)
{
foreach (var id in ids)
{
Task.Factory.StartNew(i =>
{
Wl(i);
}
, id);
}
}

I would call the above code on a web
request and then the client will not
have to wait.
This will work provided the client does not need an answer (like Ok/Fail).
Is this the correct
way to do this?
Almost. You use Parallel.ForEach (TPL) for the jobs but run it from a 'plain' Threadpool job. Better to use a Task for the outer job as well.
Also, handle all exceptions in that outer Task. And be careful about the thread-safety of the container etc.

In .NET is there a thread scheduler for long running threads?

Our scenario is a network scanner.
It connects to a set of hosts and scans them in parallel for a while using low priority background threads.
I want to be able to schedule lots of work but only have any given say ten or whatever number of hosts scanned in parallel. Even if I create my own threads, the many callbacks and other asynchronous goodness uses the ThreadPool and I end up running out of resources. I should look at MonoTorrent...
If I use THE ThreadPool, can I limit my application to some number that will leave enough for the rest of the application to Run smoothly?
Is there a threadpool that I can initialize to n long lived threads?
[Edit]
No one seems to have noticed that I made some comments on some responses so I will add a couple things here.
Threads should be cancellable both
gracefully and forcefully.
Threads should have low priority leaving the GUI responsive.
Threads are long running but in Order(minutes) and not Order(days).
Work for a given target host is basically:
For each test
Probe target (work is done mostly on the target end of an SSH connection)
Compare probe result to expected result (work is done on engine machine)
Prepare results for host
Can someone explain why using SmartThreadPool is marked wit ha negative usefulness?

In .NET 4 you have the integrated Task Parallel Library. When you create a new Task (the new thread abstraction) you can specify a Task to be long running. We have made good experiences with that (long being days rather than minutes or hours).
You can use it in .NET 2 as well but there it's actually an extension, check here.
In VS2010 the Debugging Parallel applications based on Tasks (not threads) has been radically improved. It's advised to use Tasks whenever possible rather than raw threads. Since it lets you handle parallelism in a more object oriented friendly way.
UPDATE
Tasks that are NOT specified as long running, are queued into the thread pool (or any other scheduler for that matter).
But if a task is specified to be long running, it just creates a standalone Thread, no thread pool is involved.

The CLR ThreadPool isn't appropriate for executing long-running tasks: it's for performing short tasks where the cost of creating a thread would be nearly as high as executing the method itself. (Or at least a significant percentage of the time it takes to execute the method.) As you've seen, .NET itself consumes thread pool threads, you can't reserve a block of them for yourself lest you risk starving the runtime.
Scheduling, throttling, and cancelling work is a different matter. There's no other built-in .NET worker-queue thread pool, so you'll have roll your own (managing the threads or BackgroundWorkers yourself) or find a preexisting one (Ami Bar's SmartThreadPool looks promising, though I haven't used it myself).

In your particular case, the best option would not be either threads or the thread pool or Background worker, but the async programming model (BeginXXX, EndXXX) provided by the framework.
The advantages of using the asynchronous model is that the TcpIp stack uses callbacks whenever there is data to read and the callback is automatically run on a thread from the thread pool.
Using the asynchronous model, you can control the number of requests per time interval initiated and also if you want you can initiate all the requests from a lower priority thread while processing the requests on a normal priority thread which means the packets will stay as little as possible in the internal Tcp Queue of the networking stack.
Asynchronous Client Socket Example - MSDN
P.S. For multiple concurrent and long running jobs that don't do allot of computation but mostly wait on IO (network, disk, etc) the better option always is to use a callback mechanism and not threads.

I'd create your own thread manager. In the following simple example a Queue is used to hold waiting threads and a Dictionary is used to hold active threads, keyed by ManagedThreadId. When a thread finishes, it removes itself from the active dictionary and launches another thread via a callback.
You can change the max running thread limit from your UI, and you can pass extra info to the ThreadDone callback for monitoring performance, etc. If a thread fails for say, a network timeout, you can reinsert back into the queue. Add extra control methods to Supervisor for pausing, stopping, etc.
using System;
using System.Collections.Generic;
using System.Threading;
namespace ConsoleApplication1
{
public delegate void CallbackDelegate(int idArg);
class Program
{
static void Main(string[] args)
{
new Supervisor().Run();
Console.WriteLine("Done");
Console.ReadKey();
}
}
class Supervisor
{
Queue<System.Threading.Thread> waitingThreads = new Queue<System.Threading.Thread>();
Dictionary<int, System.Threading.Thread> activeThreads = new Dictionary<int, System.Threading.Thread>();
int maxRunningThreads = 10;
object locker = new object();
volatile bool done;
public void Run()
{
// queue up some threads
for (int i = 0; i < 50; i++)
{
Thread newThread = new Thread(new Worker(ThreadDone).DoWork);
newThread.IsBackground = true;
waitingThreads.Enqueue(newThread);
}
LaunchWaitingThreads();
while (!done) Thread.Sleep(200);
}
// keep starting waiting threads until we max out
void LaunchWaitingThreads()
{
lock (locker)
{
while ((activeThreads.Count < maxRunningThreads) && (waitingThreads.Count > 0))
{
Thread nextThread = waitingThreads.Dequeue();
activeThreads.Add(nextThread.ManagedThreadId, nextThread);
nextThread.Start();
Console.WriteLine("Thread " + nextThread.ManagedThreadId.ToString() + " launched");
}
done = (activeThreads.Count == 0) && (waitingThreads.Count == 0);
}
}
// this is called by each thread when it's done
void ThreadDone(int threadIdArg)
{
lock (locker)
{
// remove thread from active pool
activeThreads.Remove(threadIdArg);
}
Console.WriteLine("Thread " + threadIdArg.ToString() + " finished");
LaunchWaitingThreads(); // this could instead be put in the wait loop at the end of Run()
}
}
class Worker
{
CallbackDelegate callback;
public Worker(CallbackDelegate callbackArg)
{
callback = callbackArg;
}
public void DoWork()
{
System.Threading.Thread.Sleep(new Random().Next(100, 1000));
callback(System.Threading.Thread.CurrentThread.ManagedThreadId);
}
}
}

Use the built-in threadpool. It has good capabilities.
Alternatively you can look at the Smart Thread Pool implementation here or at Extended Thread Pool for a limit on the maximum number of working threads.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.