Making the most of the .NET Task Parallel Library

Making the most of the .NET Task Parallel Library - c#

Question 1.
Is using Parallel.For and Parallel.ForEach better suited to working with tasks that are ordered or unordered?
My reason for asking is that I recently updated a serial loop where a StringBuilder was being used to generate a SQL statement based on various parameters. The result was that the SQL was a bit jumbled up (to the point it contained syntax errors) in comparison to when using a standard foreach loop, therefore my gut feeling is that TPL is not suited to performing tasks where the data must appear in a particular order.
Question 2.
Does the TPL automatically make use of multicore architectures of must I provision anything prior to execution?
My reason for asking this relates back to an eariler question I asked relating to performance profiling of TPL operations. An answer to the question enlightened me to the fact that TPL is not always more efficient than a standard serial loop as the application may not have access to multiple cores, and therefore the overhead of creating additional threads and loops would create a performance decrease in comparison to a standard serial loop.

my gut feeling is that TPL is not suited to performing tasks where the data must appear in a particular order.
Correct. If you expect things in order, you might have a misunderstanding about what's going to happen when you "parallelize" a loop.
Does the TPL automatically make use of multicore architectures of must I provision anything prior to execution?
See the following article on the msdn magazine:
http://msdn.microsoft.com/en-us/magazine/cc163340.aspx
Using the library, you can conveniently express potential parallelism
in existing sequential code, where the exposed parallel tasks will be
run concurrently on all available processors.

If the results must be ordered then for you to parallelize the loop you need to be able to do the actual work in any order and then sort the results. This may or may not be more efficient than doing the work serially in the first place, depending on the situation. If the benefit of parallelizing the work that can be done in any order is more than the cost of ordering the results then it's a net gain. If that task just isn't sufficiently complex, your hardware doesn't allow for a lot of parallelization, or if it doesn't parallelize well (i.e. you have a lot of waiting due to data dependencies) then sorting the results could take more time than you gain from parallelizing the loop (or worse yet, parallelizing the loop takes longer even without the sort, see question two) and so you shouldn't parallelize it.
Note that if the actual units of work need to be run in a certain order, rather than just needing the results in a certain order, then you either won't be able to parallelize it, or you won't be able to parallelize it nearly as effectively. If you don't properly synchronize access to share resources then you'll actually end up with the wrong result (as happened in your case). To that end you need to remember that performance optimizations are meaningless if you can't actually come up with the correct result.
You don't really need to worry much about your hardware with the TPL. You don't need to explicitly add or restrict tasks. While there are a few ways that you can, virtually anytime you do something like this you'll hurt performance. When you do stuff like that you're adding restrictions to the TPL so it can't do what it wants. Often it knows better than you.
You also touch on another point here, and that's that parallelizing a loop will often take longer than not (you just didn't give likely reasons to cause this behavior). Often the actual work that needs to be done is just very small, so small that the work of creating threads, managing them, dealing with context shifts and synchronizing data as needed can be more work than what you gain from doing some work in parallel. This is why it's important to actually do lots of testing when deciding to parallelize some work to ensure that it actually benefits from it.

It's not better or worse for unordered lists - your issue in #1 was that you had a shared dependency on a StringBuilder which is why the parallel query failed. TPL works wonderfully on independent units of work. Even then, there are simple tricks that you can use to force an evaluation of a parallel query and keep the original ordering of the results when the parallel operations have all finished.
TPL and PLINQ are technically distinct things; PLINQ uses TPL to accomplish it's goal. That said, PLINQ tries to inspect your architecture and structure the execution of the set as best it can. TPL is simply the wrapper around the task architecture. It's going to up to you to determine if the overhead of creating a task (which is something like 1MB of memory), and overhead of the context switching to execute the tasks is greater than simply running the task serially.

On point 1, if using TPL you don't know in which order which task is run. That's the beauty of parallel vs sequential. There are means to control the order of things but then you probably lose the benefits of parallel.
On 2: TPL makes use of multi-cores out of the box. But there is indeed always overhead in using multiple threads. Load on the scheduler increases, thread (context) switching is not for free. To keep data in sync to avoid race-conditions you'll probably need some locking mechanism that also adds to the overhead.
Crafting a fast parallel algorithm using TPL is made a lot easier but still some sort of art.

Clearly TPL is not a good tool for building an ordered set like a query.
If you have a series of tasks to perform on set of items then you can can use a BlockingCollection. The tasks can be performed in parallel but the order of the set is maintained.
BlockingCollection Class

Related

best use of Parallel.ForEach / Multithreading

I need to scrape data from a website.
I have over 1,000 links I need to access, and previously I was dividing the links 10 per thread, and would start 100 threads each pulling 10. After few test cases, 100 threads was the best count to minimize the time it retrieved the content for all the links.
I realized that .NET 4.0 offered better support for multi-threading out of the box, but this is done based on how many cores you have, which in my case does not spawn enough threads. I guess what I am asking is: what is the best way to optimize the 1,000 link pulling. Should I be using .ForEach and let the Parallel extension control the amount threads that get spawned, or find a way to tell it how many threads to start and divide the work?
I have not worked with Parallel before so maybe my approach maybe wrong.

you can use MaxDegreeOfParallelism property in Parallel.ForEach to control the number of threads that will be spawned.
Heres the code snippet -
ParallelOptions opt = new ParallelOptions();
opt.MaxDegreeOfParallelism = 5;
Parallel.ForEach(Directory.GetDirectories(Constants.RootFolder), opt, MyMethod);

In general, Parallel.ForEach() is quite good at optimizing the number of threads. It accounts for the number of cores in the system, but also takes into account what the threads are doing (CPU bound, IO bound, how long the method runs, etc.).
You can control the maximum degree of parallelization, but there's no mechanism to force more threads to be used.
Make sure your benchmarks are correct and can be compared in a fair manner (e.g. same websites, allow for a warm-up period before you start measuring, and do many runs since response time variance can be quite high scraping websites). If after careful measurement your own threading code is still faster, you can conclude that you have optimized for your particular case better than .NET and stick with your own code.

Something worth checking out is the TPL Dataflow library.
DataFlow on MSDN.
See Nesting await in Parallel.ForEach
The whole idea behind Parallel.ForEach() is that you have a set of threads and each processes part of the collection. As you noticed, this doesn't work with async-await, where you want to release the thread for the duration of the async call.
Also, the walkthrough Creating a Dataflow Pipeline specifically sets up and processes multiple web page downloads. TPL Dataflow really was designed for that scenario.

Hard to say without looking at your code and how the collection is defined, I've found that Parallel.Invoke is the most flexible. try msdn? ... sounds like you are looking to use Parallel.For Method (Int32, Int32, Action<Int32, ParallelLoopState>)

When to use Multithread?

When do you use threads in a application? For example, in simple CRUD operations, use of smtp, calling webservices that may take a few time if the server is facing bandwith issues, etc.
To be honest, i don't know how to determine if i need to use a thread (i know that it must be when we're excepting that a operation will take a few time to be done).
This may be a "noob" question but it'll be great if you share with me your experience in threads.
Thanks

I added C# and .NET tags to your question because you mention C# in your title. If that is not accurate, feel free to remove the tags.
There are different styles of multithreading. For example, there are asynchronous operations with callback functions. .NET 4 introduces the parallel Linq library. The style of multithreading you would use, or whether to use any at all, depends on what you are trying to accomplish.
Parallel execution, such as what parallel Linq would generally be trying to do, takes advantage of multiple processor cores executing instructions that do not need to wait for data from each other. There are many sources for such algorithms outside Linq, such as this. However, it is possible that parallel execution may be unable to you or that it does not suit your application.
More traditional multithreading takes advantage of threading within the .NET library (in this case) as provided by System.Thread. Remember that there is some overhead in starting processes on threads, so only use threads when the advantages of doing so outweigh this overhead. Generally speaking, you would only want to use this type of single-processor multithreading when the task running under the thread will have long gaps in which the processor could be doing something else. For example, I/O from hard disk (and, consequently, from a database system that uses one) is many orders of magnitude slower than memory access. Network access can also be slow, as another example. Multithreading could allow another process to be running while waiting for these slow (compared to the processor) operations to complete.
Another example when I have used traditional multithreading is to cache some values the first time a particular ASP.NET page is accessed within a session. I kick off a thread so that the user does not have to wait for the caching to complete before interacting with the page. I also regulate the behavior when the caching does not complete before the user requests another page so that, if the caching does not complete, it is not a problem. It simply makes some further requests faster that were previously too slow.
Consider also the cost that multithreading has to the maintainability of your application. Threaded applications can be harder to debug, for example.
I hope this answers your question at least somewhat.

Joseph Albahari summarized it very well here:
Maintaining a responsive user interface
Making efficient use of an otherwise blocked CPU
Parallel programming
Speculative execution
Allowing requests to be processed simultaneously

One reason to use threads is to split large, CPU-bound tasks across a number of CPUs/cores, to finish faster. Another is to let an extended task execute asynchronously, so the foreground can remain responsive while it runs.
Your examples seem to be concentrating on the second of these. While it can be a good reason, if you can use asynchronous I/O instead, that's usually preferable (e.g., almost anything using sockets can/will be better off using the socket(s) asynchronously). Asynchronous I/O is easier to cancel, and it'll usually have lower CPU overhead as well.

You can use threads when you need different execution paths. This leads(when done correctly) to more responsive and/or faster applications but also leads to more complex code and debugging.
In a simple CRUD scenario maybe is not that useful, but maybe your UI is consuming a slow web service. If you your code is tied to your UI thread you will have unresponsive UI between the service calls.
In that case, using System.Threading.Threads maybe be overkill because you don't need so much control. Using a BackgrounWorker maybe a better choice.
Threading is something difficult to master, but the benefits when used correctly are huge, performance is the most common.

Somehow you have answered your question by yourself. Using threads whenever you execute time consuming operations is right choice. Also you should it in situations when you want to make things faster. For example you want to process some amount of files - each file can be processed by different thread.
By using threads you can better utilize power of multi-core/processor machines.
Monitoring some data in background of your application.
There are dozens of such scenarios.

Realising my comment might suffice as an answer ...
I like to view multi-threading scenarios from a resource perspective. In other words, UI (graphics), networking, disk IO, CPU (cores), RAM etc. I find that helps when deciding where to use multi-threading in the general sense at least.
The reasoning behind this is simply that I can take advantage of one resource on a specific thread (eg. Disk IO) while at the same time using another thread to accomplish something else using a different resource.

Multithreading improvements in .NET 4

I have heard that the .NET 4 team has added new classes in the framework that make working with threads better and easier.
Basically the question is what are the new ways to run multithreaded tasks added in .NET 4 and what are they designed to be used for?
UPD: Just to make it clear, I'm not looking for a single way of running parallel tasks in .NET 4, I want to find out which are the new ones added, and if possible what situation would each of them be best suited for..

With the lack of responses, I decided to evaluate on the answers below with that I've learned..
As #Scott stated, .NET 4 added the Task Parallel Library which adds a number of innovations, new methods and approaches to parallelism.
One of the first things to mention is the Parallel.For and Parallel.ForEach methods, which allow the developer to process multiple items in multiple threads. The Framework in this case will decide how many threads are necessary, and when to create new threads, and when not to.
This is a very simple and straightforward way to parallelize existing code, and add some performance boost.
Another way, somewhat similar to the previous approaches is using the PLINQ extenders. They take an existing enumeration, and extend it with parallel linq extenders. So if you have an existing linq query, you can easily convert it to PLINQ. What this means is all the operations on the PLINQ enumerable will also take advantage of multiple threads, and filtering your list of objects using a .Where clause, for example, will run in multiple threads now!
One of the bigger innovations in the TPL is the new Task class. In some ways it may look like the already well known Thread class, but it takes advantage of the new Thread Pool in .NET 4 (which has been improved a lot compared on previous versions), and is much more functional than the regular Thread class. For example you can chain Tasks where tasks in the middle of the chain will only start when the previous ones finish. Examples and in-depth explanation in a screencast on Channel 9
To enhance the work with Task classes, we can use the BlockingCollection<>. This works perfectly in situations where you have a producer-consumer scenario. You can have multiple threads producing some objects, that will then be consumed and processed by consumer methods. This can be easily parallelised and controlled with the Task factory and the blocking collection. Useful screencast with examples from Channel 9
These can also use different backing storage classes (ConcurrentQueue, ConcurentStack, ConcurrentBag), which are all thread safe, and are different in terms of element ordering and performance. Examples and explanations of them in a different video here
One more new thing that has been added (which probably isn't part of the TPL, but helps us here anyway) is the CountdownEvent class, which can help us in "task coordination scenarios" (c). Basically allows us to wait until all parallel tasks are finished. Screencast with example usage on Channel 9
You can see a number of screencasts and videos on Channel 9 that are tagged with "Parallel Computing"

Yes, .NET 4 added the Task Parallel Library which, at a high level, adds support for:
running parallel loops with Parallel.For and Parallel.ForEach
create or run tasks using Parallel.Invoke or the Task class
PLINQ (parallel LINQ to Objects)
Answering the update to the original question...
The TPL is the preferred way of writing parallel tasks using .NET 4. You can still create threadpool items yourself, and do all of the same "manual" threading techniques you could before. The thing to keep in mind is that the entire threadpool (and pretty much everything threading related) has been rewritten to take advantage of the TPL. This means that even if you create a threadpool item yourself you still end up using the TPL, even if you don't know it. The other thing to keep in mind is that the TPL is much more optimized, and will scale more appropriately based on the number of available processors.
As for knowing what situation each of them would be best suited for, there is no "silver bullet" answer. If you were previously queueing your own threadpool item (or otherwise doing something multi-threaded) you can modify that portion of your code to use the TPL without any consequences.
For things like parallel loops or parallel queries, you will need to analyze the code and the execution of that code to determine if it is appropriate to be parallelized.

Strictly speaking this is C# 4.0 and not a new class, but events now have a smarter form of locking which if I've understood the change correctly, removes the need for reems of locking code as shown below (taken from this article by Jon Skeet):
SomeEventHandler someEvent;
readonly object someEventLock = new object();
public event SomeEventHandler SomeEvent
{
add
{
lock (someEventLock)
{
someEvent += value;
}
}
remove
{
lock (someEventLock)
{
someEvent -= value;
}
}
}

How do I pick the best number of threads for hyptherthreading/multicore?

I have some embarrassingly-parallelizable work in a .NET 3.5 console app and I want to take advantage of hyperthreading and multi-core processors. How do I pick the best number of worker threads to utilize either of these the best on an arbitrary system? For example, if it's a dual core I will want 2 threads; quad core I will want 4 threads. What I'm ultimately after is determining the processor characteristics so I can know how many threads to create.
I'm not asking how to split up the work nor how to do threading, I'm asking how do I determine the "optimal" number of the threads on an arbitrary machine this console app will run on.

I'd suggest that you don't try to determine it yourself. Use the ThreadPool and let .NET manage the threads for you.

You can use Environment.ProcessorCount if that's the only thing you're after. But usually using a ThreadPool is indeed the better option.
The .NET thread pool also has provisions for sometimes allocating more threads than you have cores to maximise throughput in certain scenarios where many threads are waiting for I/O to finish.

The correct number is obviously 42.
Now on the serious note. Just use the thread pool, always.
1) If you have a lengthy processing task (ie. CPU intensive) that can be partitioned into multiple work piece meals then you should partition your task and then submit all individual work items to the ThreadPool. The thread pool will pick up work items and start churning on them in a dynamic fashion as it has self monitoring capabilities that include starting new threads as needed and can be configured at deployment by administrators according to the deployment site requirements, as opposed to pre-compute the numbers at development time. While is true that the proper partitioning size of your processing task can take into account the number of CPUs available, the right answer depends so much on the nature of the task and the data that is not even worth talking about at this stage (and besides the primary concerns should be your NUMA nodes, memory locality and interlocked cache contention, and only after that the number of cores).
2) If you're doing I/O (including DB calls) then you should use Asynchronous I/O and complete the calls in ThreadPool called completion routines.
These two are the the only valid reasons why you should have multiple threads, and they're both best handled by using the ThreadPool. Anything else, including starting a thread per 'request' or 'connection' are in fact anti patterns on the Win32 API world (fork is a valid pattern in *nix, but definitely not on Windows).
For a more specialized and way, way more detailed discussion of the topic I can only recommend the Rick Vicik papers on the subject:
designing-applications-for-high-performance-part-1.aspx
designing-applications-for-high-performance-part-ii.aspx
designing-applications-for-high-performance-part-iii.aspx

The optimal number would just be the processor count. Optimally you would always have one thread running on a CPU (logical or physical) to minimise context switches and the overhead that has with it.
Whether that is the right number depends (very much as everyone has said) on what you are doing. The threadpool (if I understand it correctly) pretty much tries to use as few threads as possible but spins up another one each time a thread blocks.
The blocking is never optimal but if you are doing any form of blocking then the answer would change dramatically.
The simplest and easiest way to get good (not necessarily optimal) behaviour is to use the threadpool. In my opinion its really hard to do any better than the threadpool so thats simply the best place to start and only ever think about something else if you can demonstrate why that is not good enough.

A good rule of the thumb, given that you're completely CPU-bound, is processorCount+1.
That's +1 because you will always get some tasks started/stopped/interrupted and n tasks will almost never completely fill up n processors.

The only way is a combination of data and code analysis based on performance data.
Different CPU families and speeds vs. memory speed vs other activities on the system are all going to make the tuning different.
Potentially some self-tuning is possible, but this will mean having some form of live performance tuning and self adjustment.

Or even better than the ThreadPool, use .NET 4.0 Task instances from the TPL. The Task Parallel Library is built on a foundation in the .NET 4.0 framework that will actually determine the optimal number of threads to perform the tasks as efficiently as possible for you.

I read something on this recently (see the accepted answer to this question for example).
The simple answer is that you let the operating system decide. It can do a far better job of deciding what's optimal than you can.
There are a number of questions on a similar theme - search for "optimal number threads" (without the quotes) gives you a couple of pages of results.

I would say it also depends on what you are doing, if your making a server application then using all you can out of the CPU`s via either Environment.ProcessorCount or a thread pool is a good idea.
But if this is running on a desktop or a machine that not dedicated to this task, you might want to leave some CPU idle so the machine "functions" for the user.

It can be argued that the real way to pick the best number of threads is for the application to profile itself and adaptively change its threading behavior based on what gives the best performance.

I wrote a simple number crunching app that used multiple threads, and found that on my Quad-core system, it completed the most work in a fixed period using 6 threads.
I think the only real way to determine is through trialling or profiling.

In addition to processor count, you may want to take into account the process's processor affinity by counting bits in the affinity mask returned by the GetProcessAffinityMask function.

If there is no excessive i/o processing or system calls when the threads are running, then the number of thread (except the main thread) is in general equal to the number of processors/cores in your system, otherwise you can try to increase the number of threads by testing.

Alternative to Threads

I've read that threads are very problematic. What alternatives are available? Something that handles blocking and stuff automatically?
A lot of people recommend the background worker, but I've no idea why.
Anyone care to explain "easy" alternatives? The user will be able to select the number of threads to use (depending on their speed needs and computer power).
Any ideas?

To summarize the problems with threads:
if threads share memory, you can get
race conditions
if you avoid races by liberally using locks, you
can get deadlocks (see the dining philosophers problem)
An example of a race: suppose two threads share access to some memory where a number is stored. Thread 1 reads from the memory address and stores it in a CPU register. Thread 2 does the same. Now thread 1 increments the number and writes it back to memory. Thread 2 then does the same. End result: the number was only incremented by 1, while both threads tried to increment it. The outcome of such interactions depend on timing. Worse, your code may seem to work bug-free but once in a blue moon the timing is wrong and bad things happen.
To avoid these problems, the answer is simple: avoid sharing writable memory. Instead, use message passing to communicate between threads. An extreme example is to put the threads in separate processes and communicate via TCP/IP connections or named pipes.
Another approach is to share only read-only data structures, which is why functional programming languages can work so well with multiple threads.

This is a bit higher-level answer, but it may be useful if you want to consider other alternatives to threads. Anyway, most of the answers discussed solutions based on threads (or thread pools) or maybe tasks from .NET 4.0, but there is one more alternative, which is called message-passing. This has been successfuly used in Erlang (a functional language used by Ericsson). Since functional programming is becoming more mainstream in these days (e.g. F#), I thought I could mention it. In genral:
Threads (or thread pools) can usually used when you have some relatively long-running computation. When it needs to share state with other threads, it gets tricky (you have to correctly use locks or other synchronization primitives).
Tasks (available in TPL in .NET 4.0) are very lightweight - you can split your program into thousands of tasks and then let the runtime run them (it will use optimal number of threads). If you can write your algorithm using tasks instead of threads, it sounds like a good idea - you can avoid some synchronization when you run computation using smaller steps.
Declarative approaches (PLINQ in .NET 4.0 is a great option) if you have some higher-level data processing operation that can be encoded using LINQ primitives, then you can use this technique. The runtime will automatically parallelize your code, because LINQ doesn't specify how exactly should it evaluate the results (you just say what results you want to get).
Message-passing allows you two write program as concurrently running processes that perform some (relatively simple) tasks and communicate by sending messages to each other. This is great, because you can share some state (send messages) without the usual synchronization issues (you just send a message, then do other thing or wait for messages). Here is a good introduction to message-passing in F# from Robert Pickering.
Note that the last three techniques are quite related to functional programming - in functional programming, you desing programs differently - as computations that return result (which makes it easier to use Tasks). You also often write declarative and higher-level code (which makes it easier to use Declarative approaches).
When it comes to actual implementation, F# has a wonderful message-passing library right in the core libraries. In C#, you can use Concurrency & Coordination Runtime, which feels a bit "hacky", but is probably quite powerful too (but may look too complicated).

Won't the parallel programming options in .Net 4 be an "easy" way to use threads? I'm not sure what I'd suggest for .Net 3.5 and earlier...
This MSDN link to the Parallel Computing Developer Center has links to lots of info on Parellel Programming including links to videos, etc.

I can recommend this project. Smart Thread Pool
Project Description
Smart Thread Pool is a thread pool written in C#. It is far more advanced than the .NET built-in thread pool.
Here is a list of the thread pool features:
The number of threads dynamically changes according to the workload on the threads in the pool.
Work items can return a value.
A work item can be cancelled.
The caller thread's context is used when the work item is executed (limited).
Usage of minimum number of Win32 event handles, so the handle count of the application won't explode.
The caller can wait for multiple or all the work items to complete.
Work item can have a PostExecute callback, which is called as soon the work item is completed.
The state object, that accompanies the work item, can be disposed automatically.
Work item exceptions are sent back to the caller.
Work items have priority.
Work items group.
The caller can suspend the start of a thread pool and work items group.
Threads have priority.
Can run COM objects that have single threaded apartment.
Support Action and Func delegates.
Support for WindowsCE (limited)
The MaxThreads and MinThreads can be changed at run time.
Cancel behavior is imporved.

"Problematic" is not the word I would use to describe working with threads. "Tedious" is a more appropriate description.
If you are new to threaded programming, I would suggest reading this thread as a starting point. It is by no means exhaustive but has some good introductory information. From there, I would continue to scour this website and other programming sites for information related to specific threading questions you may have.
As for specific threading options in C#, here's some suggestions on when to use each one.
Use BackgroundWorker if you have a single task that runs in the background and needs to interact with the UI. The task of marshalling data and method calls to the UI thread are handled automatically through its event-based model. Avoid BackgroundWorker if (1) your assembly does not already reference the System.Windows.Form assembly, (2) you need the thread to be a foreground thread, or (3) you need to manipulate the thread priority.
Use a ThreadPool thread when efficiency is desired. The ThreadPool helps avoid the overhead associated with creating, starting, and stopping threads. Avoid using the ThreadPool if (1) the task runs for the lifetime of your application, (2) you need the thread to be a foreground thread, (3) you need to manipulate the thread priority, or (4) you need the thread to have a fixed identity (aborting, suspending, discovering).
Use the Thread class for long-running tasks and when you require features offered by a formal threading model, e.g., choosing between foreground and background threads, tweaking the thread priority, fine-grained control over thread execution, etc.

Any time you introduce multiple threads, each running at once, you open up the potential for race conditions. To avoid these, you tend to need to add synchronization, which adds complexity, as well as the potential for deadlocks.
Many tools make this easier. .NET has quite a few classes specifically meant to ease the pain of dealing with multiple threads, including the BackgroundWorker class, which makes running background work and interacting with a user interface much simpler.
.NET 4 is going to do a lot to ease this even more. The Task Parallel Library and PLINQ dramatically ease working with multiple threads.
As for your last comment:
The user will be able to select the number of threads to use (depending on their speed needs and computer power).
Most of the routines in .NET are built upon the ThreadPool. In .NET 4, when using the TPL, the work load will actually scale at runtime, for you, eliminating the burden of having to specify the number of threads to use. However, there are ways to do this now.
Currently, you can use ThreadPool.SetMaxThreads to help limit the number of threads generated. In TPL, you can specify ParallelOptions.MaxDegreesOfParallelism, and pass an instance of the ParallelOptions into your routine to control this. The default behavior scales up with more threads as you add more processing cores, which is usually the best behavior in any case.

Threads are not problematic if you understand what causes problems with them.
For ex. if you avoid statics, you know which API's to use (e.g. use synchronized streams), you will avoid many of the issues that come up for their bad utilization.

If threading is a problem (this can happen if you have unsafe/unmanaged 3rd party dll's that cannot support multithreading. In this can an option is to create a meachism to queue the operations. ie store the parameters of the action to a database and just run through them one at a time. This can be done in a windows service. Obviously this will take longer but in some cases is the only option.

Threads are indispensable tools for solving many problems, and it behooves the maturing developer to know how to effectively use them. But like many tools, they can cause some very difficult-to-find bugs.
Don't shy away from some so useful just because it can cause problems, instead study and practice until you become the go-to guy for multi-threaded apps.
A great place to start is Joe Albahari's article: http://www.albahari.com/threading/.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.