Value not preserved in thread Named Data Slot - C#

This is pretty simple. We have code like this:
var slot = Thread.GetNamedDataSlot("myslot");
Thread.SetData(slot, value);
The current code exits the thread. Eventually the thread is re-allocated for more work. We expect (according to doc and many assertions in SO) that the value will still be there in the slot. And yet, at least sometimes, it isn't. It comes up null. The ManagedThreadId is the same as the one we set the value for, but the value has gone null.
We do call some opaque third-party assemblies, but I don't think that there's any way that other code could clear that slot without knowing its name.
Any thoughts on how this could happen? Could it be that .NET destroys the thread and later creates another one with the same ID? Does a thread live for the duration of the app domain?

The answer is that threads are not forever. A thread returned to the pool might be reused, or it might be discarded. Take care when leaving something in TLS on a thread: if you don't arrange for cleanup, you could have a resource leak.
Here's a post that describes the same issue: http://rocksolid.gibraltarsoftware.com/development/logging/managed-thread-ids-unique-ids-that-arent-unique

Threadpool threads do not belong to you. You're not supposed to rely on their context at all, and that includes things like ThreadStatic data and LocalDataStoreSlot. There are so many things the runtime can do with threadpool threads that will break your code, it's not even funny. This gets even crazier when you start using await, for example: the same method can easily execute on multiple different threads, some from the thread pool, some not.
As an implementation detail (nothing you should rely on), the .NET runtime manages the thread pool to be as big as required. In a properly asynchronous application, this means it will only have about 1-2x the number of CPU cores. However, if those threads become tied up, it will start creating new ones to accommodate the queued work items (unless, of course, the pool threads are actually saturating the CPU; new threads will not help in that case). When the peak load is over, it will similarly start releasing threads.
ManagedThreadId is not unique over the lifetime of the AppDomain; it is only unique at any given moment. You shouldn't rely on it being unique, especially when dealing with threadpool threads. The ID will stay the same for a given thread over its lifetime (even if the underlying system thread changes, assuming of course the managed thread is actually implemented on top of a system thread). When you're working with threadpool threads, though, you are not working with actual threads; you're just posting work items to the thread pool.
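Since pool threads can be recycled or retired at any time, the robust pattern is to carry state with the work item itself instead of leaving it in thread-local storage. A minimal sketch (the class and method names are my own, not from the question):

```csharp
using System;
using System.Threading;

// Sketch: instead of stashing state in a named data slot and hoping the
// same pool thread comes back, hand the state to the work item. The pool
// can recycle or retire threads at will, but the payload travels with
// the callback via the closure, so nothing is lost.
static class ExplicitState
{
    public static string RunWorkItem(string payload)
    {
        string result = null;
        using (var done = new ManualResetEventSlim())
        {
            // The payload is captured by the closure, not left in TLS.
            ThreadPool.QueueUserWorkItem(_ =>
            {
                result = payload.ToUpperInvariant();
                done.Set();
            });
            done.Wait();
        }
        return result;
    }

    static void Main() => Console.WriteLine(RunWorkItem("hello"));
}
```

The same idea underlies the state argument of QueueUserWorkItem and the state machines generated for async methods: context is attached to the work, never to the worker.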

Related

Sync over Async - threads usage clarification

I watched David Fowler's and Damian's talk at NDC, where they spoke about scaling.
At the beginning of the presentation they asked the audience: "How many threads are involved in this code?"
void Main()
{
Task.Delay(1000).Wait();
}
Then Jon Skeet said: "at least 2".
The first thread is the main thread, and I assume the second is the one used by Delay (the timer), which in the end grabs another thread from the thread pool (I hope I'm right on this one). There is no await here, so I don't think there is a state machine involved.
Question
But why did he leave open the possibility of another thread? (He said "at least 2".) Can someone please clarify the thread usage in this simple example?
But why did he leave open the possibility of another thread?
Speculation, but: we know that there is not an OS level timer per delay; instead, as an implementation detail there is a linked-list (ordered by timeout) of pending timers, and only the first node is actually scheduled to the OS.
Now imagine the OS-level timeout triggers; it needs to do multiple things:
activate the callbacks of all items with the same timeout value
schedule an OS timeout for the next item with a later timeout
book-keeping
The infrastructure code probably doesn't want one slowly written callback to delay all the others, so it almost certainly hands the callback activation to the thread-pool, rather than invoking the callback synchronously. It is possible, but not guaranteed, that the book-keeping etc will happen fast enough that the same worker thread picks up the callbacks from the pool; a more likely option is that an unrelated thread-pool thread deals with that.
So; we have
your primary thread
the thread handling the OS timeout and scheduling callbacks onto the thread-pool
the thread-pool thread picking up the callback
For a definitive answer, only Jon can answer the question, since he's the one who uttered the phrase you're asking about. Fortunately, in this case there's a real possibility he might.
That said, I would say the "at least" is mainly an acknowledgement that there are any number of other possible sources of threads, never mind that it depends on what the original question actually meant by "involved here". For example, simply accessing the thread pool could result in some minimum number of threads being created immediately; they may not be used, but they could still be there.
Furthermore, .NET has for some time had a multithreaded garbage collector. So the mere fact you're dealing with a .NET program means there could be that GC thread involved. For that matter, there could also be the finalizer thread.
All that said, I would say that generally you could expect there to be just the two threads. The thread pool by default will create threads immediately up to some minimum number, but only as needed. And in the given code example, there's not going to be any demand for garbage collection. When I run the example you show in a default .NET 5 project, I get just the two threads you'd expect.
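The two threads can also be made visible directly. In this sketch (the names are illustrative) the main thread records its own id and then blocks in Wait(); the timer's continuation therefore has to land on a thread-pool thread, whose id must differ:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch: observe the two threads in the Task.Delay(...).Wait() example.
// While the main thread is blocked in Wait(), the timer callback cannot
// run on it, so the continuation executes on a thread-pool thread.
static class TwoThreads
{
    public static (int mainId, int timerId, bool timerOnPool) Run()
    {
        int mainId = Environment.CurrentManagedThreadId;
        int timerId = 0;
        bool onPool = false;
        Task.Delay(100).ContinueWith(_ =>
        {
            timerId = Environment.CurrentManagedThreadId;
            onPool = Thread.CurrentThread.IsThreadPoolThread;
        }).Wait();
        return (mainId, timerId, onPool);
    }

    static void Main()
    {
        var (m, t, p) = Run();
        Console.WriteLine($"main={m} timer={t} pool={p}");
    }
}
```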

What is the advantage of creating a thread outside threadpool?

Okay, so I wanted to know what happens when I use TaskCreationOptions.LongRunning. From this answer, I learned that for long-running tasks I should use this option because it creates a thread outside of the thread pool.
Cool. But what advantage would I get when I create a thread outside the thread pool? And when should I do it, and when avoid it?
what advantage would I get when I create a thread outside the thread pool?
The threadpool, as its name states, is a pool of threads which are allocated once and re-used, in order to save the time and resources needed to allocate a thread. The pool re-sizes on demand. If you queue more work items than there are workers in the pool, it allocates more threads at roughly 500 ms intervals, one at a time (this avoids allocating several threads simultaneously when existing threads may finish and become free to serve requests). If many long-running operations are performed on the thread pool, this causes "thread starvation": delegates get queued and run only once a thread frees up. That's why you want to avoid doing lengthy work on thread-pool threads.
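If you want to see or influence these limits, the pool exposes them through ThreadPool.GetMinThreads, GetMaxThreads and SetMinThreads. A small sketch (the value 32 is an arbitrary example, not a recommendation):

```csharp
using System;
using System.Threading;

// Sketch: inspect the pool's worker-thread floor and ceiling. The slow
// one-thread-per-interval injection only applies above the minimum, so
// raising the minimum lets bursts of blocking work start without delay.
static class PoolLimits
{
    public static (int minWorker, int maxWorker) Inspect()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        return (minWorker, maxWorker);
    }

    static void Main()
    {
        var before = Inspect();
        Console.WriteLine($"min={before.minWorker} max={before.maxWorker}");
        // Raise the floor (hypothetical value) so queued blocking work
        // doesn't stall behind the ramp-up heuristic.
        ThreadPool.SetMinThreads(32, 32);
        Console.WriteLine($"min now={Inspect().minWorker}");
    }
}
```

Raising the minimum is a blunt instrument; the usual advice is to fix the blocking work instead, but it can be a useful stopgap.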
The Managed Thread-Pool docs also have a section on this question:
There are several scenarios in which it is appropriate to create and manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
For more, see:
Thread vs ThreadPool
When should I not use the ThreadPool in .Net?
Dedicated thread or thread-pool thread?
"Long running" can be quantified pretty well, a thread that takes more than half a second is running long. That's a mountain of processor instructions on a modern machine, you'd have to burn a fat five billion of them per second. Pretty hard to do in a constructive way unless you are calculating the value of Pi to thousands of decimals in the fraction.
Practical threads can only take that long when they are not burning core but are waiting a lot. Invariably on an I/O completion, like reading data from a disk, a network, a dbase server. And often the reason you'd start considering using a thread in the first place.
The threadpool has a "manager". It determines when a threadpool thread is allowed to start. It doesn't happen immediately when you start it in your program. The manager tries to limit the number of running threads to the number of CPU cores you have. It is much more efficient that way, context switching between too many active threads is expensive. And a good throttle, preventing your program from consuming too many resources in a burst.
But the threadpool manager has the very common problem with managers, it doesn't know enough about what is going on. Just like my manager doesn't know that I'm goofing off at Stackoverflow.com, the tp manager doesn't know that a thread is waiting for something and not actually performing useful work. Without that knowledge it cannot make good decisions. A thread that does a lot of waiting should be ignored and another one should be allowed to run in its place. Actually doing real work.
Just like you tell your manager that you go on vacation, so he can expect no work to get done, you tell the threadpool manager the same thing with LongRunning.
Do note that it isn't quite as bad as it perhaps sounds in this answer. In particular, .NET 4.0 hired a new manager that's a lot smarter at figuring out the optimum number of running threads. It does so with a feedback loop, collecting data to discover whether active threads actually get work done, and adjusts the optimum accordingly. The only problem with this approach is the common one when you close a feedback loop: you have to make it slow so the loop cannot become unstable. In other words, it isn't particularly quick at driving up the number of active threads.
If you know ahead of time that the thread is pretty abysmal, running for many seconds with no real cpu load then always pick LongRunning. Otherwise it is a tuning job, observing the program when it is done and tinkering with it to make it more optimal.
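The effect of the hint is easy to observe: with TaskCreationOptions.LongRunning, the delegate does not run on a pool thread. A minimal sketch (names are mine):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch: LongRunning is the "I'm going on vacation" note to the pool
// manager. With the hint, the task gets a dedicated thread; without it,
// the work is charged to the pool.
static class LongRunningDemo
{
    public static bool RanOnPool(TaskCreationOptions options)
    {
        return Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            options).Result;
    }

    static void Main()
    {
        Console.WriteLine($"None: pool={RanOnPool(TaskCreationOptions.None)}");
        Console.WriteLine($"LongRunning: pool={RanOnPool(TaskCreationOptions.LongRunning)}");
    }
}
```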

When does the ThreadPool construct new threads instead of reusing them?

I read the following paragraph in the following answer from Reed Copsey:
Will values in my ThreadStatic variables still be there when cycled via ThreadPool?
The thread pool (by design) keeps the threads alive between calls. This means that the ThreadStatic variables will persist between calls to QueueUserWorkItem.
This behavior is also something you should not count on. The ThreadPool will (eventually, at its discretion) release threads back and let them end, and construct new threads as needed.
Under what conditions does the threadpool eventually construct new threads instead of reusing them?
As Adriano said, this is an implementation detail you should not worry about. But, for curiosity's sake, this is the best explanation of how the ThreadPool works that I could find (from Throttling Concurrency in the CLR 4.0 ThreadPool):
To overcome some of the limitations of previous implementations, new ideas were introduced with CLR 4.0. The first methodology considered, from the control theory area, was the Hill Climbing (HC) algorithm. This technique is an auto-tuning approach based on an input-output feedback loop. The system output is monitored and measured at small time intervals to see what effects the controlled input had, and that information is fed back into the algorithm to further tune the input. Looking at the input and output as variables, the system is modeled as a function in terms of these variables.
Simply put, once in a while, the Hill Climbing algorithm:
1. Measures the output using the current number of threads (n).
2. Adds one thread to the pool.
3. Measures the output using the new number of threads (n+1).
4. If O(n+1) > O(n), goes back to step 1; otherwise, goes back to step 1, but this time releases a thread at step 2 instead of creating a new one.
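As a toy illustration of the idea (this is not the CLR's actual algorithm, and the throughput curve here is invented), a hill climber that keeps a change only when it improves measured output will settle at the peak of the curve:

```csharp
using System;

// Toy sketch of the hill-climbing idea, not the real CLR implementation:
// throughput is modeled as a concave function of thread count, and the
// controller keeps an adjustment only if it improved measured output.
static class HillClimb
{
    // Hypothetical throughput curve: peaks at 4 threads.
    static double Output(int threads) => -Math.Pow(threads - 4, 2);

    public static int Tune(int start, int iterations)
    {
        int n = start;
        int step = 1; // +1 = add a thread, -1 = release one
        for (int i = 0; i < iterations; i++)
        {
            double before = Output(n);
            double after = Output(n + step);
            if (after > before)
                n += step;    // the change helped; keep it
            else
                step = -step; // it hurt; try the other direction
        }
        return n;
    }

    static void Main() => Console.WriteLine(Tune(1, 20));
}
```

The real controller also has to cope with noisy measurements, which is one reason it moves deliberately slowly, as the next answers note.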
AFAIK under 'undocumented' conditions.
First and foremost, consider that there are at least 4 commonly used CLR hosting providers (ASP.NET, IE, shell EXEs and SQLCLR) and each has its own policies. For instance, SQLCLR hosting uses SQL Server's own Thread and Task architecture and will react to OS signals of pressure by shrinking pools (all sorts of pools, including thread pools).
So why not just assume that the thread is always reclaimed? Then you'll always be correct (i.e. don't keep state on a pool-owned thread).

ManagedThreadId keeps increasing

I've noticed that in many of my services (which use multiple threads) the thread IDs keep increasing their values. Is this a sign of trouble? Am I somehow not returning them to the pool or is this value increase normal behavior?
As long as your threads are returning (and not blocking, waiting, sleeping or in an infinite loop) then you're okay. ManagedThreadId is just a unique identifier, it isn't a "thread count" at all ( http://msdn.microsoft.com/en-us/library/system.threading.thread.managedthreadid.aspx )
Thread.ManagedThreadId
An integer that represents a unique identifier for this managed thread.
To be sure your threads are returning, pause your process in the VS debugger, tell it to freeze all threads, and have a look at the Threads debug window. In a runtime environment, I'd modify the thread code to increment a locked integer when the thread starts and decrement the same locked integer when the thread returns (use a try/finally block to ensure a thrown exception doesn't cause the decrement to be missed).
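The suggested counter can be sketched like this (names are mine): Interlocked keeps the increment and decrement atomic, and the finally block guarantees the decrement even when the work throws.

```csharp
using System;
using System.Threading;

// Sketch of the runtime check described above: a shared counter is
// incremented when a work item starts and decremented in finally, so a
// thrown exception can't leak a count.
static class ActiveThreadCounter
{
    static int _active;

    public static int Active => Volatile.Read(ref _active);

    public static void RunCounted(Action work)
    {
        Interlocked.Increment(ref _active);
        try
        {
            work();
        }
        finally
        {
            Interlocked.Decrement(ref _active);
        }
    }

    static void Main()
    {
        var done = new CountdownEvent(4);
        for (int i = 0; i < 4; i++)
            ThreadPool.QueueUserWorkItem(_ =>
            {
                RunCounted(() => Thread.Sleep(50));
                done.Signal();
            });
        done.Wait();
        Console.WriteLine($"active after join: {Active}");
    }
}
```

If Active keeps growing over time, some work item is never returning, which is exactly the condition the answer warns about.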
The 'correct' answer is: no, it is not normal. It's not that the CLR is broken. Your app should (most of the time, unless you have some very good reason, of which I can't even imagine what it might be) use Thread threads carefully. If you are creating over 100 threads, you are 99% certain to be doing something wrong.
Either you are killing threads where you should re-use them, or you should use thread-pool threads where you are currently using Thread threads.
EDIT OK. You might not trust me. But MSDN says the same:
The value of the ManagedThreadId property does not vary over time, even if unmanaged code that hosts the common language runtime implements the thread as a fiber.
So just to stress it again (which I didn't make clear on the first attempt)... You are not seeing thread IDs changing on existing threads. You are seeing different threads popping up (in the hundreds, by your own words). A new thread gets a new ID; an old thread does not change its ID.

When should I not use the ThreadPool in .Net? [closed]

Closed 6 years ago.
When should I not use the ThreadPool in .Net?
It looks like the best option is to use a ThreadPool, in which case, why is it not the only option?
What are your experiences around this?
@Eric, I'm going to have to agree with Dean. Threads are expensive. You can't assume that your program is the only one running. When everyone is greedy with resources, the problem multiplies.
I prefer to create my threads manually and control them myself. It keeps the code very easy to understand.
That's fine when it's appropriate. If you need a bunch of worker threads, though, all you've done is make your code more complicated. Now you have to write code to manage them. If you just used a thread pool, you'd get all the thread management for free. And the thread pool provided by the language is very likely to be more robust, more efficient, and less buggy than whatever you roll for yourself.
Thread t = new Thread(new ThreadStart(DoSomething));
t.Start();
t.Join();
I hope that you would normally have some additional code in between Start() and Join(). Otherwise, the extra thread is useless, and you're wasting resources for no reason.
People are way too afraid of the resources used by threads. I've never seen creating and starting a thread take more than a millisecond. There is no hard limit on the number of threads you can create. RAM usage is minimal. Once you have a few hundred threads, CPU becomes an issue because of context switches, so at that point you might want to get fancy with your design.
A millisecond is a long time on modern hardware. That's 3 million cycles on a 3GHz machine. And again, you aren't the only one creating threads. Your threads compete for the CPU along with every other program's threads. If you use not-quite-too-many threads, and so does another program, then together you've used too many threads.
Seriously, don't make life more complex than it needs to be. Don't use the thread pool unless you need something very specific that it offers.
Indeed. Don't make life more complex. If your program needs multiple worker threads, don't reinvent the wheel. Use the thread pool. That's why it's there. Would you roll your own string class?
The only reason why I wouldn't use the ThreadPool for cheap multithreading is if I need to…
interact with the method while it's running (e.g., to kill it)
run code on a STA thread (this happened to me)
keep the thread alive after my application has died (ThreadPool threads are background threads)
change the priority of the thread (we cannot change the priority of ThreadPool threads, which is Normal by default).
P.S.: The MSDN article "The Managed Thread Pool" contains a section titled, "When Not to Use Thread Pool Threads", with a very similar but slightly more complete list of possible reasons for not using the thread pool.
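Several of the items on that list (foreground thread, particular priority, stable identity) are only reachable with a manually created Thread. A sketch with illustrative values; the AboveNormal priority and the thread name are arbitrary examples:

```csharp
using System;
using System.Threading;

// Sketch: a manually created thread can be foreground (keeps the
// process alive until it finishes), can carry a non-default priority,
// and can have a stable name - none of which a pool thread offers.
static class DedicatedThread
{
    public static Thread Start(Action work)
    {
        var t = new Thread(() => work())
        {
            IsBackground = false,                  // foreground thread
            Priority = ThreadPriority.AboveNormal, // custom priority
            Name = "worker-1"                      // stable identity
        };
        t.Start();
        return t;
    }

    static void Main()
    {
        var t = Start(() => Console.WriteLine(
            "running on " + Thread.CurrentThread.Name));
        t.Join();
    }
}
```

Note that on some platforms the OS may ignore the priority hint, but the property itself is still settable and readable.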
There are lots of reasons why you would need to skip the ThreadPool, but if you don't know them then the ThreadPool should be good enough for you.
Alternatively, look at the new Parallel Extensions Framework, which has some neat stuff in there that may suit your needs without having to use the ThreadPool.
To quarrelsome's answer, I would add that it's best not to use a ThreadPool thread if you need to guarantee that your thread will begin work immediately. The maximum number of running thread-pooled threads is limited per appdomain, so your piece of work may have to wait if they're all busy. It's called "queue user work item", after all.
Two caveats, of course:
You can change the maximum number of thread-pooled threads in code, at runtime, so there's nothing to stop you checking the current vs maximum number and upping the maximum if required.
Spinning up a new thread comes with its own time penalty - whether it's worthwhile for you to take the hit depends on your circumstances.
Thread pools make sense whenever you have the concept of worker threads. Any time you can easily partition processing into smaller jobs, each of which can be processed independently, worker threads (and therefore a thread pool) make sense.
Thread pools do not make sense when you need threads which perform entirely dissimilar and unrelated actions that cannot be considered "jobs"; e.g., one thread for GUI event handling, another for backend processing. Thread pools also don't make sense when the processing forms a pipeline.
Basically, if you have threads which start, process a job, and quit, a thread pool is probably the way to go. Otherwise, the thread pool isn't really going to help.
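The "partition into jobs, queue, join" shape this answer describes might look like the following sketch (the names and the CountdownEvent choice are mine):

```csharp
using System;
using System.Threading;

// Sketch of the worker-thread shape: split a computation into
// independent jobs, queue each on the pool, and join with a countdown.
static class WorkerJobs
{
    public static long SumOfSquares(int[] values)
    {
        long total = 0;
        using var done = new CountdownEvent(values.Length);
        foreach (int v in values)
        {
            // Each queued job carries its own input via the closure.
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Interlocked.Add(ref total, (long)v * v);
                done.Signal();
            });
        }
        done.Wait();
        return total;
    }

    static void Main() =>
        Console.WriteLine(SumOfSquares(new[] { 1, 2, 3, 4 }));
}
```

Each job here starts, processes, and quits, which is exactly the profile the pool is built for; a GUI loop or a pipeline stage never quits, so it belongs on its own thread.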
I'm not speaking as someone with only theoretical knowledge here. I write and maintain high volume applications that make heavy use of multithreading, and I generally don't find the thread pool to be the correct answer.
Ah, argument from authority; but always be on the lookout for people who might be on the Windows kernel team.
Neither of us were arguing with the fact that if you have some specific requirements then the .NET ThreadPool might not be the right thing. What we're objecting to is the trivialisation of the costs to the machine of creating a thread.
The significant expense of creating a thread is the raison d'être for the ThreadPool in the first place. I don't want my machines filled with code written by people who have been misinformed about the expense of creating a thread, and who don't, for example, know that it causes a method to be called in every single DLL attached to the process (some of which will have been written by third parties), and which may well heat up a load of code which need not be in RAM at all and almost certainly didn't need to be in L1.
The shape of the memory hierarchy in a modern machine means that 'distracting' a CPU is about the worst thing you can possibly do, and everybody who cares about their craft should work hard to avoid it.
When you're going to perform an operation that is going to take a long time, or perhaps a continuous background thread.
I guess you could always push the number of threads available in the pool up, but there would be little point in incurring the management costs of a thread that is never going to be given back to the pool.
Threadpool threads are appropriate for tasks that meet both of the following criteria:
The task will not have to spend any significant time waiting for something to happen
Anything that's waiting for the task to finish will likely be waiting for many tasks to finish, so its scheduling priority isn't apt to affect things much.
Using a threadpool thread instead of creating a new one will save a significant but bounded amount of time. If that time is significant compared with the time it will take to perform a task, a threadpool task is likely appropriate. The longer the time required to perform a task, however, the smaller the benefit of using the threadpool and the greater the likelihood of the task impeding threadpool efficiency.
MSDN lists some reasons here:
http://msdn.microsoft.com/en-us/library/0ka9477y.aspx
There are several scenarios in which it is appropriate to create and manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
@Eric
#Derek, I don't exactly agree with the scenario you use as an example. If you don't know exactly what's running on your machine and exactly how many total threads, handles, CPU time, RAM, etc, that your app will use under a certain amount of load, you are in trouble.
Are you the only target customer for the programs you write? If not, you can't be certain about most of that. You generally have no idea when you write a program whether it will execute effectively solo, or if it will run on a webserver being hammered by a DDOS attack. You can't know how much CPU time you are going to have.
Assuming your program's behavior changes based on input, it's rare to even know exactly how much memory or CPU time your program will consume. Sure, you should have a pretty good idea about how your program is going to behave, but most programs are never analyzed to determine exactly how much memory, how many handles, etc. will be used, because a full analysis is expensive. If you aren't writing real-time software, the payoff isn't worth the effort.
In general, claiming to know exactly how your program will behave is far-fetched, and claiming to know everything about the machine approaches ludicrous.
And to be honest, if you don't know exactly what method you should use: manual threads, thread pool, delegates, and how to implement it to do just what your application needs, you are in trouble.
I don't fully disagree, but I don't really see how that's relevant. This site is here specifically because programmers don't always have all the answers.
If your application is complex enough to require throttling the number of threads that you use, aren't you almost always going to want more control than what the framework gives you?
No. If I need a thread pool, I will use the one that's provided, unless and until I find that it is not sufficient. I will not simply assume that the provided thread pool is insufficient for my needs without confirming that to be the case.
I'm not speaking as someone with only theoretical knowledge here. I write and maintain high volume applications that make heavy use of multithreading, and I generally don't find the thread pool to be the correct answer.
Most of my professional experience has been with multithreaded and multiprocessing programs. I have often needed to roll my own solution as well. That doesn't mean that the thread pool isn't useful, or appropriate in many cases. The thread pool is built to handle worker threads. In cases where multiple worker threads are appropriate, the provided thread pool should generally be the first approach.
