Threads not Garbage collected / ThreadPool threads / C#/.NET


In my C#/.NET 3.5 program I am using ThreadPool threads (delegate + BeginInvoke/EndInvoke) to parallelize and speed up some file loading. The Sysinternals tool Process Explorer shows that the number of threads in the process increases over time, while I would expect it to stay the same. It looks like some threads or thread handles stay around for no reason.
Interestingly, I cannot find a pattern in how the thread count grows; it happens sporadically, without a repeatable pattern each time I start the application. I spent some time analyzing this, and here are some observations:
1) code looks like this:
ArrayList IAsyncResult_s = new ArrayList();
AsyncProcessing thread1 = processRasterLayer;
... ArrayList filesToRender....
foreach (string FileName in filesToRender)
{
string fileName2 = FileName;
GeoImage partialImage1;
IAsyncResult asyncResult = thread1.BeginInvoke(
fileName2, .....,
out partialImage1, ..., null, null);
IAsyncResult_s.Add(asyncResult);
asyncResult = null;
}
.................
//block and render all
foreach (IAsyncResult asyncResult in IAsyncResult_s)
{
GeoImage partialImage1;
thread1.EndInvoke(
out partialImage1, , asyncResult);
//render image.. some calls to render partial image here
partialImage1.Dispose();
partialImage1 = null;
}
IAsyncResult_s.Clear();
IAsyncResult_s = null;
thread1 = null;
2) Number of Process Threads
My trace shows that during execution inside the loop, ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads); gives numbers like 493, 1000.
At the end of the loops, ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads); gives 500, 1000. So the number of available threads returns to the same value.
The number of process threads reported by Sysinternals Process Explorer and by the API System.Diagnostics.Process.GetCurrentProcess().Threads.Count is 16 before the loops and around 21 after the loops.
If I run those loops again, the number of threads in the process grows, not by a fixed number each time but by 1-4 each time I repeat the above code, so it goes like 16->21->22->26->31...
3) Forced garbage collection didn’t help
I tried to force garbage collection to get rid of those extra threads, but that didn’t remove them from the process.
4) Profiling tools
I was using Red Gate’s memory and performance profilers, but haven’t found an obvious reason. I saw several extra threads and their objects (ThreadContext etc.) hanging around, but saw no object holding those threads in memory. I am pretty sure those extra threads were involved in the loops above, since I set a thread name inside the calls, and they still have the name I gave them.
5) IntelliTrace
IntelliTrace debugging also showed extra threads hanging around. They still have the names I gave them. Interestingly, it also showed that the same thread that is hanging now was used by the above loop in the past, and that the same thread was also executing some timer-related events from timers in my code.
6) Locating the issue
When I disable the above loops that process files asynchronously and load the files sequentially, I do not have extra threads, and the number of threads in my application stays constant at around 16.
7) Regarding SetMaxThreads: here is how it looks on my machine (XP, .NET 3.5):
Code like this:
ThreadPool.GetAvailableThreads(out AvailableWorkerThreads, out AvailableCompletionPortThreads);
ThreadPool.GetMaxThreads(out MaxWorkerThreads, out MaxCompletionPortThreads);
ThreadPool.GetMinThreads(out MinWorkerThreads, out MinCompletionPortThreads);
Gives result:
MinWorkerThreads:2 MaxWorkerThreads:500 MinCompletionPortThreads:2 MaxCompletionPortThreads:1000 AvailableWorkerThreads:500 AvailableCompletionPortThreads:1000
My app is using maybe 8 worker threads at the same time. I see no problem with SetMaxThreads.
8) Functionally, I have no problems so far with the solution above. But if tools report that the number of threads in my app is growing, it looks like a “resource leak” of some kind, and I would like to address it. It looks like some thread handles are hanging around for no reason.
9) Here is one interesting article. It says that thread resources are cleaned up once EndInvoke is called, which I am doing in my code. The article says: “Because EndInvoke cleans up after the spawned thread, you must make sure that an EndInvoke is called for each BeginInvoke.” and “If the thread pool thread has exited, EndInvoke does the following: It cleans up the exited thread's loose ends and disposes of its resources.” See: http://en.csharp-online.net/Asynchronous_Programming%E2%80%94BeginInvoke_EndInvoke
10) Another interesting article. The author says he had thread handle leaks because he was creating controls from a non-GUI thread. It is a pretty elaborate article, see: http://msmvps.com/blogs/senthil/archive/2008/05/29/the-case-of-the-leaking-thread-handles.aspx
11) Another interesting article. It talks about the ThreadPool.SetMinThreads method. It seems that it is not ThreadPool.SetMaxThreads but ThreadPool.SetMinThreads that gives useful control over the ThreadPool. This article was an eye-opener for me, and made me think about how the ThreadPool works and the performance problems it might cause. The article is: http_://www.dotnetperls.com/threadpool-setminthreads . Another similar one is: http_://www.codeproject.com/Articles/3813/NET-s-ThreadPool-Class-Behind-The-Scenes
12) Another interesting article. It talks about a throttling issue with the ThreadPool and mentions a ThreadPool limit of 2 new threads per second. See http_://social. msdn. microsoft. com/forums/en-US/clr/thread/3325cb32-371b-4f3e-965f-6ca88538dc3e/
13) In maybe 30 tests I saw only 2 times that the number of allocated threads would shrink. But it did happen. I once saw the thread count go like 16->....->31->61-> ->30->16, so it went back to 16. It doesn’t happen often, and it is not about the time waited; it was more like big activity in the process followed by a period of constant low-level activity.
14) ThreadPool.SetMinThreads method documentation. It talks about the limit of 2 new threads per second for the thread pool. It is not clear whether calling this method would remove that limit. http_://msdn.microsoft. com/en-ca/library/system. threading.threadpool.setminthreads(v=vs.90).aspx

So the answer is: there's no leak here. This is how the thread pool works. It keeps around threads that finished working so you don't have to pay the price of thread creation next time you need one. If you have many concurrent work items then the number of threads in the pool will increase but they'll max out at MaxWorkerThreads. (And it has nothing to do with the garbage collector.)
See this article for more info:
http://msdn.microsoft.com/en-us/library/0ka9477y.aspx
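To see the reuse for yourself, here is a minimal sketch (not the asker's code; the class name, the item count of 200 and the 10 ms sleep are arbitrary assumptions). It queues many short work items and counts how many distinct pool threads actually ran them. The count varies by machine and load, but it should come out far below the number of work items, and those threads stay in the process afterwards for the next batch.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class PoolReuseDemo
{
    static void Main()
    {
        const int workItems = 200;               // arbitrary number for illustration
        var seenThreadIds = new HashSet<int>();  // managed IDs of pool threads that ran work
        int remaining = workItems;

        using (var allDone = new ManualResetEvent(false))
        {
            for (int i = 0; i < workItems; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    lock (seenThreadIds)
                    {
                        seenThreadIds.Add(Thread.CurrentThread.ManagedThreadId);
                    }
                    Thread.Sleep(10);            // stand-in for real work
                    // The last work item to finish signals the main thread.
                    if (Interlocked.Decrement(ref remaining) == 0)
                        allDone.Set();
                });
            }
            allDone.WaitOne();
        }

        Console.WriteLine("Work items: " + workItems);
        Console.WriteLine("Distinct pool threads used: " + seenThreadIds.Count);
    }
}
```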

I would consider a producer-consumer pattern. The idea behind a thread pool is to recycle threads, not to create hundreds of new ones. In the best case you have one thread per CPU and queue the work. This will surely be faster, since you avoid useless context switches and waits for new threads to be created. As far as I remember, the .NET thread pool waits about one second before it creates a new thread, to give other threads a chance to be recycled.

Related

WithDegreeOfParallelism(N>CPU count)

System.Threading.ThreadPool.SetMaxThreads(50, 50);
File.ReadLines().AsParallel().WithDegreeOfParallelism(100).ForAll((s) => {
/*
some code which is waiting external API call
and do not utilize CPU
*/
});
I have never got threads count more than CPU count in my system.
Can I use PLINQ and get more than one thread per CPU?

If you're calling an external web API, you might be hitting the limit on concurrent connections per host, which defaults to 2. At the beginning of your application, do the following:
System.Net.ServicePointManager.DefaultConnectionLimit = 4096;
System.Net.ServicePointManager.Expect100Continue = false;
See if that helps. If not, there might be some other bottleneck within the routine you're trying to parallelize.
Also, as other responders said, the ThreadPool decides how many threads to spin up based on load. In my experience with the TPL, the thread count increases over time: the longer the app runs and the heavier the load gets, the more threads are spun up.

PLINQ uses a hill-climbing algorithm to determine the optimum size of the thread pool which is used by the TPL. I think that if you put a lot of I/O in your tasks, seeing more threads than the CPU count is likely.
That said, I've never seen more threads than the cpu count :) . But maybe I never had the right situation.

I tested this with the following code:
var lines = Enumerable.Range(0, 200).ToArray();
int currentThreads = 0;
int maxThreads = 0;
object l = new object();
lines.AsParallel().WithDegreeOfParallelism(100).ForAll(
s =>
{
lock (l)
{
currentThreads++;
if (currentThreads > maxThreads)
{
maxThreads = currentThreads;
Console.WriteLine(maxThreads);
}
}
Thread.Sleep(3000);
lock (l)
{
currentThreads--;
}
});
Console.WriteLine();
Console.WriteLine(maxThreads);
Basically, it records the current number of concurrently executing iterations and then saves the maximum encountered value.
The results vary quite a bit, between 15 and 25, but it's always much more than the number of CPUs my computer has (4). Increasing the sleep time increases the maximum number of concurrent threads. So it looks like the limiting factor here is the ThreadPool: it will create new threads slowly, especially when jobs are being completed relatively quickly.
If you want to increase the number of threads used, you would need to use SetMinThreads() (not SetMaxThreads()). If I set the minimum to 50, the number of threads actually used is around 60.
But having dozens of threads that do nothing but wait is quite inefficient, especially when it comes to memory consumption. You should consider using asynchronous methods instead.
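As a rough sketch of what "asynchronous methods instead" could look like: the code below assumes C# 7.1 or later (for async Main), and CallExternalApiAsync is a hypothetical stand-in for the real external call, simulated here with Task.Delay. A SemaphoreSlim caps the number of calls in flight at 100 without dedicating a thread to each waiting call.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class AsyncThrottleDemo
{
    // Hypothetical stand-in for the external API call.
    // Task.Delay waits without blocking a thread.
    static async Task<int> CallExternalApiAsync(int input)
    {
        await Task.Delay(300);
        return input * 2;
    }

    static async Task Main()
    {
        const int maxInFlight = 100;             // plays the role of WithDegreeOfParallelism(100)
        var gate = new SemaphoreSlim(maxInFlight);

        var tasks = Enumerable.Range(0, 200).Select(async i =>
        {
            await gate.WaitAsync();              // asynchronous wait: no thread is parked here
            try
            {
                return await CallExternalApiAsync(i);
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        int[] results = await Task.WhenAll(tasks);
        Console.WriteLine("Completed calls: " + results.Length);   // prints "Completed calls: 200"
    }
}
```

With real I/O (for example HttpClient), the same shape applies: the limit is on outstanding requests, not on threads.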

PLINQ does not fit this case.
I found the following article useful:
http://msdn.microsoft.com/en-us/library/hh228609(v=vs.110).aspx

Short answer: nope.
The amount of threading is simply up to the .Net Framework runtime. There is no developer control for controlling the number of threads for TPL (Task Parallel Library) usage.
EDIT
Thanks to some other feedback: it is actually possible--but not recommended--to manually control the number of threads in the ThreadPool, which PLINQ and TPL use.
It's my opinion that any parallelization problem needs to be carefully thought out, and carefully constructed and tested. There's a lot of subtlety in this.

should I use thread affinity for "latency-critical" threads?

In my HFT trading application I have several places where I receive data from network. In most cases this is just a thread that only receives and process data. Below is part of such processing:
public Reciver(IPAddress mcastGroup, int mcastPort, IPAddress ipSource)
{
thread = new Thread(ReceiveData);
s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
s.ReceiveBufferSize = ReceiveBufferSize;
var ipPort = new IPEndPoint(LISTEN_INTERFACE/* IPAddress.Any*/, mcastPort);
s.Bind(ipPort);
option = new byte[12];
Buffer.BlockCopy(mcastGroup.GetAddressBytes(), 0, option, 0, 4);
Buffer.BlockCopy(ipSource.GetAddressBytes(), 0, option, 4, 4);
Buffer.BlockCopy(/*IPAddress.Any.GetAddressBytes()*/LISTEN_INTERFACE.GetAddressBytes(), 0, option, 8, 4);
}
public void ReceiveData()
{
byte[] byteIn = new byte[4096];
while (needReceive)
{
if (IsConnected)
{
int count = 0;
try
{
count = s.Receive(byteIn);
}
catch (Exception e6)
{
Console.WriteLine(e6.Message);
Log.Push(LogItemType.Error, e6.Message);
return;
}
if (count > 0)
{
OnNewMessage(new NewMessageEventArgs(byteIn, count));
}
}
}
}
This thread runs forever once created. I wonder whether I should configure it to run on a particular core. I need the lowest latency, so I want to avoid context switches. As I want to avoid context switches, I had better run the same thread on the same processor core, right?
Given that I need the lowest latency, is it correct that:
It would be better to set thread affinity for most of the long-running threads?
It would be better to set thread affinity for the thread from my example above?
I am rewriting the above code in C++ right now, to port it to Linux later, in case that matters; however, I assume that my question is more about hardware than about language or OS.

I think the algorithm that has as little latency as possible would be to pin your threads to one core and set them to realtime priority (or whatever is the highest one).
This will cause the OS to evict any other thread which happens to use that core.
Hopefully the CPU cache will still contain useful data when your thread gets scheduled there. For that reason I like the idea of pinning to a core.
You should probably set your entire process to a high priority class and minimize other activity on your box. Also turn off unused hardware because it might generate interrupts. Fix your NIC's interrupts to a different CPU core (some better NICs can do that).
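For what it's worth, here is a Windows-only sketch of the pinning idea. Treat it as an illustration under assumptions, not a recipe: it P/Invokes the Win32 functions GetCurrentThread and SetThreadAffinityMask from kernel32.dll, the core index (1) is a hypothetical choice, and it uses the High priority class and Highest thread priority rather than true realtime priority. It will not work on Linux, and whether it lowers latency at all has to be measured.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

class PinnedReceiverSketch
{
    // Win32 calls; Windows only.
    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentThread();

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern UIntPtr SetThreadAffinityMask(IntPtr hThread, UIntPtr dwThreadAffinityMask);

    const int CoreIndex = 1;   // hypothetical: the core reserved for the receiver thread

    static void ReceiveLoop()
    {
        // Ask the runtime to keep this managed thread on its current OS thread.
        Thread.BeginThreadAffinity();
        try
        {
            UIntPtr mask = new UIntPtr(1u << CoreIndex);
            UIntPtr previous = SetThreadAffinityMask(GetCurrentThread(), mask);
            if (previous == UIntPtr.Zero)
                Console.WriteLine("SetThreadAffinityMask failed, error " + Marshal.GetLastWin32Error());
            else
                Console.WriteLine("Receiver pinned to core " + CoreIndex);

            // ... the socket receive loop would go here ...
        }
        finally
        {
            Thread.EndThreadAffinity();
        }
    }

    static void Main()
    {
        // Raise the priority class of the whole process, as suggested above.
        Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

        var receiver = new Thread(ReceiveLoop) { Priority = ThreadPriority.Highest };
        receiver.Start();
        receiver.Join();
    }
}
```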

"As I want to avoid context switches, I had better run the same thread on the same processor core, right?"
No. A context switch will not necessarily be avoided by setting affinity to one CPU. You have no control over context switches, they are in the hands of the OS thread scheduler. They occur when a thread quantum (time slice) has elapsed or when a higher priority thread interrupts your thread.
Latency you talk about, I assume is network or memory latency, is not at all avoided by setting thread affinity. Memory latency can be avoided by making your code cache friendly (ie it can all be in the L1 - L2 caches, for example). Network latency is really just part of any network, and not something I suspect you can do much about.

As Tony The Lion has already answered your question, I would like to address your comment:
"Why not set thread affinity for my code? Why does the thread from my example need to travel between cores?"
Your thread doesn't travel anywhere.
A context switch happens when the OS thread scheduler decides to give your thread a slice of time to execute. The environment is then prepared for your thread, e.g. the CPU registers are set to the correct values. This is called a context switch.
So regardless of thread affinity, the same CPU setup work has to be done, whether it is on the same CPU/core that was used in the previous slice when your thread was running or on another one. And at that moment, your computer has more information to do it properly than you do at compile time.
You seem to believe that a thread somehow resides on the CPU, but it is not so. What you use is a logical thread, and there can be hundreds or even thousands of them. Common CPUs, OTOH, usually have 1 or 2 hardware threads per core, and your logical thread gets mapped to one of these every time it is scheduled, even if the OS always picks the same HW thread.
EDIT: it seems that you have already picked the answer you want to hear, and I don't like long discussion threads on answers, so I will put it here:
You should try it and measure. I believe that you will be disappointed.
Running some threads at high priority might easily mess up other processes.
You are worried about context switch latency, but you have no problem with the GC thread freezing your thread? BTW, on which core will your GC thread run? :)
What if your highest-priority thread blocks the GC thread? Memory leaks? Do you know the priority of that thread, so that you are sure it would work?
Really, why not C or hand-optimized assembly if microseconds are important?
As someone suggested, you should use an RTOS if you want to control this aspect of execution.
It doesn't seem likely that your data travels through the data center just 4-5 times slower than it takes to set up a thread context on one machine, but who knows...

Purpose of Thread.Sleep(1)?

I was reading over some threading basics and on the msdn website I found this snippet of code.
// Put the main thread to sleep for 1 millisecond to
// allow the worker thread to do some work:
Thread.Sleep(1);
Here is a link to the page: http://msdn.microsoft.com/en-us/library/7a2f3ay4(v=vs.80).aspx.
Why does the main thread have sleep for 1 millisecond? Will the secondary thread not start its tasks if the main thread is continuously running? Or is the example meant for a task that takes 1 millisecond to do? As in if the task generally takes 5 seconds to complete the main thread should sleep for 5000 milliseconds?
If this is solely regarding CPU usage, here is a similar Question about Thread.Sleep.
Any comments would be appreciated.
Thanks.

The 1 in that code is not terribly special; it will always end up sleeping longer than that, as things aren't so precise, and giving up your time slice does not equal any guarantee from the OS when you will get it back.
The purpose of the time parameter in Thread.Sleep() is that your thread will yield for at least that amount of time, roughly.
So that code is just explicitly giving up its time slot. Generally speaking, such a bit of code should not be needed, as the OS will manage your threads for you, preemptively interrupting them to work on other threads.
This kind of code is often used in "threading examples", where the writer wants to force some artificial occurrence to prove some race condition, or the like (that appears to be the case in your example)
As noted in Jon Hanna's answer to this same question, there is a subtle but important difference between Sleep(0) and Sleep(1) (or any other non-zero number), and as ChrisF alludes to, this can be important in some threading situations.
Both of those involve thread priorities; Threads can be given higher/lower priorities, such that lower priority threads will never execute as long as there are higher priority threads that have any work to do. In such a case, Sleep(1) can be required... However...
Low-priority threads are also subject to what other processes are doing on the same system; so while your process might have no higher-priority threads running, if any others do, yours still won't run.
This isn't usually something you ever need to worry about, though; the default priority is the 'normal' priority, and under most circumstances, you should not change it. Raising or lowering it has numerous implications.

Thread.Sleep(0) will give up the rest of a thread's time-slice if a thread of equal priority is ready to schedule.
Thread.Sleep(1) (or any other value, but 1 is the lowest to have this effect) will give up the rest of the thread's time-slice unconditionally. If the code wants to make sure that even threads with lower priority have a chance to run (and it has to, since such a thread could be doing something that is blocking this thread), then this is the one to go for.
http://www.bluebytesoftware.com/blog/PermaLink,guid,1c013d42-c983-4102-9233-ca54b8f3d1a1.aspx has more on this.
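If you want to see the difference on your own machine, a small timing sketch like the one below may help (the class and helper names are made up for illustration, and the numbers depend on the OS timer resolution and on what else is running, so none are quoted here). On an otherwise idle machine, Sleep(0) typically returns almost immediately, while Sleep(1) takes at least a millisecond and often noticeably more.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class SleepTimingDemo
{
    // Runs the action repeatedly and returns the average wall-clock time per call in ms.
    static double AverageMs(Action action, int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }

    static void Main()
    {
        const int iterations = 100;   // arbitrary sample size
        Console.WriteLine("Sleep(0) average ms: " + AverageMs(() => Thread.Sleep(0), iterations));
        Console.WriteLine("Sleep(1) average ms: " + AverageMs(() => Thread.Sleep(1), iterations));
    }
}
```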

If the main thread doesn't sleep at all then the other threads will not be able to run at all.
Inserting a Sleep of any length allows the other threads some processing time. Using a small value (of 1 millisecond in this case) means that the main thread doesn't appear to lock up. You can use Sleep(0), but as Jon Hanna points out that has a different meaning to Sleep(1) (or indeed any positive value) as it only allows threads of equal priority to run.
If the task takes 5 seconds then the main thread will sleep for a total of 5,000 milliseconds, but spread out over a longer period.

It's only for the sake of the example- they want to make sure that the worker thread has the chance to print "worker thread: working..." at least once before the main thread kills it.
As Andrew implied, this is important in the example especially because if you were running on a single-processor machine, the main thread may not give up the processor, killing the background thread before it has a chance to iterate even once.

Interesting thing I noticed today. Interrupting a thread throws a ThreadInterruptedException. I was trying to catch the exception but could not for some reason. My coworker recommended that I put Thread.Sleep(1) prior to the catch statement and that allowed me to catch the ThreadInterruptedException.
// Start the listener
tcpListener_ = new TcpListener(ipAddress[0], int.Parse(portNumber_));
tcpListener_.Start();
try
{
// Wait for client connection
while (true)
{
// Wait for the new connection from the client
if (tcpListener_.Pending())
{
socket_ = tcpListener_.AcceptSocket();
changeState(InstrumentState.Connected);
readSocket();
}
Thread.Sleep(1);
}
}
catch (ThreadInterruptedException) { }
catch (Exception ex)
{
MessageBox.Show(ex.Message, "Contineo", MessageBoxButtons.OK, MessageBoxIcon.Error);
Console.WriteLine(ex.StackTrace);
}
Some other class...
if (instrumentThread_ != null)
{
instrumentThread_.Interrupt();
instrumentThread_ = null;
}
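A likely explanation for this: Thread.Interrupt only takes effect when the target thread is blocked in a wait, sleep or join (or the next time it blocks). A loop that only polls never reaches such a point, so the exception is never raised; the Thread.Sleep(1) gives it one. Here is a stripped-down sketch of the same shape, with the listener polling replaced by an empty loop body (that simplification is mine):

```csharp
using System;
using System.Threading;

class InterruptDemo
{
    static void Main()
    {
        var worker = new Thread(() =>
        {
            try
            {
                while (true)
                {
                    // Polling work would go here (e.g. checking tcpListener_.Pending()).
                    Thread.Sleep(1);   // blocking call: this is where a pending interrupt is delivered
                }
            }
            catch (ThreadInterruptedException)
            {
                Console.WriteLine("Worker interrupted, shutting down");
            }
        });

        worker.Start();
        Thread.Sleep(100);     // let the worker spin for a moment
        worker.Interrupt();    // takes effect the next time the worker blocks
        worker.Join();
        Console.WriteLine("Worker joined");
    }
}
```

This prints "Worker interrupted, shutting down" followed by "Worker joined".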

WaitAll limitation

I heard there is limitation when using waitall on multiple threads (# of threads to wait?). Can anyone give details?

I think the restriction you are referring to is not on the number of threads; it is on the number of handles being waited on. From the MSDN page for WaitHandle.WaitAll(WaitHandle[]):
"On some implementations, if more than 64 handles are passed, a NotSupportedException is thrown."
On the rare occasion that this issue has cropped up, I have normally worked around it with:
WaitHandle[] handles = ...
foreach(var waitHandle in handles)
waitHandle.WaitOne();
For completeness, the other restrictions appear to be:
"If the array contains duplicates, the call fails with a DuplicateWaitObjectException."
"The WaitAll method is not supported on threads that have STAThreadAttribute."
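Another way around the 64-handle limit is to avoid having one handle per work item at all. The sketch below (my own names and numbers, not from the MSDN page) uses a single ManualResetEvent plus a counter decremented with Interlocked; the last work item to finish sets the event. On .NET 4 and later, CountdownEvent packages up the same idea.

```csharp
using System;
using System.Threading;

class SingleEventWaitDemo
{
    static void Main()
    {
        const int workItems = 100;   // deliberately more than the 64-handle limit
        int remaining = workItems;

        using (var allDone = new ManualResetEvent(false))
        {
            for (int i = 0; i < workItems; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Thread.Sleep(5);   // stand-in for real work
                    // Only the work item that brings the counter to zero signals the event.
                    if (Interlocked.Decrement(ref remaining) == 0)
                        allDone.Set();
                });
            }

            allDone.WaitOne();         // one handle, no matter how many work items
        }

        Console.WriteLine("All " + workItems + " work items finished");
    }
}
```

This prints "All 100 work items finished".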

Are you thinking of the STA (single-threaded apartment) limitation of a winform app?
If so, I handle this by simply checking if the 'work queue' is empty after each thread has done its processing, and calling .WaitOne() on a single ManualResetEvent object that the main thread owns instead of using .WaitAll() at all.
Like this:
moSolverEvent = new ManualResetEvent(false);
ProcessResult(new SolverWorkInProgress());
//Wait here until the last background thread reports in
moSolverEvent.WaitOne();
And then the worker threads are doing this:
if (mhSolverWorkQueue.Count == 0) moSolverEvent.Set();
It works spectacularly well, and avoids any issues with WaitAll(), even in a WinForms app. After all, you're not really waiting for the threads to be done... you're waiting for the WORK to be done. :-)
Just be sure to do the appropriate locking on each of these objects so your threads don't step all over each other.

C# thread pool limiting threads

Alright...I've given the site a fair search and have read over many posts about this topic. I found this question: Code for a simple thread pool in C# especially helpful.
However, as it always seems, what I need varies slightly.
I have looked over the MSDN example and adapted it to my needs somewhat. The example I refer to is here: http://msdn.microsoft.com/en-us/library/3dasc8as(VS.80,printer).aspx
My issue is this. I have a fairly simple set of code that loads a web page via the HttpWebRequest and WebResponse classes and reads the results via a Stream. I fire off this method in a thread as it will need to executed many times. The method itself is pretty short, but the number of times it needs to be fired (with varied data for each time) varies. It can be anywhere from 1 to 200.
Everything I've read seems to indicate the ThreadPool class being the prime candidate. Here is where things get tricky. I might need to fire off this thing say 100 times, but I can only have 3 threads at most running (for this particular task).
I've tried setting the MaxThreads on the ThreadPool via:
ThreadPool.SetMaxThreads(3, 3);
I'm not entirely convinced this approach is working. Furthermore, I don't want to clobber other web sites or programs running on the system this will be running on. So, by limiting the # of threads on the ThreadPool, can I be certain that this pertains to my code and my threads only?
The MSDN example uses the event-driven approach and calls WaitHandle.WaitAll(doneEvents); which is how I'm doing this.
So the heart of my question is, how does one ensure or specify a maximum number of threads that can be run for their code, but have the code keep running more threads as the previous ones finish up until some arbitrary point? Am I tackling this the right way?
Sincerely,
Jason
Okay, I've added a semaphore approach and completely removed the ThreadPool code. It seems simple enough. I got my info from: http://www.albahari.com/threading/part2.aspx
It's this example that showed me how:
[text below here is a copy/paste from the site]
A Semaphore with a capacity of one is similar to a Mutex or lock, except that the Semaphore has no "owner" – it's thread-agnostic. Any thread can call Release on a Semaphore, while with Mutex and lock, only the thread that obtained the resource can release it.
In this following example, ten threads execute a loop with a Sleep statement in the middle. A Semaphore ensures that not more than three threads can execute that Sleep statement at once:
class SemaphoreTest
{
static Semaphore s = new Semaphore(3, 3); // Available=3; Capacity=3
static void Main()
{
for (int i = 0; i < 10; i++)
new Thread(Go).Start();
}
static void Go()
{
while (true)
{
s.WaitOne();
Thread.Sleep(100); // Only 3 threads can get here at once
s.Release();
}
}
}

Note: if you are limiting this to "3" just so you don't overwhelm the machine running your app, I'd make sure this is a problem first. The threadpool is supposed to manage this for you. On the other hand, if you don't want to overwhelm some other resource, then read on!
You can't manage the size of the threadpool (or really much of anything about it).
In this case, I'd use a semaphore to manage access to your resource. In your case, your resource is running the web scrape, or calculating some report, etc.
To do this, in your static class, create a semaphore object:
System.Threading.Semaphore S = new System.Threading.Semaphore(3, 3);
Then, in each thread, you do this (using the shared S created above, not a new semaphore per thread):
// wait your turn (decrement)
S.WaitOne();
try
{
// do your thing
}
finally {
// release so others can go (increment)
S.Release();
}
Each thread will block on the S.WaitOne() until it is given the signal to proceed. Once S has been decremented 3 times, all threads will block until one of them increments the counter.
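Put together as a complete program, the idea might look like the sketch below. The names (ThrottledWork, Gate, DoWork), the ten threads and the 50 ms sleep are all made up for illustration; the sleep stands in for the web request. The peak counter is only there to show that no more than three threads are ever inside the guarded section at once.

```csharp
using System;
using System.Threading;

static class ThrottledWork
{
    // One shared semaphore for all threads: 3 free slots, capacity 3.
    static readonly Semaphore Gate = new Semaphore(3, 3);
    static readonly object Sync = new object();
    static int current;   // threads currently inside the guarded section
    static int peak;      // highest value of current observed

    static void DoWork(object state)
    {
        Gate.WaitOne();                    // wait your turn (decrement)
        try
        {
            lock (Sync)
            {
                current++;
                if (current > peak) peak = current;
            }
            Thread.Sleep(50);              // stand-in for the web request
            lock (Sync)
            {
                current--;
            }
        }
        finally
        {
            Gate.Release();                // release so others can go (increment)
        }
    }

    static void Main()
    {
        var threads = new Thread[10];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(DoWork);
            threads[i].Start(i);
        }
        foreach (var thread in threads)
            thread.Join();

        Console.WriteLine("Peak concurrency: " + peak + " (limit 3)");
    }
}
```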
This solution isn't perfect.
If you want something a little cleaner, and more efficient, I'd recommend going with a BlockingQueue approach wherein you enqueue the work you want performed into a global Blocking Queue object.
Meanwhile, you have three threads (which you created--not in the threadpool), popping work out of the queue to perform. This isn't that tricky to setup and is very fast and simple.
Examples:
Best threading queue example / best practice
Best method to get objects from a BlockingQueue in a concurrent program?
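Here is a rough sketch of the queue approach. It assumes .NET 4 or later, where BlockingCollection<T> fills the role of the "BlockingQueue" mentioned above; the URLs are placeholders and the sleep stands in for the actual web request.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class WorkerQueueDemo
{
    static void Main()
    {
        int processed = 0;

        using (var queue = new BlockingCollection<string>())
        {
            // Three dedicated threads (not thread pool threads) pull work from the queue.
            var workers = new Thread[3];
            for (int i = 0; i < workers.Length; i++)
            {
                workers[i] = new Thread(() =>
                {
                    // Blocks while the queue is empty; ends once CompleteAdding
                    // has been called and the queue has drained.
                    foreach (string url in queue.GetConsumingEnumerable())
                    {
                        Thread.Sleep(10);   // stand-in for the web request for this url
                        Interlocked.Increment(ref processed);
                    }
                });
                workers[i].Start();
            }

            // Enqueue the work; placeholder URLs for illustration only.
            for (int n = 0; n < 100; n++)
                queue.Add("http://example.invalid/page" + n);
            queue.CompleteAdding();

            foreach (var worker in workers)
                worker.Join();
        }

        Console.WriteLine("Processed " + processed + " items on 3 threads");
    }
}
```

This prints "Processed 100 items on 3 threads". The number of concurrent requests is fixed by the number of worker threads, so no semaphore is needed.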

It's a static class like any other, which means that anything you do with it affects every other thread in the current process. It doesn't affect other processes.
I consider this one of the larger design flaws in .NET, however. Who came up with the brilliant idea of making the thread pool static? As your example shows, we often want a thread pool dedicated to our task, without having it interfere with unrelated tasks elsewhere in the system.
