C# thread pool limiting threads

Alright...I've given the site a fair search and have read over many posts about this topic. I found this question: Code for a simple thread pool in C# especially helpful.
However, as it always seems, what I need varies slightly.
I have looked over the MSDN example and adapted it to my needs somewhat. The example I refer to is here: http://msdn.microsoft.com/en-us/library/3dasc8as(VS.80,printer).aspx
My issue is this. I have a fairly simple set of code that loads a web page via the HttpWebRequest and WebResponse classes and reads the results via a Stream. I fire off this method in a thread as it will need to be executed many times. The method itself is pretty short, but the number of times it needs to be fired (with varied data each time) varies. It can be anywhere from 1 to 200.
Everything I've read seems to indicate the ThreadPool class being the prime candidate. Here is where things get tricky. I might need to fire off this thing, say, 100 times, but I can only have 3 threads at most running (for this particular task).
I've tried setting the MaxThreads on the ThreadPool via:
ThreadPool.SetMaxThreads(3, 3);
I'm not entirely convinced this approach is working. Furthermore, I don't want to clobber other web sites or programs running on the system this will be running on. So, by limiting the # of threads on the ThreadPool, can I be certain that this pertains to my code and my threads only?
The MSDN example uses the event-driven approach and calls WaitHandle.WaitAll(doneEvents); which is how I'm doing this.
So the heart of my question is, how does one ensure or specify a maximum number of threads that can be run for their code, but have the code keep running more threads as the previous ones finish up until some arbitrary point? Am I tackling this the right way?
Sincerely,
Jason
Okay, I've added a semaphore approach and completely removed the ThreadPool code. It seems simple enough. I got my info from: http://www.albahari.com/threading/part2.aspx
It's this example that showed me how:
[text below here is a copy/paste from the site]
A Semaphore with a capacity of one is similar to a Mutex or lock, except that the Semaphore has no "owner" – it's thread-agnostic. Any thread can call Release on a Semaphore, while with Mutex and lock, only the thread that obtained the resource can release it.
In this following example, ten threads execute a loop with a Sleep statement in the middle. A Semaphore ensures that not more than three threads can execute that Sleep statement at once:
using System.Threading;

class SemaphoreTest
{
    static Semaphore s = new Semaphore(3, 3); // Available=3; Capacity=3

    static void Main()
    {
        for (int i = 0; i < 10; i++)
            new Thread(Go).Start();
    }

    static void Go()
    {
        while (true)
        {
            s.WaitOne();
            Thread.Sleep(100); // Only 3 threads can get here at once
            s.Release();
        }
    }
}

Note: if you are limiting this to "3" just so you don't overwhelm the machine running your app, I'd make sure this is a problem first. The threadpool is supposed to manage this for you. On the other hand, if you don't want to overwhelm some other resource, then read on!
You can't manage the size of the threadpool (or really much of anything about it).
In this case, I'd use a semaphore to manage access to your resource. In your case, the resource is running the web scrape, calculating some report, etc.
To do this, in your static class, create a semaphore object:
System.Threading.Semaphore S = new System.Threading.Semaphore(3, 3);
Then, in each thread, you do this:
S.WaitOne(); // wait your turn (decrement)
try
{
    // do your thing
}
finally
{
    // release so others can go (increment)
    S.Release();
}
Each thread will block on the S.WaitOne() until it is given the signal to proceed. Once S has been decremented 3 times, all threads will block until one of them increments the counter.
This solution isn't perfect.
If you want something a little cleaner and more efficient, I'd recommend a blocking-queue approach, wherein you enqueue the work you want performed into a global blocking queue object.
Meanwhile, you have three threads (which you created yourself, not from the threadpool) popping work off the queue to perform. This isn't that tricky to set up and is very fast and simple.
Examples:
Best threading queue example / best practice
Best method to get objects from a BlockingQueue in a concurrent program?
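If you're on .NET 4 or later, BlockingCollection<T> gives you this pattern more or less out of the box. Here is a minimal sketch; the Scrape method and the URLs are placeholders for your real per-item work:

using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Threading;

class WorkQueueDemo
{
    static readonly BlockingCollection<string> queue = new BlockingCollection<string>();

    static void Main()
    {
        // Three worker threads that we created ourselves (not thread pool threads).
        var workers = new List<Thread>();
        for (int i = 0; i < 3; i++)
        {
            var t = new Thread(() =>
            {
                // Blocks while the queue is empty; the loop ends once
                // CompleteAdding has been called and the queue has drained.
                foreach (string url in queue.GetConsumingEnumerable())
                    Scrape(url);
            });
            t.Start();
            workers.Add(t);
        }

        // Enqueue however many work items you have (1 to 200).
        for (int i = 0; i < 100; i++)
            queue.Add("http://example.com/page" + i);

        queue.CompleteAdding();          // no more work is coming
        workers.ForEach(t => t.Join());  // wait for the workers to drain the queue
    }

    static void Scrape(string url)
    {
        // Placeholder: your HttpWebRequest/WebResponse code goes here.
    }
}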

It's a static class like any other, which means that anything you do with it affects every other thread in the current process. It doesn't affect other processes.
I consider this one of the larger design flaws in .NET, however. Who came up with the brilliant idea of making the thread pool static? As your example shows, we often want a thread pool dedicated to our task, without having it interfere with unrelated tasks elsewhere in the system.


Possible Race condition with ManualResetEvent

Problem:
I am trying to dispatch 6 threads from the ThreadPool to work on individual tasks. Each task's ManualResetEvent is stored in an array of ManualResetEvents; the thread's number corresponds to its index in that array.
Once I have initiated these 6 threads, I move on and wait for them to complete. The waiting is done in the main thread.
Sometimes my waiting logic doesn't return even after a long time (as long as 2 days, in one case I've seen). Here is the code for the wait logic:
foreach (ManualResetEvent whandle in eventList)
{
    try
    {
        whandle.WaitOne();
    }
    catch (Exception) { }
}
As per the documentation of WaitOne(), it is a synchronous call that does not return until the handle receives a Set().
Sometimes my threads have little work to do and may even finish before I reach the wait logic. Is it possible that WaitOne() will keep waiting for a Set() even though it was received in the past?
Is this the correct logic for waiting for all the threads to finish?
I'm not directly answering this question. Here is what you should do:
Start tasks using Task.Factory.StartNew and use Task.WaitAll(Task[]) to wait for them. You do not have to deal with events that way. Exceptions will nicely propagate to the "forking" thread. You don't need the old ThreadPool API anymore.
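A minimal sketch of that approach (DoWork here is a stand-in for your real per-task method):

using System.Threading.Tasks;

// Start the six work items as tasks instead of raw ThreadPool work items.
Task[] tasks = new Task[6];
for (int i = 0; i < tasks.Length; i++)
{
    int taskIndex = i; // capture a private copy of the loop variable
    tasks[i] = Task.Factory.StartNew(() => DoWork(taskIndex));
}

// Blocks until every task has finished. Exceptions thrown by the tasks
// are rethrown here, wrapped in an AggregateException.
Task.WaitAll(tasks);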
Hope this helps.
(Note: I think your best bet is Parallel.Invoke() - see later in this answer.)
What you are doing will normally work fine, so the problem is likely to be that one of your threads is blocking for some reason.
You should be able to debug this readily enough - you can attach the debugger and break into the program and then look at the call stack to see which thread(s) are blocked. Be prepared for some head-scratching if you discover a race condition though!
Another thing to be aware of is that you can't do the following:
myEvent.Set();
myEvent.Reset();
with nothing (or very little) between the .Set() and the .Reset(). If you do that when several threads are waiting on myEvent, some of them will miss the event being set! (This effect is not well documented on MSDN.)
By the way, you shouldn't ignore exceptions - always log them in some way, at the very least.
(This section doesn't answer the question, but it may provide some helpful information)
I also want to mention an alternative way to wait for the threads. Since you have a set of ManualResetEvents, you can copy them to a plain array and pass it to WaitHandle.WaitAll().
Your code could look a little like this:
WaitHandle.WaitAll(eventList.ToArray());
Another approach to waiting for all threads to finish is to use a CountdownEvent. It becomes signalled when a countdown reaches zero; you start the count at the number of threads, and each thread signals it when it exits. There's an example here.
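For reference, a minimal sketch of the CountdownEvent approach (the body of the work item is a placeholder):

using System.Threading;

int threadCount = 6;
var countdown = new CountdownEvent(threadCount);

for (int i = 0; i < threadCount; i++)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try
        {
            // ... the task's work goes here ...
        }
        finally
        {
            countdown.Signal(); // decrement the count, even if the work threw
        }
    });
}

countdown.Wait(); // returns once all six work items have signalled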
Parallel.Invoke()
If your threads do not return values, and all you want to do is launch them and then have the launching thread wait for them all to exit, then I think Parallel.Invoke() will be the best way of all. It avoids you having to handle the synchronization yourself.
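For example (DoTask1 through DoTask3 are placeholders for your real methods):

using System.Threading.Tasks;

// Runs the delegates in parallel and returns only when all of them have
// finished. Any exceptions are gathered into a single AggregateException.
Parallel.Invoke(
    () => DoTask1(),
    () => DoTask2(),
    () => DoTask3());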
(Otherwise, as svick says in the comments above, use Task rather than the old thread classes.)

Threads not Garbage collected / ThreadPool threads / C#/.NET

In my C#/.NET 3.5 program I am using ThreadPool threads (delegate + BeginInvoke/EndInvoke) to parallelize and speed up some file loading. The Sysinternals tool Process Explorer shows that the number of threads in the process increases over time, while I would expect it to stay the same. It looks like some threads/thread handles hang around for no reason.
Interestingly enough, I cannot find a pattern in how the threads grow; it seems to happen sporadically, without a repeatable pattern, each time I start the application. I spent some time analyzing it, and here are some observations:
1) The code looks like this:

ArrayList IAsyncResult_s = new ArrayList();
AsyncProcessing thread1 = processRasterLayer;
... ArrayList filesToRender ....

foreach (string FileName in filesToRender)
{
    string fileName2 = FileName;
    GeoImage partialImage1;
    IAsyncResult asyncResult = thread1.BeginInvoke(
        fileName2, .....,
        out partialImage1, ..., null, null);
    IAsyncResult_s.Add(asyncResult);
    asyncResult = null;
}
.................

// block and render all
foreach (IAsyncResult asyncResult in IAsyncResult_s)
{
    GeoImage partialImage1;
    thread1.EndInvoke(out partialImage1, ..., asyncResult);

    // render image... some calls to render the partial image here
    partialImage1.Dispose();
    partialImage1 = null;
}
IAsyncResult_s.Clear();
IAsyncResult_s = null;
thread1 = null;
2) Number of process threads

My tracing shows that during execution inside the loops, ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads) gives numbers like 493, 1000. At the end of the loops it gives 500, 1000, so the number of available threads returns to the same value.

The number of process threads reported by Sysinternals Process Explorer and by System.Diagnostics.Process.GetCurrentProcess().Threads.Count is 16 before the loops and around 21 after. If I run those loops again, the number of threads in the process grows, but not by a fixed number each time; it grows by 1-4 on each repetition of the code above, like 16 -> 21 -> 22 -> 26 -> 31...
3) Forced garbage collection didn't help

I tried forcing garbage collection to get rid of those extra threads, but that didn't remove them from the process.
4) Profiling tools

I used Red Gate's memory and performance profilers, but haven't found an obvious cause. I saw several extra threads and their objects (ThreadContext etc.) hanging around, but no object holding those threads in memory. I am pretty sure those extra threads were involved in the loops above, since I assigned thread names inside the calls, and they still have the names I gave them.
5) IntelliTrace

IntelliTrace debugging also showed extra threads hanging around, still bearing the names I gave them. Interestingly, it also showed that the same thread that is hanging now was used by the loop above in the past, and that the same thread had also executed some timer-related events from timers in my code.
6) Locating the issue

When I disable the loops above that process files asynchronously and instead load the files sequentially, I get no extra threads, and the number of threads in my application stays constant at around 16.
7) Regarding SetMaxThreads, here is how it looks on my machine (XP, .NET 3.5):

Code like this:

ThreadPool.GetAvailableThreads(out AvailableWorkerThreads, out AvailableCompletionPortThreads);
ThreadPool.GetMaxThreads(out MaxWorkerThreads, out MaxCompletionPortThreads);
ThreadPool.GetMinThreads(out MinWorkerThreads, out MinCompletionPortThreads);

gives this result:

MinWorkerThreads: 2, MaxWorkerThreads: 500, MinCompletionPortThreads: 2, MaxCompletionPortThreads: 1000, AvailableWorkerThreads: 500, AvailableCompletionPortThreads: 1000
My app is using maybe 8 worker threads at the same time. I see no problem with SetMaxThreads.
8) Functionally, I have had no problems so far with the solution above. But if tools report that the number of threads in my app is growing, it looks like a "resource leak" of some kind, and I would like to address it. It looks like some thread handles hang around for no reason.
9) Here is one interesting article. It says that thread resources are cleaned up once EndInvoke is called, which I am doing in my code. The article says: "Because EndInvoke cleans up after the spawned thread, you must make sure that an EndInvoke is called for each BeginInvoke." and "If the thread pool thread has exited, EndInvoke does the following: It cleans up the exited thread's loose ends and disposes of its resources." See: http://en.csharp-online.net/Asynchronous_Programming%E2%80%94BeginInvoke_EndInvoke
10) Another interesting article. The author says he had thread handle leaks because he was creating controls from a non-GUI thread. It is a pretty elaborate article; see: http://msmvps.com/blogs/senthil/archive/2008/05/29/the-case-of-the-leaking-thread-handles.aspx
11) Another interesting article. It talks about the ThreadPool.SetMinThreads property. It seems that it is not ThreadPool.SetMaxThreads but ThreadPool.SetMinThreads that gives useful control over the ThreadPool. This article was an eye-opener for me, and made me think about how the ThreadPool works and the performance problems it can cause. The article is: http://www.dotnetperls.com/threadpool-setminthreads . Another similar one is: http://www.codeproject.com/Articles/3813/NET-s-ThreadPool-Class-Behind-The-Scenes
12) Another interesting article. It talks about a throttling issue with the ThreadPool, mentioning the ThreadPool's limit of adding at most 2 new threads per second. See http://social.msdn.microsoft.com/forums/en-US/clr/thread/3325cb32-371b-4f3e-965f-6ca88538dc3e/
13) In maybe 30 tests, I saw the number of allocated threads shrink only twice. But it did happen: I once saw the thread count go 16 -> ... -> 31 -> 61 -> ... -> 30 -> 16, back down to 16. It doesn't happen often, and it doesn't seem to depend on time waited; it followed a burst of heavy activity in the process and then a period of constant low-level activity.
14) The ThreadPool.SetMinThreads method documentation also mentions the 2-new-threads-per-second limit for the ThreadPool. It is not clear whether setting this property removes that limit. http://msdn.microsoft.com/en-ca/library/system.threading.threadpool.setminthreads(v=vs.90).aspx
So the answer is: there's no leak here. This is how the thread pool works. It keeps around threads that finished working so you don't have to pay the price of thread creation next time you need one. If you have many concurrent work items then the number of threads in the pool will increase but they'll max out at MaxWorkerThreads. (And it has nothing to do with the garbage collector.)
See this article for more info:
http://msdn.microsoft.com/en-us/library/0ka9477y.aspx
I would consider a producer-consumer pattern. The idea behind a thread pool is to recycle threads, not to create hundreds of new ones. In the best case you have one thread per CPU and queue the work. This will surely be faster, as you avoid useless context switches and the waits for creating new threads; as far as I remember, the .NET thread pool waits about one second before creating a new thread, to give other threads a chance to be recycled.

TPL Dataflow local storage or something like it

What I'm trying to accomplish: I have an action block with MaxDegreeOfParallelism = 4. I want to create one local instance of a session object for each parallel path, so 4 session objects in total. If these were threads, I would create something like:
ThreadLocal<Session> sessionPerThread = new ThreadLocal<Session>(() => new Session());
I know blocks are not threads so I'm looking for something similar but for blocks. Any way to create this?
This block is in a service and runs for months on end. During that time period tons of threads are used for each concurrent slot of the block so thread local storage is not appropriate. I need something tied to the logical block slot. Also this block never completes, it runs the entire lifetime of the service.
Note: The above suggested answer is not valid for what I am asking. I'm specifically asking for something different than thread local and the above answer is using thread local. This is a different question entirely.
As it sounds like you already know, Dataflow blocks provide absolutely no guarantee of correlation between blocks, execution, and threads. Even with max parallelism set to 4, all 4 tasks could be executing on the same thread. Or an individual task may execute on many threads.
Given that you ultimately want to reuse n instances of an expensive service for your n degrees of parallelism, let's take dataflow completely out of the picture for a minute, since it neither helps nor hinders any general solution to this problem. It's actually quite simple. You can use a ConcurrentStack<T>, where T is the type of your service that is expensive to instantiate. You have code that appears at the top of the method (or delegate) that represents one of your parallel units of work:
// ConcurrentStack<T> lives in System.Collections.Concurrent.
// T is your expensive service type; it needs some way to be constructed,
// e.g. a new() constraint or a factory method.
private readonly ConcurrentStack<T> reusableServices = new ConcurrentStack<T>();

private void DoWork()
{
    T service;
    if (!this.reusableServices.TryPop(out service))
    {
        service = new T(); // expensive construction
    }

    // Use your shared service.
    //// Code here.

    // Put the service back when we're done with it so someone else can use it.
    this.reusableServices.Push(service);
}
Now in this way, you can quickly see that you create exactly as many instances of your expensive service as you have parallel executions of DoWork(). You don't even have to hard-code the degree of parallelism you expect. And it's orthogonal to how you actually schedule that parallelism (so threadpool, Dataflow, PLINQ, etc. doesn't matter).
So you can just use DoWork() as your Dataflow block's delegate and you're set to go.
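For illustration, the wiring might look like this (WorkItem and a DoWork overload taking the item are assumptions for the sketch, not something from the original question):

using System.Threading.Tasks.Dataflow;

// At most 4 DoWork calls run at once, so at most 4 services ever get created.
var block = new ActionBlock<WorkItem>(
    item => DoWork(item),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

block.Post(new WorkItem()); // enqueue work as it arrives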
Of course, there's nothing magical about ConcurrentStack<T> here, except that the thread safety of Push and TryPop is built into the type so you don't have to provide it yourself.

What negative effect will happen when multiple threads write to the same file?

The following is pseudo code of an integration test that is failing:
[Test]
void TestLogger()
{
    // Init the static logger
    Logger.Init(pathToFile);

    // Create five threads that will call the LogMessages delegate
    for (int i = 1; i <= 5; i++)
    {
        Thread aThread = new Thread(LogMessages);
        aThread.Start(i);
    }

    // Let the threads complete their work
    Thread.Sleep(30000);

    // Read from the log file and count the lines
    int lineCount = GetLineCount();

    // 5 threads, each logs 5 times = 25 lines in the log
    Assert.Equal(25, lineCount);
}

static void LogMessages(object data)
{
    // Each thread will log five messages
    for (int i = 1; i <= 5; i++)
    {
        Logger.LogMessage(i.ToString() + " " + DateTime.Now.Ticks.ToString());
        Thread.Sleep(50);
    }
}
The number of lines seems to change each time the test is run: sometimes it is 23 and sometimes 25.
After digging through the code a little, I noticed that the log file is being accessed at the same time by multiple threads (verified by the tick counts being identical). There is no lock around access to this file, yet I see no exceptions being thrown. Can anyone explain why the line count is inconsistent between runs? And is this a negative effect of writing to the same file from multiple threads at the same time?
Race conditions, as you have here, are notoriously unpredictable. You never know how many writes won't work properly; it might even work completely fine, sometimes. And yes, this is a negative effect of writing to the same file from multiple threads without synchronization.
If you're trying to verify simultaneity with Environment.TickCount, you're going to have a bad time. It only has accuracy of about 15 milliseconds (IIRC), so if the values are the same from two threads then all you really know is that they were logging within ~15ms of each other.
If your Logger class puts a lock around its access to the log file, that should be sufficient. Just create a sync object via private static readonly object sync = new object(); and then do lock (sync) { ... open/read/write the file ... }. Otherwise you'll be at the mercy of the thread safety of whatever type of Stream you're using (hint: in general they're not thread safe).
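A sketch of what that could look like inside the Logger from the question (File.AppendAllText is just one way to do the write; your Logger may hold an open stream instead):

using System;
using System.IO;

public static class Logger
{
    private static readonly object sync = new object();
    private static string path;

    public static void Init(string pathToFile)
    {
        path = pathToFile;
    }

    public static void LogMessage(string message)
    {
        // Only one thread at a time may touch the file.
        lock (sync)
        {
            File.AppendAllText(path, message + Environment.NewLine);
        }
    }
}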
Perhaps the simplest way to handle these sorts of race conditions is to use the built-in lock statement:
lock (Logger)
{
    // use your logger here
}
This will hold the other threads at bay. You can also use the sync lock style as mentioned briefly above. There's a good example of all the options available (and pros and cons) on this guy's site:
http://www.gavindraper.co.uk/2012/02/05/thread-synchronizationlocking-in-net/
Good luck.
Typically, multithreaded logging is performed by queueing log entries (via a blocking producer-consumer queue) to one logger thread that writes to the disk. This keeps the lock time down to the time taken to push a log entry onto the queue: next to no contention. The queue absorbs any disk latency, network delays, etc.
Using a simple lock inflicts any disk/network latency or delay upon all calling threads.
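A sketch of that design, using BlockingCollection<T> as the producer-consumer queue (the QueuedLogger shape is an assumption; only the idea of a single dedicated writer thread comes from the answer above):

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public static class QueuedLogger
{
    private static readonly BlockingCollection<string> queue = new BlockingCollection<string>();

    public static void Init(string pathToFile)
    {
        // One dedicated writer thread owns the file; callers only touch the queue.
        var writer = new Thread(() =>
        {
            foreach (string line in queue.GetConsumingEnumerable())
                File.AppendAllText(pathToFile, line + Environment.NewLine);
        });
        writer.IsBackground = true;
        writer.Start();
    }

    public static void LogMessage(string message)
    {
        // Cheap and nearly contention-free compared to locking the file itself.
        queue.Add(message);
    }
}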

Implementing multithreading in C# (code review)

Greetings.
I'm trying to implement some multithreaded code in an application. The purpose of this code is to validate items that the database gives it. Validation can take quite a while (a few hundred ms to a few seconds), so this process needs to be forked off into its own thread for each item.
The database may give it 20 or 30 items a second in the beginning, but that begins to decline rapidly, eventually reaching about 65K items over 24 hours, at which point the application exits.
I'd like it if anyone more knowledgeable could take a peek at my code and see if there's any obvious problems. No one I work with knows multithreading, so I'm really just on my own, on this one.
Here's the code. It's kinda long but should be pretty clear. Let me know if you have any feedback or advice. Thanks!
public class ItemValidationService
{
    /// <summary>
    /// The object to lock on in this class, for multithreading purposes.
    /// </summary>
    private static object locker = new object();

    /// <summary>Items that have been validated.</summary>
    private HashSet<int> validatedItems = new HashSet<int>();

    /// <summary>Items that are currently being validated.</summary>
    private HashSet<int> validatingItems = new HashSet<int>();

    /// <summary>Remove an item from the index if its links are bad.</summary>
    /// <param name="id">The ID of the item.</param>
    public void ValidateItem(int id)
    {
        lock (locker)
        {
            if (!this.validatedItems.Contains(id) &&
                !this.validatingItems.Contains(id))
            {
                ThreadPool.QueueUserWorkItem(sender =>
                {
                    this.Validate(id);
                });
            }
        }
    } // method

    private void Validate(int itemId)
    {
        lock (locker)
        {
            this.validatingItems.Add(itemId);
        }

        // *********************************************
        // Time-consuming routine to validate an item...
        // *********************************************

        lock (locker)
        {
            this.validatingItems.Remove(itemId);
            this.validatedItems.Add(itemId);
        }
    } // method
} // class
The thread pool is a convenient choice if you have light weight sporadic processing that isn't time sensitive. However, I recall reading on MSDN that it's not appropriate for large scale processing of this nature.
I used it for something quite similar to this and regret it. I took a worker-thread approach in subsequent apps and am much happier with the level of control I have.
My favorite pattern in the worker-thread model is to create a master thread which holds a queue of task items, then fork a bunch of workers that pop items off that queue to process them. I use a blocking queue so that when there are no items to process, the workers simply block until something is pushed onto the queue. In this model, the master thread produces work items from some source (db, etc.) and the worker threads consume them.
I second the idea of using a blocking queue and worker threads. Here is a blocking queue implementation that I've used in the past with good results:
https://www.codeproject.com/Articles/8018/Bounded-Blocking-Queue-One-Lock
What's involved in your validation logic? If it's mainly CPU-bound, I would create no more than one worker thread per processor/core on the box. This tells you the number of processors:
Environment.ProcessorCount
If your validation involves I/O such as File Access or database access then you could use a few more threads than the number of processors.
Be careful, QueueUserWorkItem might fail
There is a possible logic error in the code posted with the question, depending on where the item id in ValidateItem(int id) comes from. Why? Because although you correctly lock your validatingItems and validatedItems sets before queuing a work item, you do not add the item to the validatingItems set until the new thread spins up. That means there is a window in which another thread could call ValidateItem(id) with the same id (unless this is running on a single main thread).
I would add the item to the validatingItems set just before queuing the work item, inside the lock.
Edit: also, QueueUserWorkItem() returns a bool, so you should use the return value to make sure the item was queued, and THEN add it to the validatingItems set.
ThreadPool may not be optimal for jamming so much at once into it. You may want to research the upper limits of its capabilities and/or roll your own.
Also, there is a race condition in your code, if you expect no duplicate validations. The call to
this.validatingItems.Add(itemId);
needs to happen in the main thread (ValidateItem), not in the thread pool thread (Validate method). This call should occur a line before the queueing of the work item to the pool.
A worse bug is found by not checking the return value of QueueUserWorkItem. Queueing can fail, and why it doesn't throw an exception is a mystery to us all. If it returns false, you need to remove the item that was added to the validatingItems list and handle the error (probably by throwing an exception).
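Putting both fixes together, ValidateItem might look something like this (a sketch based on the critiques above, not the poster's final code):

public void ValidateItem(int id)
{
    lock (locker)
    {
        if (this.validatedItems.Contains(id) || this.validatingItems.Contains(id))
            return;

        // Mark the item as in-flight *before* queueing, so a second call
        // with the same id can't slip through during the hand-off to the pool.
        this.validatingItems.Add(id);

        if (!ThreadPool.QueueUserWorkItem(state => this.Validate(id)))
        {
            // Queueing can fail; undo the bookkeeping and surface the error.
            this.validatingItems.Remove(id);
            throw new InvalidOperationException("Could not queue validation for item " + id);
        }
    }
}

With this change, Validate no longer needs its first lock block, since the item is already in validatingItems by the time the work item starts.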
I would be concerned about performance here. You indicated that the database may give it 20-30 items per second and an item could take up to a few seconds to be validated. That could be quite a large number of threads -- using your metrics, worst case 60-90 threads! I think you need to reconsider the design here. Michael mentioned a nice pattern. The use of the queue really helps keep things under control and organized. A semaphore could also be employed to control number of threads created -- i.e. you could have a maximum number of threads allowed, but under smaller loads, you wouldn't necessarily have to create the maximum number if fewer ended up getting the job done -- i.e. your own pool size could be dynamic with a cap.
When using the thread pool, I also find it more difficult to monitor the execution of threads from the pool as they perform the work. So, unless it's fire-and-forget, I am in favor of more controlled execution. I know you mentioned that your app exits after all 65K items are completed, but how are you monitoring your threads to determine if they have completed their work, i.e. that all queued workers are done? Are you monitoring the status of all items in the HashSets? I think that by queuing your items and having your own worker threads consume off that queue, you can gain more control. Albeit, this can come at the cost of more overhead, in terms of signaling between threads to indicate when all items have been queued, so the workers know they can exit.
You could also try using the CCR - Concurrency and Coordination Runtime. It's buried inside Microsoft Robotics Studio, but provides an excellent API for doing this sort of thing.
You'd just need to create a "Port" (essentially a queue), hook up a receiver (a method that gets called when something is posted to it), and then post work items to it. The CCR handles the queue and the worker thread to run it on.
Here's a video on Channel9 about the CCR.
It's very high-performance and is even being used for non-robotics work (MySpace.com uses it behind the scenes for their content delivery network).
I would recommend looking into MSDN: Task Parallel Library - Dataflow. You can find examples of implementing producer-consumer there; in your case, the database producing items to validate is the producer, and the validation routine becomes the consumer.
I'd also recommend using ConcurrentDictionary<TKey, TValue> as a "concurrent hash set", where you populate the keys and ignore the values :). You can potentially make your code lock-free.
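A sketch of the "concurrent hash set" idea (the byte values are dummies; only the keys matter):

using System.Collections.Concurrent;

var validatingItems = new ConcurrentDictionary<int, byte>();

// TryAdd is atomic: exactly one caller wins for a given id, so the
// check-then-add no longer needs an explicit lock around it.
if (validatingItems.TryAdd(id, 0))
{
    // First time we've seen this id; queue the validation.
}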
