I have a .NET 3.5 application that creates threads to write to the database so that the GUI does not block. Each created thread is added to a list so that, when the application is closed, I can Thread.Join every thread (some may still be running at that point, so the app must wait for them).
Because of this list I run into serious problems (an OutOfMemoryException) when too many threads are created. I tried removing finished threads from the list, but somehow that didn't work.
Are there better ways to manage a list of threads, so I can remove them once they are finished?
Edit: the following seems to have fixed it (it runs whenever a thread is added):
lock (m_threadLock)
{
    m_threads.RemoveAll(x => x.ThreadState == ThreadState.Stopped);
}
How about System.Threading.ThreadPool and SetMaxThreads plus QueueUserWorkItem?
http://msdn.microsoft.com/en-US/library/system.threading.threadpool%28v=VS.80%29.aspx
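A rough sketch of how that could look (DoDatabaseWrite and itemsToWrite are illustrative stand-ins, not your code):

ThreadPool.SetMaxThreads(4, 4); // note: caps the pool for the whole process, not just these items

int pending = itemsToWrite.Count; // assumes at least one item
using (var allDone = new ManualResetEvent(false))
{
    foreach (var item in itemsToWrite)
    {
        ThreadPool.QueueUserWorkItem(state =>
        {
            DoDatabaseWrite(state); // your actual database work
            // The last work item to finish signals, so shutdown waits on a single handle.
            if (Interlocked.Decrement(ref pending) == 0)
                allDone.Set();
        }, item);
    }
    allDone.WaitOne(); // wait for every queued write before closing the app
}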
You cannot keep creating new threads while holding references to the old ones; you'll run out of memory.
"I tried removing finished threads from the list, but somehow that didn't work."
That is the right path; why didn't it work?
Add code to your thread-methods to signal completion (maybe remove themselves from the list).
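For example, a minimal sketch of self-removal, reusing the m_threads list and m_threadLock from the question:

void WriteToDatabase(object state)
{
    try
    {
        // ... the actual database work ...
    }
    finally
    {
        lock (m_threadLock)
        {
            // Thread.CurrentThread is the same object that was added to the list.
            m_threads.Remove(Thread.CurrentThread);
        }
    }
}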
Look for a custom ThreadPool. There are several implementations published. You can use a simple one and control Background=false and other details.
Not sure if this is what you want, but how about something like this?
Action foo = () =>
{
    Thread.Sleep(1000); // stand-in for the real database work
};

var handles = new List<WaitHandle>();
for (int i = 0; i < 10; i++)
{
    // BeginInvoke runs foo on a ThreadPool thread; the callback cleans up.
    var result = foo.BeginInvoke(r =>
    {
        foo.EndInvoke(r);
    }, null);
    handles.Add(result.AsyncWaitHandle);
}

// Block until every queued invocation has finished.
// (Note: WaitAll supports at most 64 handles at a time.)
WaitHandle.WaitAll(handles.ToArray());
An OutOfMemoryException doesn't seem like the sort of thing that would be caused by the list of threads; more likely it's caused by the threads themselves, i.e. you are creating too many of them. You need to re-use existing ones and wait for them to become available if there are too many already. This is exactly what a thread pool does. If the built-in .NET one doesn't support waiting for the threads, then you'll just have to find a third-party implementation or, if worst comes to worst, write your own, possibly using the built-in one as a guide.
Use a more advanced ThreadPool, like this one: http://www.codeproject.com/KB/threads/smartthreadpool.aspx. It allows you to cancel work items or wait for all work items to complete.
Thanks for the assistance. I've got a triple-threaded process linked by a concurrent queue: thread one processes information and returns it to the second thread, which places the data into a concurrent queue. The third thread just loops like so:
while (true) {
    if (queue.TryDequeue(out info)) {
        doStuff(info);
    } else {
        Thread.Sleep(1);
    }
}
Is there a better way to handle it such that I'm not iterating over the loop so much? The application is extremely performance sensitive, and currently just the TryDequeue is taking ~8-9% of the application runtime. Looking to decrease that as much as possible, but not really sure what my options are.
You should consider using System.Collections.Concurrent.BlockingCollection and its Add() / Take() methods. With Take() your third thread will simply block while it waits for a new item, and Add() is thread-safe, so the second thread can call it directly.
With that approach you should be able to simplify your code into something like this:
while (true) {
    var info = collection.Take();
    doStuff(info);
}
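On the producer side the second thread only needs Add(); a blocked Take() wakes up immediately. A sketch (InfoType stands in for your item type):

// Shared between the producer and consumer threads:
BlockingCollection<InfoType> collection = new BlockingCollection<InfoType>();

// Second thread (producer):
collection.Add(info); // thread-safe; unblocks a waiting Take()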
You can increase the sleep time. Better, I would use await Task.Delay instead of Thread.Sleep: it waits without tying up a thread, and you can cancel the delay through a CancellationTokenSource.
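A minimal sketch of the cancellable wait, reusing the queue, info, and doStuff from the question (this assumes the loop lives in an async method; the CancellationTokenSource would be owned by whatever controls shutdown):

var cts = new CancellationTokenSource();

while (!cts.IsCancellationRequested)
{
    if (queue.TryDequeue(out info))
    {
        doStuff(info);
    }
    else
    {
        try
        {
            // Waits without blocking a thread; cts.Cancel() ends the wait early.
            await Task.Delay(100, cts.Token);
        }
        catch (TaskCanceledException)
        {
            break; // shutting down
        }
    }
}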
On another note, there are better ways of queuing up jobs. Since it appears you want to run these jobs sequentially, one example would be a singleton class that takes your work items and queues them up: if there are no items in the queue when you add one, detect that and start the job process; at the end of each job, check for more work and process it, or exit the job process if there is none; the process runs again when an item is added to the empty queue. If my assumption is wrong and you can run these jobs in parallel, why use a queue?
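A compressed sketch of that singleton idea (all names here are illustrative; a loop replaces the recursion described above, to avoid stack growth):

public sealed class JobQueue
{
    public static readonly JobQueue Instance = new JobQueue();

    private readonly Queue<Action> jobs = new Queue<Action>();
    private bool running;

    public void Add(Action job)
    {
        lock (jobs)
        {
            jobs.Enqueue(job);
            if (running) return; // a worker is already draining the queue
            running = true;
        }
        ThreadPool.QueueUserWorkItem(_ => Run());
    }

    private void Run()
    {
        while (true)
        {
            Action job;
            lock (jobs)
            {
                if (jobs.Count == 0) { running = false; return; } // queue empty: stop
                job = jobs.Dequeue();
            }
            job(); // jobs run one at a time, in order
        }
    }
}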
You may like to use a thread-safe implementation of ObservableCollection. Check out this SO question: ObservableCollection and threading
I don't have a recommendation that avoids looping; however, I would recommend you move away from
while (true)
and consider this instead:
MyThing thing;
while (queue.TryDequeue(out thing))
{
    doWork(thing);
}
Put this in a method that gets called each time the queue is modified; that way it runs when needed and ends when it isn't.
What I'm trying to accomplish: I have an action block with MaxDegreeOfParallelism = 4. I want to create one local instance of a session object for each parallel path, so I want a total of 4 session objects. If this were threads, I would create something like:
ThreadLocal<Session> sessionPerThread = new ThreadLocal<Session>(() => new Session());
I know blocks are not threads so I'm looking for something similar but for blocks. Any way to create this?
This block is in a service and runs for months on end. During that time period tons of threads are used for each concurrent slot of the block so thread local storage is not appropriate. I need something tied to the logical block slot. Also this block never completes, it runs the entire lifetime of the service.
Note: The above suggested answer is not valid for what I am asking. I'm specifically asking for something different than thread local and the above answer is using thread local. This is a different question entirely.
As it sounds like you already know, Dataflow blocks provide absolutely no guarantee of correlation between blocks, execution, and threads. Even with max parallelism set to 4, all 4 tasks could be executing on the same thread. Or an individual task may execute on many threads.
Given that you ultimately want to reuse n instances of an expensive service for your n degrees of parallelism, let's take dataflow completely out of the picture for a minute, since it neither helps nor directly hinders a general solution to this problem. It's actually quite simple: you can use a ConcurrentStack<T>, where T is the type of your service that is expensive to instantiate. You add code at the top of the method (or delegate) that represents one of your parallel units of work:
// Session is the expensive type from the question (the "T" described above).
private readonly ConcurrentStack<Session> reusableServices = new ConcurrentStack<Session>();

private void DoWork() {
    Session service;
    if (!this.reusableServices.TryPop(out service)) {
        service = new Session(); // expensive construction
    }

    // Use your shared service.
    //// Code here.

    // Put the service back when we're done with it so someone else can use it.
    this.reusableServices.Push(service);
}
This way, you can quickly see that you create exactly as many instances of your expensive service as you have parallel executions of DoWork(). You don't even have to hard-code the degree of parallelism you expect. And it's orthogonal to how you actually schedule that parallelism (thread pool, Dataflow, PLINQ: it doesn't matter).
So you can just use DoWork() as your Dataflow block's delegate and you're set to go.
Of course, there's nothing magical about ConcurrentStack<T> here, except that the locks around push and pop are built into the type so you don't have to do it yourself.
I have a loop that looks something like this:
var list = new List<float>();
while (list.Count < wantedNumberOfJobs)
{
    var resource = ... //gets the resource
    lock (resource)
    {
        ThreadPool.QueueUserWorkItem(DoWork, /*wrap resource and list into an object*/);
    }
}
//continue past this point when all the threads are finished
And the work method:
private void DoWork(object state)
{
    var list = ((/*wrapperObject*/)state).List;
    var someFloat = ... //do the work to get the number
    lock (list)
    {
        list.Add(someFloat);
    }
}
Essentially I want a large but specific (given by wantedNumberOfJobs) number of jobs done. Each of these jobs inserts a single item into the list, as you can see in the DoWork method.
I am not certain that this code guarantees that list will contain wantedNumberOfJobs items past the designated point. I'd also like to limit the number of active threads. I've used the System.Threading.Semaphore class, but I'm not sure it's the best solution.
I'd appreciate any help. Thanks!
Perhaps you can use Parallel.For, like so:
Parallel.For(0, wantedNumberOfJobs, i => {
    var resource = ... //gets the resource
    DoWork(resource);
});
You should really get a copy of Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4.
It explains how to use the parallel extensions in .NET 4 and how you can influence how many jobs run concurrently. For your list, you should also take a look at the System.Collections.Concurrent namespace to find a thread-safe alternative.
Since you tagged this C#4, consider using Tasks from the Task Parallel Library rather than threads from the ThreadPool. Tasks limit the number of threads used based on the number of processors you have. Also, consider removing some of the locks, as each of them requires context switching and process locking, which can limit or even eliminate the advantage of using multiple threads in the first place.
You'll likely end up with many more than wantedNumberOfJobs items in list. This is because the while loop is deciding whether or not to queue a new work item based on the current contents of list. Depending on how long DoWork takes, it could queue hundreds or thousands of items before any are added to list.
One way to work around this problem is to have the while loop keep track of the number of work items it queues up, and to stop when it gets to wantedNumberOfJobs.
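A sketch of that change; the elided parts from the question stay elided:

var list = new List<float>();
for (int queued = 0; queued < wantedNumberOfJobs; queued++)
{
    var resource = ... //gets the resource, as before
    ThreadPool.QueueUserWorkItem(DoWork, /*wrap resource and list into an object*/);
}
// Still need to wait (e.g. on a counter or event) before continuing past this point.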
PLINQ may be sufficient for you? http://msdn.microsoft.com/en-us/library/dd460714.aspx
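For example, a rough PLINQ sketch, where ComputeFloat is a hypothetical stand-in for the body of DoWork:

var list = ParallelEnumerable.Range(0, wantedNumberOfJobs)
    .WithDegreeOfParallelism(4)   // optional cap on concurrency
    .Select(i => ComputeFloat(i)) // each job produces one float
    .ToList();                    // exactly wantedNumberOfJobs results, no locks needed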
I have several actions that I want to execute in the background, but they have to be executed synchronously one after the other.
I was wondering if it's a good idea to use the Task.ContinueWith method to achieve this. Do you foresee any problems with this?
My code looks something like this:
private object syncRoot = new object();
private Task latestTask;

public void EnqueueAction(System.Action action)
{
    lock (syncRoot)
    {
        if (latestTask == null)
            latestTask = Task.Factory.StartNew(action);
        else
            latestTask = latestTask.ContinueWith(tsk => action());
    }
}
There is one flaw with this, which I recently discovered myself because I am also using this method of ensuring tasks execute sequentially.
In my application I had thousands of instances of these mini-queues and quickly discovered I was having memory issues. Since these queues were often idle, I was holding onto the last completed task object for a long time and preventing garbage collection. Since the result object of the last completed task was often over 85,000 bytes, it was allocated on the Large Object Heap (which is not compacted during garbage collection). This resulted in fragmentation of the LOH and the process continuously growing in size.
As a hack to avoid this, you can schedule a no-op task right after the real one within your lock. For a real solution, I will need to move to a different method of controlling the scheduling.
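Roughly, the hack looks like this (inside the same lock as EnqueueAction):

lock (syncRoot)
{
    // Chain an empty continuation so the stored field references a
    // result-free Task, letting the real task's large result be collected.
    latestTask = latestTask.ContinueWith(tsk => { });
}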
This should work as designed (using the fact that TPL will schedule the continuation immediately if the corresponding task already has completed).
Personally, in this case I would just use a dedicated thread drawing tasks from a concurrent queue (ConcurrentQueue). This is more explicit, but easier to follow when reading the code, especially if you want to find out e.g. how many tasks are currently queued.
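A minimal sketch of that dedicated-thread alternative (illustrative only; I've wrapped the ConcurrentQueue in a BlockingCollection so the worker blocks instead of spinning):

private readonly BlockingCollection<Action> queue =
    new BlockingCollection<Action>(new ConcurrentQueue<Action>());

public void Start()
{
    var worker = new Thread(() =>
    {
        // GetConsumingEnumerable blocks until an item is available.
        foreach (var action in queue.GetConsumingEnumerable())
            action();
    });
    worker.IsBackground = true;
    worker.Start();
}

public void EnqueueAction(Action action)
{
    queue.Add(action);
}

public int PendingCount { get { return queue.Count; } } // e.g. how many tasks are queued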
I used this snippet and it seems to work as designed.
The number of instances in my case doesn't run into the thousands; it's in the single digits.
Nevertheless, no issues so far.
I would be interested in the ConcurrentQueue example, if there is any?
Thanks
Alright... I've given the site a fair search and have read over many posts about this topic. I found this question especially helpful: Code for a simple thread pool in C#.
However, as it always seems, what I need varies slightly.
I have looked over the MSDN example and adapted it to my needs somewhat. The example I refer to is here: http://msdn.microsoft.com/en-us/library/3dasc8as(VS.80,printer).aspx
My issue is this. I have a fairly simple set of code that loads a web page via the HttpWebRequest and WebResponse classes and reads the results via a Stream. I fire off this method in a thread since it will need to be executed many times. The method itself is pretty short, but the number of times it needs to be fired (with varied data each time) varies; it can be anywhere from 1 to 200.
Everything I've read seems to indicate that the ThreadPool class is the prime candidate. Here is where things get tricky: I might need to fire off this thing, say, 100 times, but I can have at most 3 threads running (for this particular task).
I've tried setting the MaxThreads on the ThreadPool via:
ThreadPool.SetMaxThreads(3, 3);
I'm not entirely convinced this approach is working. Furthermore, I don't want to clobber other web sites or programs running on the system this will be running on. So, by limiting the # of threads on the ThreadPool, can I be certain that this pertains to my code and my threads only?
The MSDN example uses the event-driven approach and calls WaitHandle.WaitAll(doneEvents); which is how I'm doing this.
So the heart of my question is, how does one ensure or specify a maximum number of threads that can be run for their code, but have the code keep running more threads as the previous ones finish up until some arbitrary point? Am I tackling this the right way?
Sincerely,
Jason
Okay, I've added a semaphore approach and completely removed the ThreadPool code. It seems simple enough. I got my info from: http://www.albahari.com/threading/part2.aspx
It's this example that showed me how:
[text below here is a copy/paste from the site]
A Semaphore with a capacity of one is similar to a Mutex or lock, except that the Semaphore has no "owner" – it's thread-agnostic. Any thread can call Release on a Semaphore, while with Mutex and lock, only the thread that obtained the resource can release it.
In this following example, ten threads execute a loop with a Sleep statement in the middle. A Semaphore ensures that not more than three threads can execute that Sleep statement at once:
class SemaphoreTest
{
    static Semaphore s = new Semaphore(3, 3); // Available=3; Capacity=3

    static void Main()
    {
        for (int i = 0; i < 10; i++)
            new Thread(Go).Start();
    }

    static void Go()
    {
        while (true)
        {
            s.WaitOne();
            Thread.Sleep(100); // Only 3 threads can get here at once
            s.Release();
        }
    }
}
Note: if you are limiting this to "3" just so you don't overwhelm the machine running your app, I'd make sure this is a problem first. The threadpool is supposed to manage this for you. On the other hand, if you don't want to overwhelm some other resource, then read on!
You can't manage the size of the threadpool (or really much of anything about it).
In this case, I'd use a semaphore to manage access to your resource. In your case, your resource is running the web scrape, or calculating some report, etc.
To do this, in your static class, create a semaphore object:
System.Threading.Semaphore S = new System.Threading.Semaphore(3, 3);
Then, in each thread, you use that shared instance (do not create a new Semaphore per thread, or it won't limit anything):
try
{
    // wait your turn (decrement)
    S.WaitOne();
    // do your thing
}
finally
{
    // release so others can go (increment)
    S.Release();
}
Each thread will block on the S.WaitOne() until it is given the signal to proceed. Once S has been decremented 3 times, all threads will block until one of them increments the counter.
This solution isn't perfect.
If you want something a little cleaner, and more efficient, I'd recommend going with a BlockingQueue approach wherein you enqueue the work you want performed into a global Blocking Queue object.
Meanwhile, you have three threads (which you created yourself, not threadpool threads) popping work out of the queue to perform. This isn't that tricky to set up, and it is very fast and simple.
Examples:
Best threading queue example / best practice
Best method to get objects from a BlockingQueue in a concurrent program?
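For illustration, a rough sketch of the three-worker pattern, assuming you can target .NET 4+ for BlockingCollection (on .NET 3.5 you'd hand-roll the same idea with Monitor.Wait/Pulse); FetchPage is a hypothetical wrapper around your HttpWebRequest code:

var work = new BlockingCollection<string>(); // e.g. URLs to fetch

for (int i = 0; i < 3; i++) // exactly three consumer threads
{
    new Thread(() =>
    {
        foreach (var url in work.GetConsumingEnumerable())
            FetchPage(url); // your HttpWebRequest/WebResponse logic
    }).Start();
}

// Producer side:
foreach (var url in urlsToFetch) // illustrative source of the 1-200 requests
    work.Add(url);
work.CompleteAdding(); // lets the consumers drain the queue and exit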
It's a static class like any other, which means that anything you do with it affects every other thread in the current process. It doesn't affect other processes.
I consider this one of the larger design flaws in .NET, however. Who came up with the brilliant idea of making the thread pool static? As your example shows, we often want a thread pool dedicated to our task, without having it interfere with unrelated tasks elsewhere in the system.