Do the two snippets below mean the same thing?
And how should I use ThreadPool.SetMaxThreads(20, 20)? I never see 20 threads working asynchronously.
ThreadPool.SetMaxThreads(20, 20);
ThreadPool.QueueUserWorkItem(new WaitCallback(WorkThread), DateTime.Now);
and
for (int i = 0; i < 20; i++)
{
    ThreadPool.QueueUserWorkItem(new WaitCallback(WorkThread), DateTime.Now);
}
How many threads the pool uses is largely up to the pool itself, and may vary based on what else is going on, the number of unstarted items, etc.; you are only setting the maximum. You aren't the only user of the pool; .NET uses it itself, so don't mess with it. If you explicitly want 20 threads, create 20 Thread instances yourself.
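For example, a minimal sketch of creating the threads yourself (WorkThread is assumed to be the same worker method queued above):

var threads = new List<Thread>();
for (int i = 0; i < 20; i++)
{
    // each Thread is created and started explicitly; no pool involved
    var t = new Thread(new ParameterizedThreadStart(WorkThread));
    t.Start(DateTime.Now);
    threads.Add(t);
}
// block until every thread has finished
foreach (var t in threads)
    t.Join();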
ThreadPool is implemented differently in .NET 4 than in earlier versions of .NET.
When you set the thread pool maximum, you are saying how many threads you want the pool to create at most, if required. If the queued tasks are quick, the pool may be able to process them with just a couple of threads, without ever creating 20.
You could call SetMinThreads, which ensures that the pool creates that minimum number of threads up front. But be careful with this, as creating threads is a resource hit.
See the MSDN article.
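For instance, a minimal sketch (the numbers are only illustrative):

// ask the pool to keep at least 20 worker and 20 I/O completion-port threads ready;
// returns false if the values are out of range
bool ok = ThreadPool.SetMinThreads(20, 20);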
System.Threading.ThreadPool.SetMaxThreads(50, 50);
File.ReadLines(path).AsParallel().WithDegreeOfParallelism(100).ForAll(s =>
{
    /*
       some code which waits on an external API call
       and does not utilize the CPU
    */
});
I have never seen the thread count exceed the CPU count on my system.
Can I use PLINQ and get more than one thread per CPU?
If you're calling an external web API, you might be hitting the limit on concurrent simultaneous connections, which defaults to 2. At the beginning of your application, do the following:
System.Net.ServicePointManager.DefaultConnectionLimit = 4096;
System.Net.ServicePointManager.Expect100Continue = false;
See if that helps. If not, there might be some other bottleneck within the routine you're trying to parallelize.
Also, just like the other responders said, the ThreadPool decides how many threads to spin up based on load. In my experience with the TPL, I've seen that the thread count increases over time: the longer the app runs, and the heavier the load gets, the more threads are spun up.
PLINQ uses a hill-climbing algorithm to determine the optimum size of the thread pool, which is used by the TPL. I think that if you put a lot of I/O in your tasks, seeing more threads than the CPU count is likely.
That said, I've never actually seen more threads than the CPU count :) but maybe I never had the right situation.
I tested this with the following code:
var lines = Enumerable.Range(0, 200).ToArray();
int currentThreads = 0;
int maxThreads = 0;
object l = new object();
lines.AsParallel().WithDegreeOfParallelism(100).ForAll(
    s =>
    {
        lock (l)
        {
            currentThreads++;
            if (currentThreads > maxThreads)
            {
                maxThreads = currentThreads;
                Console.WriteLine(maxThreads);
            }
        }
        Thread.Sleep(3000);
        lock (l)
        {
            currentThreads--;
        }
    });

Console.WriteLine();
Console.WriteLine(maxThreads);
Basically, it records the current number of concurrently executing iterations and then saves the maximum encountered value.
The results vary quite a bit, between 15 and 25, but it's always much more than the number of CPUs my computer has (4). Increasing the sleep time increases the maximum number of concurrent threads. So it looks like the limiting factor here is the ThreadPool: it will create new threads slowly, especially when jobs are being completed relatively quickly.
If you want to increase the number of threads used, you would need to use SetMinThreads() (not SetMaxThreads()). If I set the minimum to 50, the number of threads actually used is around 60.
But having dozens of threads that do nothing but wait is quite inefficient, especially when it comes to memory consumption. You should consider using asynchronous methods instead.
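As a rough sketch of that idea (assumptions on my part: .NET 4.5+, code running inside an async method, and CallExternalApiAsync standing in as a hypothetical async wrapper around the external API call):

// SemaphoreSlim limits concurrency to 100 without tying up 100 threads;
// await releases the thread back to the pool while each call is in flight.
var throttle = new SemaphoreSlim(100);
var tasks = File.ReadLines(path).Select(async line =>
{
    await throttle.WaitAsync();
    try
    {
        await CallExternalApiAsync(line); // hypothetical I/O-bound method
    }
    finally
    {
        throttle.Release();
    }
});
await Task.WhenAll(tasks);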
PLINQ does not fit this case.
I found the following article useful:
http://msdn.microsoft.com/en-us/library/hh228609(v=vs.110).aspx
Short answer: nope.
The amount of threading is simply up to the .NET Framework runtime. There is no developer control over the number of threads used by the TPL (Task Parallel Library).
EDIT
Thanks to some other feedback: it is actually possible--but not recommended--to manually control the number of threads in the ThreadPool, which PLINQ and TPL use.
It's my opinion that any parallelization problem needs to be carefully thought out, and carefully constructed and tested. There's a lot of subtlety in this.
I was writing some code to process a lot of data, and I thought it would be useful to have Parallel.ForEach create a file for each thread it creates so the output doesn't need to be synchronized (by me at least).
It looks something like this:
Parallel.ForEach(vals,
    new ParallelOptions { MaxDegreeOfParallelism = 8 },
    () => GetWriter(), // returns a new BinaryWriter backed by a file with a guid name
    (item, state, writer) =>
    {
        if (something)
        {
            state.Break();
            return writer;
        }
        List<Result> results = new List<Result>();
        foreach (var subItem in item.SubItems)
            results.Add(ProcessItem(subItem));
        if (results.Count > 0)
        {
            foreach (var result in results)
                result.Write(writer);
        }
        return writer;
    },
    (writer) => writer.Dispose());
What I expected to happen was that up to 8 files would be created and would persist through the entire run time. Then each would be Disposed when the entire ForEach call finishes. What really happens is that the localInit seems to be called once for each item, so I end up with hundreds of files. The writers are also getting disposed at the end of each item that is processed.
This shows the same thing happening:
var vals = Enumerable.Range(0, 10000000).ToArray();
long sum = 0;
Parallel.ForEach(vals,
    new ParallelOptions { MaxDegreeOfParallelism = 8 },
    () => { Console.WriteLine("init " + Thread.CurrentThread.ManagedThreadId); return 0L; },
    (i, state, common) =>
    {
        Thread.Sleep(10);
        return common + i;
    },
    (common) => Interlocked.Add(ref sum, common));
I see:
init 10
init 14
init 11
init 13
init 12
init 14
init 11
init 12
init 13
init 11
... // hundreds of lines over < 30 seconds
init 14
init 11
init 18
init 17
init 10
init 11
init 14
init 11
init 14
init 11
init 18
Note: if I leave out the Thread.Sleep call, it sometimes seems to function "correctly". localInit only gets called once each for the 4 threads that it decides to use on my pc. Not every time, however.
Is this the desired behavior of the function? What's going on behind the scenes that causes it to do this? And lastly, what's a good way to get my desired functionality, ThreadLocal?
This is on .NET 4.5, by the way.
Parallel.ForEach does not work the way you think it does. It's important to note that the method is built on top of the Task classes, and that the relationship between Task and Thread is not 1:1. You can have, for example, 10 tasks that run on 2 managed threads.
Try using this line in your method body instead of the current one:
Console.WriteLine("ThreadId {0} -- TaskId {1} ",
Thread.CurrentThread.ManagedThreadId, Task.CurrentId);
You should see that the ThreadId will be reused across many different tasks, shown by their unique ids. You'll see this more if you leave in, or increase, your call to Thread.Sleep.
The (very) basic idea of how the Parallel.ForEach method works is that it takes your enumerable and creates a series of tasks that will process sections of the enumeration; how this is done depends a lot on the input. There is also some special logic that checks for the case of a task exceeding a certain number of milliseconds without completing. If that is the case, a new task may be spawned to help relieve the work.
If you look at the documentation for the localInit function in Parallel.ForEach, you'll notice that it says it returns the initial state of the local data for each _task_, not each thread.
You might ask why there are more than 8 tasks being spawned. That answer is similar to the last, found in the documentation for ParallelOptions.MaxDegreeOfParallelism.
Changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used.
This limit is only on the number of concurrent tasks, not a hard limit on the number of tasks that will be created during the entire time the loop is processing. And as I mentioned above, there are times when a separate task will be spawned, which results in your localInit function being called multiple times and writing hundreds of files to disk.
Writing to disk is certainly an operation with a bit of latency, particularly if you're using synchronous I/O. When the disk operation happens, it blocks the entire thread; the same happens with Thread.Sleep. If a Task does this, it will block the thread it is currently running on, and no other tasks can run on it. Usually in these cases, the scheduler will spawn a new Task to help pick up the slack.
And lastly, what's a good way to get my desired functionality, ThreadLocal?
The bottom line is that thread locals don't make sense with Parallel.ForEach because you're not dealing with threads; you're dealing with tasks. A thread local could be shared between tasks because many tasks can use the same thread at the same time. Also, a task's thread local could change mid-execution, because the scheduler could preempt it from running and then continue its execution on a different thread, which would have a different thread local.
I'm not sure the best way to do it, but you could rely on the localinit function to pass in whatever resource you'd like, only allowing a resource to be used in one thread at a time. You can use the localfinally to mark it as no longer in use and thus available for another task to acquire. This is what those methods were designed for; each method is only called once per task that is spawned (see the remarks section of the Parallel.ForEach MSDN documentation).
You can also split the work yourself and create your own set of threads to run it. However, this is less ideal, in my opinion, since the Parallel class already does this heavy lifting for you.
What you're seeing is the implementation trying to get your work done as quickly as possible.
To do this, it tries using different numbers of tasks to maximize throughput. It grabs a certain number of threads from the thread pool and runs your work for a bit. It then tries adding and removing threads to see what happens. It continues doing this until all your work is done.
The algorithm is quite dumb in that it doesn't know if your work is using a lot of CPU, or a lot of IO, or even if there is a lot of synchronization and the threads are blocking each other. All it can do is add and remove threads and measure how fast each unit of work completes.
This means it is continually calling your localInit and localFinally functions as it injects and retires threads - which is what you have found.
Unfortunately, there is no easy way to control this algorithm. Parallel.ForEach is a high-level construct that intentionally hides much of the thread-management code.
Using a ThreadLocal might help a bit, but it relies on the thread pool reusing the same threads when Parallel.ForEach asks for new ones. This is not guaranteed - in fact, it is unlikely that the thread pool will use exactly 8 threads for the whole call. This means you will again be creating more files than necessary.
One thing that is guaranteed is that Parallel.ForEach will never use more than MaxDegreeOfParallelism threads at any one time.
You can use this to your advantage by creating a fixed-size "pool" of files that can be re-used by whichever threads are running at a particular time. You know that only MaxDegreeOfParallelism threads can run at once, so you can create that number of files before calling ForEach. Then grab one in your localInit and release it in your localFinally.
Of course, you will have to write this pool yourself and it must be thread-safe as it will be called concurrently. A simple locking strategy should be good enough, though, because threads are not injected and retired very quickly compared to the cost of a lock.
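For illustration, a minimal sketch of such a pool (the BinaryWriter resource and the degree of 8 are carried over from the question; WriterPool is a hypothetical name):

class WriterPool
{
    private readonly Stack<BinaryWriter> _free = new Stack<BinaryWriter>();
    private readonly object _lock = new object();

    public WriterPool(int size)
    {
        // create the fixed set of files up front
        for (int i = 0; i < size; i++)
            _free.Push(new BinaryWriter(File.Create(Guid.NewGuid() + ".bin")));
    }

    public BinaryWriter Acquire()
    {
        // never empty: at most MaxDegreeOfParallelism tasks run at once,
        // so at most that many writers are checked out at any one time
        lock (_lock) return _free.Pop();
    }

    public void Release(BinaryWriter writer)
    {
        lock (_lock) _free.Push(writer);
    }
}

Then acquire in localInit and release in localFinally, and dispose the writers once the loop has completed:

var pool = new WriterPool(8);
Parallel.ForEach(vals,
    new ParallelOptions { MaxDegreeOfParallelism = 8 },
    () => pool.Acquire(),               // localInit: grab a writer
    (item, state, writer) => { /* write results */ return writer; },
    writer => pool.Release(writer));    // localFinally: hand it back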
According to MSDN the localInit method is called once for each task, not for each thread:
The localInit delegate is invoked once for each task that participates in the loop's execution and returns the initial local state for each of those tasks.
localInit is called when a thread is created.
If the body takes too long, the scheduler must create another thread and suspend the current one,
and when it creates another thread, it calls localInit again.
Also, when Parallel.ForEach is called, it creates as many threads as the MaxDegreeOfParallelism value. For example:
var k = Enumerable.Range(0, 1);
Parallel.ForEach(k, new ParallelOptions() { MaxDegreeOfParallelism = 4 } .....
creates 4 threads when it is first called.
I have a question about controlling the number of concurrent threads I want running. Let me explain what I currently do, for example:
var myItems = getItems(); // is just some generic list

// cycle through the mails, picking 10 at a time
int index = 0;
int itemsToTake = myItems.Count >= 10 ? 10 : myItems.Count;
while (index < myItems.Count)
{
    var itemRange = myItems.GetRange(index, itemsToTake);
    AutoResetEvent[] handles = new AutoResetEvent[itemsToTake];
    for (int i = 0; i < itemRange.Count; i++)
    {
        var item = itemRange[i];
        handles[i] = new AutoResetEvent(false);
        // set up the thread
        ThreadPool.QueueUserWorkItem(processItems, new Item_Thread(handles[i], item));
    }
    // wait for all the threads to finish
    WaitHandle.WaitAll(handles);
    // update the index
    index += itemsToTake;
    // make sure that the next batch of items to get is within range
    itemsToTake = (itemsToTake + index < myItems.Count) ? itemsToTake : myItems.Count - index;
}
This is a path that I currently take. However I do not like it at all. I know I can 'manage' the thread pool itself, but I have heard it is not advisable to do so. So what is the alternative? The semaphore class?
Thanks.
Instead of using ThreadPool directly, you might also consider using TPL or PLINQ. For example, with PLINQ you could do something like this:
getItems().AsParallel()
.WithDegreeOfParallelism(numberOfThreadsYouWant)
.ForAll(item => process(item));
or using Parallel:
var options = new ParallelOptions {MaxDegreeOfParallelism = numberOfThreadsYouWant};
Parallel.ForEach(getItems(), options, item => process(item));
Make sure that specifying the degree of parallelism does actually improve performance of your application. TPL and PLINQ use ThreadPool by default, which does a very good job of managing the number of threads that are running. In .NET 4, ThreadPool implements algorithms that add more processing threads only if that improves performance.
Don't use THE threadpool; get another one (just Google for it, there are half a dozen implementations out there) and manage that yourself.
Managing THE threadpool is not advisable, as a lot of internal work may go on there; managing your OWN threadpool instance is totally OK.
It looks like you can control the maximum number of threads using ThreadPool.SetMaxThreads, although I haven't tested this.
Assuming the question is "How do I limit the number of worker threads?", the answer would be to use a producer-consumer queue where you control the number of worker threads. Just queue your items and let it handle the workers.
Here is a generic implementation you could use.
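In outline, a sketch of the idea (not the linked implementation) might look like this - a fixed set of worker threads draining a shared queue:

// A minimal producer-consumer queue: N worker threads drain a shared queue.
class WorkQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _lock = new object();
    private readonly Thread[] _workers;
    private bool _done;

    public WorkQueue(int workerCount, Action<T> handler)
    {
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(() =>
            {
                while (true)
                {
                    T item;
                    lock (_lock)
                    {
                        while (_items.Count == 0 && !_done)
                            Monitor.Wait(_lock);    // sleep until work arrives
                        if (_items.Count == 0) return; // done and drained
                        item = _items.Dequeue();
                    }
                    handler(item); // run outside the lock
                }
            });
            _workers[i].Start();
        }
    }

    public void Enqueue(T item)
    {
        lock (_lock) { _items.Enqueue(item); Monitor.Pulse(_lock); }
    }

    public void CompleteAndWait()
    {
        lock (_lock) { _done = true; Monitor.PulseAll(_lock); }
        foreach (var w in _workers) w.Join();
    }
}

Usage would be along the lines of var q = new WorkQueue<Item>(10, processItem); then q.Enqueue(item) for each item, and q.CompleteAndWait() at the end.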
You can use the ThreadPool.SetMaxThreads method:
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.setmaxthreads.aspx
In the documentation, there is a mention of SetMaxThreads ...
public static bool SetMaxThreads(
    int workerThreads,
    int completionPortThreads
)
Sets the number of requests to the thread pool that can be active concurrently. All requests above that number remain queued until thread pool threads become available.
However:
You cannot set the number of worker threads or the number of I/O completion threads to a number smaller than the number of processors in the computer.
But I guess you are anyway better served by using a non-singleton thread pool.
There is no reason to deal with hybrid thread synchronization constructs (such as AutoResetEvent) and the ThreadPool.
You can use a class that can act as the coordinator responsible for executing all of your code asynchronously.
Wrap what "Item_Thread" does using a Task or the APM pattern. Then use the AsyncCoordinator class by Jeffrey Richter (it can be found in the code accompanying the book CLR via C#, 3rd Edition).
I have an application (.Net 3.5) which creates threads to write something to the database so that the GUI does not block. All created threads are added to a list, so that I can wait (Thread.Join) for each thread when the application is closed (maybe not all threads are finished when the application is closed, so the app must wait for them).
Because of the list I get some serious problems if there are too many threads created (OutOfMemoryException). I tried removing finished threads from the list, but somehow that didn't work.
Are there better ways to manage a list of threads, so I can remove them once they are finished?
Edit: this seems to have fixed it (called whenever a thread is added):
lock (m_threadLock)
{
m_threads.RemoveAll(x => x.ThreadState == ThreadState.Stopped);
}
How about System.Threading.ThreadPool and SetMaxThreads plus QueueUserWorkItem?
http://msdn.microsoft.com/en-US/library/system.threading.threadpool%28v=VS.80%29.aspx
You cannot keep creating new threads while keeping a hold of the old ones; you'll run out of memory.
I tried removing finished threads from the list, but somehow that didn't work.
That is the right path, why didn't it work?
Add code to your thread-methods to signal completion (maybe remove themselves from the list).
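For example, a sketch reusing the m_threads list and m_threadLock from the question (StartTracked is a hypothetical helper):

void StartTracked(Action work)
{
    Thread t = null;
    t = new Thread(() =>
    {
        try { work(); }
        finally
        {
            // the thread removes itself from the list when it is done
            lock (m_threadLock) { m_threads.Remove(t); }
        }
    });
    lock (m_threadLock) { m_threads.Add(t); }
    t.Start();
}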
Look for a custom ThreadPool. There are several implementations published. You can use a simple one and control Background=false and other details.
Not sure if this is what you want, but how about something like this?
Action foo = () =>
{
    Thread.Sleep(1000);
};

var handles = new List<WaitHandle>();

for (int i = 0; i < 10; i++)
{
    var result = foo.BeginInvoke(r =>
    {
        foo.EndInvoke(r);
    }, null);
    handles.Add(result.AsyncWaitHandle);
}

WaitHandle.WaitAll(handles.ToArray());
OutOfMemoryException doesn't seem like the sort of thing that would be caused by the list of threads - more likely it's caused by the threads themselves, i.e. you are creating too many of them. You need to re-use existing ones and wait for them to become available if there are too many already. This is exactly what a thread pool does. If the built-in .NET one doesn't support waiting for the threads, then you'll just have to find a third-party implementation or, worst comes to worst, write your own - possibly using the built-in one as a guide.
Use a more advanced ThreadPool, like this one: http://www.codeproject.com/KB/threads/smartthreadpool.aspx . It allows you to cancel work items or wait for all work items to complete.
Alright...I've given the site a fair search and have read over many posts about this topic. I found this question: Code for a simple thread pool in C# especially helpful.
However, as it always seems, what I need varies slightly.
I have looked over the MSDN example and adapted it to my needs somewhat. The example I refer to is here: http://msdn.microsoft.com/en-us/library/3dasc8as(VS.80,printer).aspx
My issue is this. I have a fairly simple set of code that loads a web page via the HttpWebRequest and WebResponse classes and reads the results via a Stream. I fire off this method in a thread as it will need to be executed many times. The method itself is pretty short, but the number of times it needs to be fired (with varied data each time) varies. It can be anywhere from 1 to 200.
Everything I've read seems to indicate the ThreadPool class being the prime candidate. Here is where things get tricky. I might need to fire off this thing, say, 100 times, but I can only have 3 threads at most running (for this particular task).
I've tried setting the MaxThreads on the ThreadPool via:
ThreadPool.SetMaxThreads(3, 3);
I'm not entirely convinced this approach is working. Furthermore, I don't want to clobber other web sites or programs running on the system this will be running on. So, by limiting the # of threads on the ThreadPool, can I be certain that this pertains to my code and my threads only?
The MSDN example uses the event drive approach and calls WaitHandle.WaitAll(doneEvents); which is how I'm doing this.
So the heart of my question is, how does one ensure or specify a maximum number of threads that can be run for their code, but have the code keep running more threads as the previous ones finish up until some arbitrary point? Am I tackling this the right way?
Sincerely,
Jason
Okay, I've added a semaphore approach and completely removed the ThreadPool code. It seems simple enough. I got my info from: http://www.albahari.com/threading/part2.aspx
It's this example that showed me how:
[text below here is a copy/paste from the site]
A Semaphore with a capacity of one is similar to a Mutex or lock, except that the Semaphore has no "owner" – it's thread-agnostic. Any thread can call Release on a Semaphore, while with Mutex and lock, only the thread that obtained the resource can release it.
In this following example, ten threads execute a loop with a Sleep statement in the middle. A Semaphore ensures that not more than three threads can execute that Sleep statement at once:
class SemaphoreTest
{
    static Semaphore s = new Semaphore(3, 3); // Available=3; Capacity=3

    static void Main()
    {
        for (int i = 0; i < 10; i++)
            new Thread(Go).Start();
    }

    static void Go()
    {
        while (true)
        {
            s.WaitOne();
            Thread.Sleep(100); // Only 3 threads can get here at once
            s.Release();
        }
    }
}
Note: if you are limiting this to "3" just so you don't overwhelm the machine running your app, I'd make sure this is a problem first. The threadpool is supposed to manage this for you. On the other hand, if you don't want to overwhelm some other resource, then read on!
You can't manage the size of the threadpool (or really much of anything about it).
In this case, I'd use a semaphore to manage access to your resource. In your case, your resource is running the web scrape, or calculating some report, etc.
To do this, in your static class, create a semaphore object:
static System.Threading.Semaphore S = new System.Threading.Semaphore(3, 3);
Then, in each thread, you do this:
try
{
    // wait your turn (decrement)
    S.WaitOne();

    // do your thing
}
finally
{
    // release so others can go (increment)
    S.Release();
}
Each thread will block on the S.WaitOne() until it is given the signal to proceed. Once S has been decremented 3 times, all threads will block until one of them increments the counter.
This solution isn't perfect.
If you want something a little cleaner, and more efficient, I'd recommend going with a BlockingQueue approach wherein you enqueue the work you want performed into a global Blocking Queue object.
Meanwhile, you have three threads (which you created - not in the threadpool) popping work out of the queue to perform. This isn't that tricky to set up and is very fast and simple.
Examples:
Best threading queue example / best practice
Best method to get objects from a BlockingQueue in a concurrent program?
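Along the lines of those examples, a minimal sketch (assuming .NET 4+ for BlockingCollection; FetchPage stands in for your HttpWebRequest routine, and urls for your 1-200 work items):

// Three dedicated consumer threads drain a BlockingCollection of URLs.
var work = new BlockingCollection<string>();

var consumers = new Thread[3];
for (int i = 0; i < consumers.Length; i++)
{
    consumers[i] = new Thread(() =>
    {
        // GetConsumingEnumerable blocks until an item arrives
        // and ends once CompleteAdding has been called.
        foreach (var url in work.GetConsumingEnumerable())
            FetchPage(url); // hypothetical: your HttpWebRequest routine
    });
    consumers[i].Start();
}

foreach (var url in urls)  // enqueue all the requests
    work.Add(url);
work.CompleteAdding();     // no more items

foreach (var t in consumers)
    t.Join();              // wait for the queue to drain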
It's a static class like any other, which means that anything you do with it affects every other thread in the current process. It doesn't affect other processes.
I consider this one of the larger design flaws in .NET, however. Who came up with the brilliant idea of making the thread pool static? As your example shows, we often want a thread pool dedicated to our task, without having it interfere with unrelated tasks elsewhere in the system.