So I have a loop that iterates about 20,000 times. Each time through, I create a new thread. The thread basically calls one method. The method is rather slow; it takes four seconds to complete. It goes out and scrapes a page (which we have permission to scrape). I add a one-second delay in the loop, which should ensure only 4 pages are being scraped at once.
Thread.Sleep(1000);
Thread t = new Thread(() => scraping(asin.Trim(), sku.Trim()));
t.Start();
My question is: what happens to that thread once the method completes?
It will get destroyed as soon as the method completes.
That being said, this is not an ideal approach. You should use the ThreadPool to avoid creating many threads.
Instead of using new Thread, consider using ThreadPool.QueueUserWorkItem to start off the task.
In addition, if you're using .NET 4, you can use Parallel.ForEach to loop through your entire collection concurrently. This will use the ThreadPool (by default) to schedule all of your tasks.
Finally, you probably should eliminate the Thread.Sleep in your loop - it will just slow down the overall operation, and probably not provide you any gains (once you've switched to using the ThreadPool).
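For example, here is a minimal sketch of the Parallel.ForEach version, where MaxDegreeOfParallelism replaces the Sleep-based throttle. Note that items is an assumed collection holding the ~20,000 (asin, sku) pairs, and scraping is the method from the question:

using System.Threading.Tasks;

// A sketch, not drop-in code: `items` is assumed to hold the (asin, sku)
// pairs, and `scraping` is the four-second method from the question.
Parallel.ForEach(
    items,
    new ParallelOptions { MaxDegreeOfParallelism = 4 }, // at most 4 pages at once
    item => scraping(item.Asin.Trim(), item.Sku.Trim()));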
It exits and gets destroyed.
Maybe you'd be more interested in using a ThreadPool instead, with its maximum thread count set to 4?
It will reuse the same threads for your tasks. Constructing new threads involves allocating new memory for their stacks, and if you're doing that 20,000 times, it might be one of your bottlenecks.
It gets collected and discarded by the operating system. You might be better off with ThreadPool.QueueUserWorkItem. This will re-use threads so you don't have to incur the setup/teardown cost 20000 times.
Based on your edited post, the thread will be destroyed and its resources garbage collected, as @Karim also mentioned.
If you were using a ThreadPool it would be returned to the pool. If you know exactly how many threads you plan to keep active at any given time you could create a pool with that number to save some overhead.
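A minimal sketch of the pooled version follows; the SemaphoreSlim cap is an addition of mine to preserve the "4 pages at once" behaviour, and items is again assumed to hold the (asin, sku) pairs:

using System.Threading;

// A sketch: queue the work onto the shared pool instead of creating a
// thread per page; the semaphore caps concurrency at 4 scrapes at a time.
var gate = new SemaphoreSlim(4);
foreach (var item in items)
{
    gate.Wait(); // wait for one of the 4 slots to free up
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try { scraping(item.Asin.Trim(), item.Sku.Trim()); }
        finally { gate.Release(); }
    });
}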
What feedback does TPL use to dynamically adjust the number of worker threads?
My previous understanding was that it measures the rate of task completion to see if adding or removing threads is worth it. But then, why does this code keep increasing the number of threads, even though there is a bottleneck introduced by a semaphore?
Surely, there can be no more than 20 task completions per second, and more than 2 threads will not improve that.
var activeThreads = 0;
var semaphore = new SemaphoreSlim(2);
var res = Parallel.For(0, 1_000_000, i =>
{
    Interlocked.Increment(ref activeThreads);
    semaphore.Wait();
    try
    {
        Thread.Sleep(100);
        Console.WriteLine("Threads: " + activeThreads);
    }
    finally
    {
        Interlocked.Decrement(ref activeThreads);
        semaphore.Release();
    }
});
I believe ParallelOptions is what you are looking for to specify the degree of parallelism.
Parallel.For(0, 1000, new ParallelOptions
{
    MaxDegreeOfParallelism = 2
}, i => { Console.WriteLine(i); });
Personally, I think the TPL will work in a lot of cases, but it isn't really smart about distributing execution (pardon my English). Whenever you have bottlenecks in your application's execution, have a look at the pipeline pattern, for example. Here is a link that, in my opinion, describes different approaches to parallel execution very well: https://www.dotnetcurry.com/patterns-practices/1407/producer-consumer-pattern-dotnet-csharp
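For illustration, here is a bare-bones two-stage pipeline in the spirit of that article. Download and Parse are invented stage names, not APIs from the link:

using System.Collections.Concurrent;
using System.Threading.Tasks;

// Each stage owns a bounded queue; a slow stage backs up its own queue
// instead of stalling everything behind a single Parallel.For.
var urls = new BlockingCollection<string>(boundedCapacity: 100);
var pages = new BlockingCollection<string>(boundedCapacity: 100);

var downloader = Task.Factory.StartNew(() =>
{
    foreach (var url in urls.GetConsumingEnumerable())
        pages.Add(Download(url)); // stage 1 (placeholder method)
    pages.CompleteAdding();
}, TaskCreationOptions.LongRunning);

var parser = Task.Factory.StartNew(() =>
{
    foreach (var page in pages.GetConsumingEnumerable())
        Parse(page); // stage 2 (placeholder method)
}, TaskCreationOptions.LongRunning);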
TL;DR: The thing that you are doing in your code that the TPL uses to justify creating a new thread is blocking. (Synchronizing or sleeping, or performing I/O would all count as blocking.)
A longer explanation...
When your task runs, it takes its thread hostage for 100 ms (because you Sleep(100)). While you are sleeping, that thread cannot be used to run other tasks because it would risk not being in a runnable state when the sleep time period expires. Typically we sleep rather than perform an asynchronous action because we need to keep our call stack intact. We are therefore relying on the stack to maintain our state. And the stack is a one-of-a-kind resource for the thread. (There's not actually a lot more to a thread than its stack.)
So the TPL (Thread pool, specifically) tries to keep occupancy high but the thread count low. One way it achieves this is by making sure that there are approximately just as many runnable threads in the system as there are virtual processors. Each time it needs to increase the thread count, it must create a relatively expensive stack for the thread, so it's best not to have so many. And a thread that is not runnable cannot be scheduled, so when the CPU becomes free, you need something to schedule to make use of the processing resources available. If the thread is sleeping, it cannot be scheduled to run. So instead, a thread will be added to the thread pool and the next task will be scheduled on it.
When you are writing parallel code like this (as in your parallel for loop) that can be partitioned and managed by the TPL you should be careful about putting your thread into a non-runnable state. Performing synchronous I/O, waiting for a synchronization object (e.g. semaphore, event or mutex etc.), or sleeping will put the thread into a state where nothing else can be done on the thread until the I/O completes, the sleep interval expires, or the synchronization object becomes signalled. The thread is no good to the TPL during this period.
In your case, you do several of these things: you wait on a semaphore, you sleep, and you perform I/O by writing to the console. The first thing is waiting on that semaphore. If it's not signalled, then you immediately have the situation where the thread is not runnable and the next task of your million-or-so tasks that need to be run must be scheduled on a different thread. If there isn't one, then the TPL can justify creating a new thread to get more tasks started. After all, what if it's thread #987,321 that will actually wind up releasing the semaphore to unblock task #1? The TPL doesn't know what your code does, so it can delay creating threads for a while in the spirit of efficiency, but for correctness, ultimately it will have to create more threads to start chipping away at the task list. There is a complex, implementation-specific heuristic that it applies to monitor, predict and otherwise get this efficiency guess right.
Now your specific question actually asked what feedback does it use to adjust the number of threads. Like I said, the actual implementation is complex and you should probably think of it as a black-box. But in a nutshell, if there are no runnable threads, it may create another thread to keep chipping away at the task list (or may wait a while before doing so, hoping that things will free up), and if there are too many idle threads, it will terminate the idle threads to reclaim their resources.
And to reiterate, as I said at the top, and to hopefully answer your question this time, the one thing you do that allows the TPL to justify creating a new thread is to block. ...even on that first semaphore.
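For contrast, here is a sketch of the non-blocking version. It assumes the async APIs from .NET 4.5+ (so it's not a drop-in for the original Parallel.For) and is scaled down from the original million iterations:

using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Awaiting releases the pool thread instead of holding it hostage, so
// the TPL sees no blocked threads it needs to compensate for.
static async Task RunAsync()
{
    var semaphore = new SemaphoreSlim(2);
    var tasks = Enumerable.Range(0, 1000).Select(async i =>
    {
        await semaphore.WaitAsync();    // no thread is blocked while waiting
        try { await Task.Delay(100); }  // stands in for the Sleep(100)
        finally { semaphore.Release(); }
    });
    await Task.WhenAll(tasks);
}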
I ran into a 2017 article analysing the thread injection algorithm. As of 2019-08-01, the hillclimbing.cpp file on GitHub hasn't really changed, so the article should still be up to date.
Relevant details:
The .NET thread pool has two main mechanisms for injecting threads: a starvation-avoidance mechanism that adds worker threads if it sees no progress being made on queued items and a hill-climbing heuristic that tries to maximize throughput while using as few threads as possible.
...
It calculates the desired number of threads based on the ‘current throughput’, which is the ‘# of tasks completed’ (numCompletions) during the current time-period (sampleDuration in seconds).
...
It also takes the current thread count (currentThreadCount) into consideration.
...
The real .NET Thread Pool only increases the thread-count by one thread every 500 milliseconds. It keeps doing this until the ‘# of threads’ has reached the amount that the hill-climbing algorithm suggests.
...
The [hill-climbing] algorithm only returns values that respect the limits specified by ThreadPool.SetMinThreads(..) and ThreadPool.SetMaxThreads(..)
...
In addition, [the hill-climbing algorithm] will only recommend increasing the thread count if the CPU Utilization is below 95%
So it turns out the thread pool does have a feedback mechanism based on task completion rate. It also doesn't explicitly check whether its threads are blocked or running, but it does keep an eye on overall CPU utilization to detect blockages. All this also means it should be roughly aware of what the other threads and processes are doing.
On the other hand, it will always eagerly spawn at least as many threads as told by ThreadPool.SetMinThreads(), which defaults to the number of logical processors on the machine.
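Those knobs can be read directly if you want to see what the algorithm is working with (plain ThreadPool calls, shown for reference):

using System;
using System.Threading;

// Read the limits the hill-climbing algorithm must respect.
ThreadPool.GetMinThreads(out int minWorker, out int minIo);
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
Console.WriteLine($"worker threads: min={minWorker}, max={maxWorker}");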
In conclusion, the test code in question was doing two things which made it keep piling up more threads:
- there are lots of tasks queued up and sitting in the queue for ages, which indicates starvation
- CPU utilization is negligible, which means a new thread should be able to use it
I have a single-threaded C# application and am currently working on making it multi-threaded using thread pools. I am stuck deciding which model would work for my problem.
Here's my current scenario
while (true)
{
    do_sometask();
    wait(time);
}
And this is repeated almost forever. The new scenario has multiple threads which do the above. I could easily implement it by spawning a number of threads based on the tasks I have to perform, where all the threads perform some task and wait forever.
The issue here is that I may not know the number of tasks, so I can't just blindly spawn 500 threads. I thought about using a thread pool, but because almost every thread loops forever and won't ever be freed up for new tasks in the queue, I am not sure which other model to use.
I am looking for an idea or solution where I could break the loop in the thread and free it up instead of waiting, but come back and resume the same task after the wait (when the time has elapsed, using something like a timer, or checking the timestamp of when the task was last performed).
With this I could use a limited number of threads (like in a thread pool) and serve the tasks which come in while the old threads are (virtually) waiting.
Any help is really appreciated.
If all you have is a bunch of things that happen periodically, it sounds like what you want is a bunch of timers. Create a timer for each task, to fire when appropriate. So if you have two different tasks:
using System;
using System.Threading;

// Task1 happens once per minute
Timer task1Timer = new Timer(
    s => DoTask1(),
    null,
    TimeSpan.FromMinutes(1),
    TimeSpan.FromMinutes(1));

// Task2 happens once every 47 seconds
Timer task2Timer = new Timer(
    s => DoTask2(),
    null,
    TimeSpan.FromSeconds(47),
    TimeSpan.FromSeconds(47));
The timer is a pretty lightweight object, so having a whole bunch of them isn't really a problem. The timer only takes CPU resources when it fires. The callback method will be executed on a pool thread.
There is one potential problem. If you have a whole lot of timers all with the same period, then the callbacks will all be called at the same time. The threadpool should handle that gracefully by limiting the number of concurrent tasks, but I can't say for sure. But if your wait times are staggered, this is going to work well.
If you have small wait times (less than a second), then you probably need a different technique. I'll detail that if required.
With this design, you only have one thread blocked at any time.
Have one thread (the master thread) waiting on a concurrent blocking collection, such as BlockingCollection<T>. This thread will be blocked by a call to TryTake until something is placed in the collection, or until a timeout passed into the call expires (more on this later).
Once it is unblocked, it may have a unit of work to be processed. It checks whether there is one (i.e., the TryTake call didn't time out) and whether there is capacity to perform this work; if so, it queues up a thread (pool, Task, or whatever) to service the work. This master thread then goes back to the blocking collection and tries to take another unit of work. The cycle continues.
As a unit of work is begun, it will be noted so that the main thread can see how many threads are working. Once this unit is completed, the notation will be removed. The thread is then freed.
You want to use a timeout so that if it is judged that too many operations are running concurrently, you will be able to re-evaluate this a set period of time down the road. Otherwise, that unit of work sits in the blocking collection until a new unit is added, which is not optimal.
Outside users of this instance can queue up new units of work by simply dropping them in the collection.
You can use a cancellation token to immediately unblock the thread when it's time to shut down operations. Have the worker operations take cancellation tokens as well so they can halt on shutdown.
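A rough sketch of that master loop follows. WorkUnit, ProcessUnit, and MaxConcurrent are placeholder names, and the capacity back-off is deliberately simplistic:

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var workQueue = new BlockingCollection<WorkUnit>(); // WorkUnit is a placeholder type
int running = 0;
const int MaxConcurrent = 4;

void MasterLoop(CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        if (Volatile.Read(ref running) >= MaxConcurrent)
        {
            Thread.Sleep(100); // too busy; re-evaluate capacity shortly
            continue;
        }

        // The timeout wakes the loop up periodically even when no new
        // work arrives, so capacity gets re-evaluated as described above.
        if (!workQueue.TryTake(out var unit, TimeSpan.FromSeconds(1)))
            continue;

        Interlocked.Increment(ref running); // note that a unit has begun
        Task.Factory.StartNew(() =>
        {
            try { ProcessUnit(unit, token); } // placeholder worker method
            finally { Interlocked.Decrement(ref running); }
        });
    }
}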
I could implement it with the help of a thread pool and a few conditions to check the last activity of the task before adding it to the thread pool queue.
I am trying to speed things up by having one thread write to a linked list and another thread process the linked list.
For some reason, if I make the method that writes to the linked list a task, and the method that reads from the linked list a low-priority thread, the program as a whole finishes much faster. In other words, I experience the fastest results when doing:
Task.Factory.StartNew(AddItems);
new Thread(startProcessingItems) { Priority = ThreadPriority.Lowest }.Start();
while (completed == false)
    Thread.Sleep(0);
Maybe it's because the first task is doing so much more work than the other thread, and that's why everything as a whole finishes faster when I give the second method a low priority.
Anyway, now my question is: startProcessingItems runs with ThreadPriority = Lowest. How can I change its priority to highest? If I create a new Task in that method, will it run with low priority? Basically, startProcessingItems ends up with a list, and once it has that list I would like to start executing with highest priority.
This is not a good approach. First off, LinkedList<T> is not thread safe, so writing to it and reading from it in two threads will cause race conditions.
A better approach would be to use BlockingCollection<T>. This allows you to add items (producer thread) and read items (consumer thread) without worrying about thread safety, as it's completely thread safe.
The reading thread can just call blockingCollection.GetConsumingEnumerable() in a foreach to get the elements, and the write thread just adds them. The reading thread will automatically block, so there's no need to mess with priorities.
When the writing thread is "done", you just call CompleteAdding, which will, in turn, allow the reading thread to finish automatically.
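A minimal sketch of that shape, reusing the question's names where possible (Item, CreateItems, and ProcessItem are made up for the example):

using System.Collections.Concurrent;
using System.Threading.Tasks;

var items = new BlockingCollection<Item>(); // Item is a placeholder type

// Producer: the role the AddItems task plays in the question.
var producer = Task.Factory.StartNew(() =>
{
    foreach (var item in CreateItems()) // placeholder source of work
        items.Add(item);
    items.CompleteAdding(); // lets the consumer's foreach end naturally
});

// Consumer: the role startProcessingItems plays. The foreach blocks
// while the collection is empty, so no priorities or busywaiting needed.
var consumer = Task.Factory.StartNew(() =>
{
    foreach (var item in items.GetConsumingEnumerable())
        ProcessItem(item); // placeholder per-item work
});

Task.WaitAll(producer, consumer); // replaces the while/Sleep(0) loop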
You can improve the performance of your program by changing the inherent design, rather than by changing thread/process priorities.
A big part of your problem is that you're doing a busywait:
while (completed == false)
    Thread.Sleep(0);
This results in it consuming lots of CPU cycles for no productive work, and it's why lowering its priority makes it execute quicker. If you aren't busywaiting, then that won't be an issue any more.
As Reed has suggested, BlockingCollection is tailor made for this situation. You can have a producer thread adding items using Add, and a consumer thread using Take knowing that the method will simply block if there are no more items to be removed.
You can also store the Task you create and use Task.Result or Task.Wait to have the main thread wait for the other task to finish (without wasting CPU cycles). (If you are using threads directly you can use Join.)
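For example (a sketch using the question's method names):

using System.Threading;
using System.Threading.Tasks;

// Store the task and block on it instead of spinning on "completed".
var addTask = Task.Factory.StartNew(AddItems);
addTask.Wait(); // waits without burning CPU, unlike the Sleep(0) loop

// Or, when using a raw thread, the equivalent is Join:
var worker = new Thread(startProcessingItems);
worker.Start();
worker.Join();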
In addition to what Reed and Servy have said:
Thread priority is relative to the process's priority.
The Windows scheduler takes all other threads into account when it schedules time for threads. Threads with higher priority take time away from other threads, which could artificially slow down the rest of the system. It's not as though the system withholds CPU from your thread for no reason: priority only has an effect when something else takes the CPU away from your thread, and that happens for a reason. If nothing takes the CPU away from the thread, it won't magically run faster with a priority of Highest.
Setting thread priority to Highest is almost always the wrong thing to do.
Likely the overhead of synchronization between the two threads will kill any performance gain you think you might get.
Also, Thread.Sleep(0) only relinquishes time to threads of equal priority that are ready to run, which could lead to thread starvation. http://msdn.microsoft.com/en-us/library/d00bd51t(v=vs.80).aspx
When I am using the .NET 4 Task class, which uses the ThreadPool, what happens if all threads are busy?
Does the TaskScheduler create a new thread and extend the ThreadPool maximum number of threads or does it sit and wait until a thread is available?
The maximum number of threads in the ThreadPool is set to around 1000 threads on a .NET 4.0 32-bit system. It's less for older versions of .NET. If you have 1000 threads going and they're all blocking for some reason, then when you queue the 1001st task, it won't ever execute.
You will never hit the max thread count in a 32-bit process. Keep in mind that each thread takes at least 1 MB of memory (that's the size of the user-mode stack), plus any other overhead. You already lose a lot of address space to the CLR and the native DLLs loaded, so you'll hit an OutOfMemoryException before you use that many threads.
You can change the number of threads the ThreadPool can use by calling the ThreadPool.SetMaxThreads method. However, if you're expecting to use that many threads, you have a far larger problem with your code. I do NOT recommend messing around with ThreadPool configuration like that. You'll most likely just get worse performance.
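For reference, this is what the call looks like (shown for completeness only; as said, changing it is usually a bad idea):

using System.Threading;

// Returns false if the values are rejected, e.g. if they are below the
// number of processors on the machine.
bool accepted = ThreadPool.SetMaxThreads(workerThreads: 2000, completionPortThreads: 1000);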
Keep in mind that with Task and ThreadPool.QueueUserWorkItem, threads are re-used when they're done. If you create a task or queue a threadpool work item, it may or may not create a new thread to execute your code. If there are available threads already in the pool, it will re-use one of those instead of creating an (expensive) new thread. Only if the methods you're executing in the tasks never return should you worry about running out of threads, but like I said, that's an entirely different problem with your code.
By default, the MaxThreads of the ThreadPool is very high. Usually you'll never get there; your app will crash first.
So when all threads are busy, new tasks are queued and, slowly, at most one per 500 ms, the thread pool will allocate new threads.
It won't increase MaxThreads. When there are more tasks than available worker threads, some tasks will be queued and wait until the thread pool provides an available thread. It does some pretty advanced stuff to scale with a large number of cores (work-stealing, thread injection, etc).
I am cleaning up some old code, converting it to work asynchronously.
psDelegate.GetStops decStops = psLoadRetrieve.GetLoadStopsByLoadID;
var arStops = decStops.BeginInvoke(loadID, null, null);
WaitHandle.WaitAll(new WaitHandle[] { arStops.AsyncWaitHandle });
var stops = decStops.EndInvoke(arStops);
Above is a single example of what I am doing for asynchronous work. My plan is to have close to 20 different delegates running. All will call BeginInvoke and wait until they are all complete before calling EndInvoke.
My question is: will having so many delegates running cause problems? I understand that BeginInvoke uses the ThreadPool to do the work, and that it has a limit of 25 threads. 20 is under that limit, but it is very likely that other parts of the system could be using any number of threads from the ThreadPool as well.
Thanks!
No, the ThreadPool manager was designed to deal with this situation. It won't let all thread pool threads run at the same time. It starts off allowing as many threads to run as you have CPU cores. As soon as one completes, it allows another one to run.
Every half second, it steps in if the active threads are not completing. It assumes they are stuck and allows another one to run. On a 2-core CPU, you'd now have 3 threads running.
Getting to the maximum, 500 threads on a 2-core CPU, would take quite a while. You would have to have threads that don't complete for over 4 minutes. If your threads behave that way, then you don't want to use threadpool threads.
The current default MaxThreads is 250 per processor, so effectively there should be no limit unless your application is just spewing out calls to BeginInvoke. See http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx. Also, the thread pool will try to reuse existing threads before creating new ones in order to reduce the overhead of creating threads. If your invokes are all fast, you will probably not see the thread pool create a lot of threads.
For long-running or blocking tasks it is usually better to avoid the thread pool and manage the threads yourself.
However, trying to schedule that many threads will probably not yield the best results on most current machines.
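If you do manage such work yourself, the usual sketch is either a dedicated thread or a task marked LongRunning, which hints the scheduler not to tie up a pool thread (LongRunningWork is a placeholder method):

using System.Threading;
using System.Threading.Tasks;

// Option 1: a dedicated thread, entirely outside the pool.
var worker = new Thread(LongRunningWork) { IsBackground = true };
worker.Start();

// Option 2: hint the TPL to give this task its own thread.
Task.Factory.StartNew(LongRunningWork, TaskCreationOptions.LongRunning);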