Parallel.ForEach on a BlockingCollection causes steady increase of threads - c#

I observed a strange behavior while experimenting with the Parallel.ForEach method and the BlockingCollection<T> class. Apparently running the two lines below on a separate thread is enough to cause an ever-increasing number of threads in the ThreadPool:
var queue = new BlockingCollection<int>();
Parallel.ForEach(queue.GetConsumingEnumerable(), _ => { });
The queue contains no elements. I was expecting that the Parallel.ForEach loop would be idle, waiting for items to be added to the queue. Apparently it's not idle, because the ThreadPool.ThreadCount increases by one every second. Here is a minimal demo:
public class Program
{
public static void Main()
{
new Thread(() =>
{
var queue = new BlockingCollection<int>();
Parallel.ForEach(queue.GetConsumingEnumerable(), _ => { });
})
{ IsBackground = true }.Start();
Stopwatch stopwatch = Stopwatch.StartNew();
while (true)
{
Console.WriteLine($"ThreadCount: {ThreadPool.ThreadCount}");
if (stopwatch.ElapsedMilliseconds > 8000) break;
Thread.Sleep(1000);
}
Console.WriteLine("Finished");
}
}
Output:
ThreadCount: 0
ThreadCount: 4
ThreadCount: 5
ThreadCount: 6
ThreadCount: 7
ThreadCount: 8
ThreadCount: 10
ThreadCount: 11
ThreadCount: 12
Finished
Online demo.
Can anyone explain why this is happening, and how to prevent it? Ideally I would like the Parallel.ForEach to consume at most one ThreadPool thread while the queue is empty.
I am searching for a solution applicable on .NET Core 3.1 and later.

Reason
The Parallel.ForEach method partitions the enumerable first, then it uses TaskReplicator.Run to run the action on each partition.
The actual runner is the method Replica.Execute. Because the enumerable blocks at the start of the method, the replica never completes; instead it keeps spawning new replicas of itself until _remainingConcurrency reaches 0.
The _remainingConcurrency field is initialized from the ParallelOptions.EffectiveMaxConcurrencyLevel option, whose default value is -1, meaning unlimited. That is why the ThreadCount keeps increasing.
Solution
Since we know the cause, we just need to specify a reasonable option to limit the degree of parallelism:
var option = new ParallelOptions
{
MaxDegreeOfParallelism = 5
};
Parallel.ForEach(queue.GetConsumingEnumerable(), option, _ => { });
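For reference, here is a sketch of the original demo with the option applied (the 5 is an arbitrary cap, not a recommendation); with it, the ThreadPool.ThreadCount should plateau instead of growing by one every second:
new Thread(() =>
{
    var queue = new BlockingCollection<int>();
    var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };
    // The loop still blocks waiting for items, but it can now occupy
    // at most 5 worker threads instead of replicating without limit.
    Parallel.ForEach(queue.GetConsumingEnumerable(), options, _ => { });
})
{ IsBackground = true }.Start();

for (int i = 0; i < 8; i++)
{
    Console.WriteLine($"ThreadCount: {ThreadPool.ThreadCount}");
    Thread.Sleep(1000);
}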

Related

C# Parallel Execution Thread Limit

I'm writing a simple test which just logs the time when a concurrent request starts and then sleeps for a second.
static void TestParallelism()
{
int expectedThreadCount = 100;
ThreadPool.SetMaxThreads(Environment.CurrentManagedThreadId, expectedThreadCount); // NB: passing a thread id as the first argument (workerThreads) looks like a mistake
var module = new WCFCompositeModule();
module.Initialize(new Keywords());
var range = Enumerable.Range(0, expectedThreadCount);
var startTimes = new ConcurrentBag<DateTime>();
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = expectedThreadCount };
Parallel.ForEach(range, i =>
{
startTimes.Add(DateTime.Now);
Thread.Sleep(1000);
});
foreach (var time in startTimes)
{
Console.WriteLine(string.Format("{0: HH:mm:ss.fff}", time));
}
Console.ReadLine();
}
When I execute this with 100 expected threads, I see 12 distinct start times, each about one second apart, instead of seeing them all start within the same second.
Sample
13:59:27.475
13:59:26.473
13:59:25.473
13:59:24.471
13:59:23.470
13:59:22.469
Is this occurring because sleep blocks the thread?
You seem to be expecting this code to launch 100 threads, however
MaxDegreeOfParallelism = expectedThreadCount
only sets a maximum, not a minimum; the system dispatches the work to however many threads it sees fit (at most MaxDegreeOfParallelism).
Unless you have a 100-core machine, it is unlikely to use 100 threads. So it uses X threads, they all block for 1 second, and then the next batch is dispatched on the same X threads. Here X ≈ 100/12 ≈ 8.3, so seeing 12 distinct times means it ran at least 12 batches of roughly 8 to 9 iterations each. On a 4-core system with hyper-threading, or an 8-core system, this is the expected default behavior.
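A quick way to verify the batching is to record which managed threads actually run the body; a sketch, independent of the question's WCF bits:
var threadIds = new ConcurrentDictionary<int, byte>();
var options = new ParallelOptions { MaxDegreeOfParallelism = 100 };
Parallel.ForEach(Enumerable.Range(0, 100), options, i =>
{
    threadIds.TryAdd(Environment.CurrentManagedThreadId, 0);
    Thread.Sleep(1000); // same blocking wait as in the question
});
// On a typical 4-8 core machine this prints a number far below 100.
Console.WriteLine($"Distinct threads used: {threadIds.Count}");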

Stopping Parallel.ForEach if one of its threads runs for more than N minutes

I'm looking for a way to stop Parallel.ForEach if one of its threads runs for more than 2 minutes.
The following solution is, I think, not very good because it needs twice as many threads:
Parallel.ForEach(items, (item, opt) =>
{
var thread = new Thread(() => { /* a process */ });
thread.Start();
bool finished = thread.Join(TimeSpan.FromMinutes(2));
if (!finished)
{
thread.Abort();
opt.Stop();
}
});
Do you know a better solution for the issue?
First of all, I want to note that the Parallel class will not create a thread for each of your items; it uses the default ThreadPool, which by default has a number of threads equal to the processor core count. Another problem in your code is that you don't stop all the tasks after 2 minutes of work; you only abort the single one you waited on for two minutes.
I suggest you remove the Thread usage from your code and instead create an array of Tasks sharing a single CancellationToken that is cancelled by a timeout, and start them all. Your code should also explicitly check the token for pending cancellation.
So your code could be something like this:
var cts = new CancellationTokenSource();
// cancel all the tasks two minutes after this point
cts.CancelAfter(TimeSpan.FromMinutes(2));
var tasks = new List<Task>();
foreach (var item in items)
{
    // capture a local copy, because of how closures work in C#
    var localItem = item;
    tasks.Add(Task.Run(() =>
    {
        /* a process with localItem here */
        // this check should be repeated from time to time in your calculations
        cts.Token.ThrowIfCancellationRequested();
    },
    // all tasks share the same token
    cts.Token));
}
// wait at most 2 minutes for all tasks to finish;
// the CancelAfter above cancels any that are still running
Task.WaitAll(tasks.ToArray(), TimeSpan.FromMinutes(2));
Update:
As you said that each item is independent, you can create a CancellationTokenSource for each task; but, as @ScottChamberlain noted, in that case too many tasks would run at the same time. You can write your own TaskScheduler, use a semaphore (or its slim version, SemaphoreSlim), or simply use the Parallel class with ParallelOptions.MaxDegreeOfParallelism set appropriately.
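A sketch of that per-item variant, using one CancellationTokenSource per task and a SemaphoreSlim to cap concurrency (Process is a hypothetical stand-in for the actual per-item work, which must observe the token):
var throttle = new SemaphoreSlim(Environment.ProcessorCount);
var tasks = items.Select(item => Task.Run(async () =>
{
    await throttle.WaitAsync();
    try
    {
        // each item gets its own 2-minute timeout
        using (var itemCts = new CancellationTokenSource(TimeSpan.FromMinutes(2)))
        {
            Process(item, itemCts.Token); // hypothetical worker; polls the token
        }
    }
    finally
    {
        throttle.Release();
    }
})).ToArray();
Task.WaitAll(tasks);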

What is the invocation sequence of waiting threads after a lock is released [duplicate]

When multiple threads request a lock on the same object, does the CLR guarantee that the locks will be acquired in the order they were requested?
I wrote up a test to see if this was true, and it seems to indicate yes, but I'm not sure if this is definitive.
class LockSequence
{
private static readonly object _lock = new object();
private static DateTime _dueTime;
public static void Test()
{
var states = new List<State>();
_dueTime = DateTime.Now.AddSeconds(5);
for (int i = 0; i < 10; i++)
{
var state = new State {Index = i};
ThreadPool.QueueUserWorkItem(Go, state);
states.Add(state);
Thread.Sleep(100);
}
states.ForEach(s => s.Sync.WaitOne());
states.ForEach(s => s.Sync.Close());
}
private static void Go(object state)
{
var s = (State) state;
Console.WriteLine("Go entered: " + s.Index);
lock (_lock)
{
Console.WriteLine("{0,2} got lock", s.Index);
if (_dueTime > DateTime.Now)
{
var time = _dueTime - DateTime.Now;
Console.WriteLine("{0,2} sleeping for {1} ticks", s.Index, time.Ticks);
Thread.Sleep(time);
}
Console.WriteLine("{0,2} exiting lock", s.Index);
}
s.Sync.Set();
}
private class State
{
public int Index;
public readonly ManualResetEvent Sync = new ManualResetEvent(false);
}
}
Prints:
Go entered: 0
0 got lock
0 sleeping for 49979998 ticks
Go entered: 1
Go entered: 2
Go entered: 3
Go entered: 4
Go entered: 5
Go entered: 6
Go entered: 7
Go entered: 8
Go entered: 9
0 exiting lock
1 got lock
1 sleeping for 5001 ticks
1 exiting lock
2 got lock
2 sleeping for 5001 ticks
2 exiting lock
3 got lock
3 sleeping for 5001 ticks
3 exiting lock
4 got lock
4 sleeping for 5001 ticks
4 exiting lock
5 got lock
5 sleeping for 5001 ticks
5 exiting lock
6 got lock
6 exiting lock
7 got lock
7 exiting lock
8 got lock
8 exiting lock
9 got lock
9 exiting lock
IIRC, it's highly likely to be in that order, but it's not guaranteed. I believe there are, at least theoretically, cases where a thread will be woken spuriously, notice that it still doesn't have the lock, and go to the back of the queue. It's possible that's only for Wait/Notify, but I have a sneaking suspicion it's for locking as well.
I definitely wouldn't rely on it - if you need things to occur in a sequence, build up a Queue<T> or something similar.
EDIT: I've just found this within Joe Duffy's Concurrent Programming on Windows which basically agrees:
Because monitors use kernel objects internally, they exhibit the same roughly-FIFO behavior that the OS synchronization mechanisms also exhibit (described in the previous chapter). Monitors are unfair, so if another thread tries to acquire the lock before an awakened waiting thread tries to acquire the lock, the sneaky thread is permitted to acquire a lock.
The "roughly-FIFO" bit is what I was thinking of before, and the "sneaky thread" bit is further evidence that you shouldn't make assumptions about FIFO ordering.
Normal CLR locks are not guaranteed to be FIFO.
But, there is a QueuedLock class in this answer which will provide a guaranteed FIFO locking behavior.
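For context, that QueuedLock is essentially a ticket lock built on Monitor; a minimal sketch of the idea (not a drop-in copy of that answer):
public sealed class QueuedLock
{
    private readonly object _innerLock = new object();
    private int _ticketsIssued = 0; // last ticket handed out
    private int _ticketToRide = 1;  // ticket currently allowed to run

    public void Enter()
    {
        int myTicket = Interlocked.Increment(ref _ticketsIssued);
        Monitor.Enter(_innerLock);
        while (myTicket != _ticketToRide)
        {
            Monitor.Wait(_innerLock); // not our turn yet; releases the monitor while waiting
        }
    }

    public void Exit()
    {
        Interlocked.Increment(ref _ticketToRide);
        Monitor.PulseAll(_innerLock); // wake waiters so the next ticket holder proceeds
        Monitor.Exit(_innerLock);
    }
}
// Usage: queuedLock.Enter(); try { /* critical section */ } finally { queuedLock.Exit(); }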
The lock statement is documented to use the Monitor class to implement its behavior, and the docs for the Monitor class make no mention (that I can find) of fairness. So you should not rely on requested locks being acquired in the order of request.
Indeed, an article by Jeffrey Richter indicates that lock is in fact not fair:
Thread Synchronization Fairness in the .NET CLR
Granted - it's an old article so things may have changed, but given that no promises are made in the contract for the Monitor class about fairness, you need to assume the worst.
Slightly tangential to the question, but ThreadPool doesn't even guarantee that it will execute queued work items in the order they are added. If you need sequential execution of asynchronous tasks, one option is using TPL Tasks (also backported to .NET 3.5 via Reactive Extensions). It would look something like this:
public static void Test()
{
var states = new List<State>();
_dueTime = DateTime.Now.AddSeconds(5);
var initialState = new State() { Index = 0 };
var initialTask = new Task(Go, initialState);
Task priorTask = initialTask;
for (int i = 1; i < 10; i++)
{
var state = new State { Index = i };
priorTask = priorTask.ContinueWith(t => Go(state));
states.Add(state);
Thread.Sleep(100);
}
Task finalTask = priorTask;
initialTask.Start();
finalTask.Wait();
}
This has a few advantages:
Execution order is guaranteed.
You no longer require an explicit lock (the TPL takes care of those details).
You no longer need events and no longer need to wait on all events. You can simply say: wait for the last task to complete.
If an exception were thrown in any of the tasks, subsequent tasks would be aborted and the exception would be rethrown by the call to Wait. This may or may not match your desired behavior, but is generally the best behavior for sequential, dependent tasks.
By using the TPL, you have added flexibility for future expansion, such as cancellation support, waiting on parallel tasks for continuation, etc.
I am using this approach to do FIFO locking:
public class QueuedActions
{
private readonly object _internalSyncronizer = new object();
private readonly ConcurrentQueue<Action> _actionsQueue = new ConcurrentQueue<Action>();
public void Execute(Action action)
{
// ReSharper disable once InconsistentlySynchronizedField
_actionsQueue.Enqueue(action);
lock (_internalSyncronizer)
{
Action nextAction;
if (_actionsQueue.TryDequeue(out nextAction))
{
nextAction.Invoke();
}
else
{
throw new Exception("Something is wrong. How come there is nothing in the queue?");
}
}
}
}
The ConcurrentQueue will order the execution of the actions while the threads are waiting in the lock.
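Usage is then just a matter of funnelling all work through one shared instance; a quick sketch:
var queuedActions = new QueuedActions();
Parallel.For(0, 10, i =>
{
    // Each call enqueues first, so actions run in enqueue order,
    // even if a different thread ends up invoking them.
    queuedActions.Execute(() => Console.WriteLine($"Action {i}"));
});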

Suggest a way in threading that always fixed number of threads must be used in computing?

I have thousands of file paths in a List<string>, and I want to process them in a function in parallel with 8 threads.
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(fileList, opt, item => DoSomething(item));
This code works fine for me, but it only guarantees that at most 8 threads run, and I want 8 threads to run always; the CLR decides the number of threads to use based on CPU load.
Please suggest a way of threading such that 8 threads are always used for the computation, with minimum overhead.
Use a producer / consumer model. Create one producer and 8 consumers. For example:
BlockingCollection<string> _filesToProcess = new BlockingCollection<string>();
// start 8 tasks to do the processing
List<Task> _consumers = new List<Task>();
for (int i = 0; i < 8; ++i)
{
var t = Task.Factory.StartNew(ProcessFiles, TaskCreationOptions.LongRunning);
_consumers.Add(t);
}
// Populate the queue
foreach (var filename in filelist)
{
_filesToProcess.Add(filename);
}
// Mark the collection as complete for adding
_filesToProcess.CompleteAdding();
// wait for consumers to finish
Task.WaitAll(_consumers.ToArray(), Timeout.Infinite);
Your processing method removes things from the BlockingCollection and processes them:
void ProcessFiles()
{
foreach (var filename in _filesToProcess.GetConsumingEnumerable())
{
// do something with the file name
}
}
That will keep 8 threads running until the collection is empty. Assuming, of course, you have 8 cores on which to run the threads. If you have fewer available cores, then there will be a lot of context switching, which will cost you.
See BlockingCollection for more information.
With a static counter, you might be able to track the number of currently running tasks.
Every time you start a task, you can use Task.ContinueWith (http://msdn.microsoft.com/en-us/library/dd270696.aspx) to be notified that it has finished and start another one.
This way there will always be 8 tasks running, as shown in the sketch below.
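A sketch of that idea, keeping 8 task "chains" alive by having each completed task start the next item (DoSomething and fileList are assumed from the question):
int nextIndex = -1;
Task RunNext()
{
    int i = Interlocked.Increment(ref nextIndex);
    if (i >= fileList.Count) return Task.CompletedTask;
    return Task.Run(() => DoSomething(fileList[i]))
               .ContinueWith(_ => RunNext())
               .Unwrap(); // flatten the Task<Task> produced by ContinueWith
}
// Start 8 chains; each keeps itself busy until the list is exhausted.
Task[] workers = Enumerable.Range(0, 8).Select(_ => RunNext()).ToArray();
Task.WaitAll(workers);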
OrderablePartitioner<Tuple<int, int>> chunkPart = Partitioner.Create(0, fileList.Count, 1); // partition the list into chunks of 1 entry
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(chunkPart, opt, chunkRange =>
{
for (int i = chunkRange.Item1; i < chunkRange.Item2; i++)
{
DoSomething(fileList[i]);
}
});

Does Parallel.ForEach limit the number of active threads?

Given this code:
var arrayStrings = new string[1000];
Parallel.ForEach<string>(arrayStrings, someString =>
{
DoSomething(someString);
});
Will all 1000 threads spawn almost simultaneously?
No, it won't start 1000 threads - yes, it will limit how many threads are used. Parallel Extensions uses an appropriate number of cores, based on how many you physically have and how many are already busy. It allocates work for each core and then uses a technique called work stealing to let each thread process its own queue efficiently and only need to do any expensive cross-thread access when it really needs to.
Have a look at the PFX Team Blog for loads of information about how it allocates work and all kinds of other topics.
Note that in some cases you can specify the degree of parallelism you want, too.
On a single core machine... Parallel.ForEach partitions (chunks) the collection it's working on between a number of threads, but that number is calculated by an algorithm that appears to continually monitor the work done by the threads it allocates to the ForEach. So if the body of the ForEach calls out to long-running IO-bound/blocking functions that would leave a thread waiting around, the algorithm will spawn more threads and repartition the collection between them. If the threads complete quickly and don't block, for example when simply calculating some numbers, the algorithm will ramp the number of threads up (or indeed down) to the point it considers optimal for throughput (the average completion time of each iteration).
Basically, the thread pool behind all the various Parallel library functions will work out an optimum number of threads to use. The number of physical processor cores forms only part of the equation. There is NOT a simple one-to-one relationship between the number of cores and the number of threads spawned.
I don't find the documentation around the cancellation and handling of synchronizing threads very helpful. Hopefully MS can supply better examples in MSDN.
Don't forget, the body code must be written to run on multiple threads, along with all the usual thread safety considerations, the framework does not abstract that factor... yet.
Great question. In your example, the level of parallelization is pretty low even on a quad core processor, but with some waiting the level of parallelization can get quite high.
// Max concurrency: 5
[Test]
public void Memory_Operations()
{
ConcurrentBag<int> monitor = new ConcurrentBag<int>();
ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
var arrayStrings = new string[1000];
Parallel.ForEach<string>(arrayStrings, someString =>
{
monitor.Add(monitor.Count);
monitor.TryTake(out int result);
monitorOut.Add(result);
});
Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}
Now look what happens when a waiting operation is added to simulate an HTTP request.
// Max concurrency: 34
[Test]
public void Waiting_Operations()
{
ConcurrentBag<int> monitor = new ConcurrentBag<int>();
ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
var arrayStrings = new string[1000];
Parallel.ForEach<string>(arrayStrings, someString =>
{
monitor.Add(monitor.Count);
System.Threading.Thread.Sleep(1000);
monitor.TryTake(out int result);
monitorOut.Add(result);
});
Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}
Without having made any other changes, the level of concurrency/parallelization has jumped dramatically. The concurrency limit can be raised with ParallelOptions.MaxDegreeOfParallelism.
// Max concurrency: 43
[Test]
public void Test()
{
ConcurrentBag<int> monitor = new ConcurrentBag<int>();
ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
var arrayStrings = new string[1000];
var options = new ParallelOptions {MaxDegreeOfParallelism = int.MaxValue};
Parallel.ForEach<string>(arrayStrings, options, someString =>
{
monitor.Add(monitor.Count);
System.Threading.Thread.Sleep(1000);
monitor.TryTake(out int result);
monitorOut.Add(result);
});
Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}
// Max concurrency: 391
[Test]
public void Test()
{
ConcurrentBag<int> monitor = new ConcurrentBag<int>();
ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
var arrayStrings = new string[1000];
var options = new ParallelOptions {MaxDegreeOfParallelism = int.MaxValue};
Parallel.ForEach<string>(arrayStrings, options, someString =>
{
monitor.Add(monitor.Count);
System.Threading.Thread.Sleep(100000);
monitor.TryTake(out int result);
monitorOut.Add(result);
});
Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}
I recommend setting ParallelOptions.MaxDegreeOfParallelism. It will not necessarily increase the number of threads in use, but it will ensure you only start a sane number of threads, which seems to be your concern.
Lastly, to answer your question: no, you will not get all threads to start at once. Use Parallel.Invoke if you want the invocations to start as close together as possible, e.g. for testing race conditions.
// 636462943623363344
// 636462943623363344
// 636462943623363344
// 636462943623363344
// 636462943623363344
// 636462943623368346
// 636462943623368346
// 636462943623373351
// 636462943623393364
// 636462943623393364
[Test]
public void Test()
{
ConcurrentBag<string> monitor = new ConcurrentBag<string>();
ConcurrentBag<string> monitorOut = new ConcurrentBag<string>();
var arrayStrings = new string[1000];
var options = new ParallelOptions {MaxDegreeOfParallelism = int.MaxValue};
Parallel.ForEach<string>(arrayStrings, options, someString =>
{
monitor.Add(DateTime.UtcNow.Ticks.ToString());
monitor.TryTake(out string result);
monitorOut.Add(result);
});
var startTimes = monitorOut.OrderBy(x => x.ToString()).ToList();
Console.WriteLine(string.Join(Environment.NewLine, startTimes.Take(10)));
}
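For comparison, a minimal Parallel.Invoke sketch; the supplied actions are handed to the scheduler as a single batch, which is about as close to "all at once" as you can get:
var options = new ParallelOptions { MaxDegreeOfParallelism = int.MaxValue };
Action[] actions = Enumerable.Range(0, 10)
    .Select<int, Action>(i => () => Console.WriteLine($"{i}: {DateTime.UtcNow.Ticks}"))
    .ToArray();
// Blocks until every action has completed.
Parallel.Invoke(options, actions);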
It works out an optimal number of threads based on the number of processors/cores. They will not all spawn at once.
See Does Parallel.For use one Task per iteration? for an idea of a "mental model" to use. However the author does state that "At the end of the day, it’s important to remember that implementation details may change at any time."
Does Parallel.ForEach limit the number of active threads?
If you don't configure the Parallel.ForEach with a positive MaxDegreeOfParallelism, the answer is no. With the default configuration, and assuming a source sequence of sufficiently large size, the Parallel.ForEach will use all the threads that are immediately available in the ThreadPool, and will continuously ask for more. By itself the Parallel.ForEach imposes no restriction on the number of threads; it is only limited by the capabilities of the associated TaskScheduler.
The default ParallelOptions.MaxDegreeOfParallelism is -1, which means unlimited parallelism.
The default ParallelOptions.TaskScheduler is the ThreadPoolTaskScheduler, better known as TaskScheduler.Default.
So if you want to understand the behavior of an unconfigured Parallel.ForEach, you have to know the behavior of the ThreadPool, which is simple enough to describe in a single paragraph. The ThreadPool immediately satisfies all requests for work by starting new threads, up to a soft limit which by default is Environment.ProcessorCount. Upon reaching this limit, further requests are queued, and new threads are created at a rate of one new thread per second to satisfy the demand¹. There is also a hard limit on the number of threads, which on my machine is 32,767 threads. The soft limit is configurable with the ThreadPool.SetMinThreads method. The ThreadPool also retires threads, in case there are too many and there is no queued work, at about the same rate (1 per second).
Below is an experimental demonstration that the Parallel.ForEach uses all the available threads in the ThreadPool. The number of available threads is configured up front with ThreadPool.SetMinThreads, and then the Parallel.ForEach kicks in and takes all of them:
ThreadPool.SetMinThreads(workerThreads: 100, completionPortThreads: 10);
HashSet<Thread> threads = new();
int concurrency = 0;
int maxConcurrency = 0;
Parallel.ForEach(Enumerable.Range(1, 1500), n =>
{
lock (threads) maxConcurrency = Math.Max(maxConcurrency, ++concurrency);
lock (threads) threads.Add(Thread.CurrentThread);
// Simulate a CPU-bound operation that takes 200 msec
Stopwatch stopwatch = Stopwatch.StartNew();
while (stopwatch.ElapsedMilliseconds < 200) { }
lock (threads) --concurrency;
});
Console.WriteLine($"Number of unique threads: {threads.Count}");
Console.WriteLine($"Maximum concurrency: {maxConcurrency}");
Output (after waiting for ~5 seconds):
Number of unique threads: 102
Maximum concurrency: 102
Online demo.
The number of completionPortThreads is irrelevant for this test. The Parallel.ForEach uses the threads designated as "workerThreads". The 102 threads are:
The 100 ThreadPool threads that were created immediately on demand.
One more ThreadPool thread that was injected after 1 sec delay.
The main thread of the console application.
¹ The injection rate of the ThreadPool is not documented. As of .NET 7 it is one thread per second, but this could change in future .NET versions.
