WithDegreeOfParallelism(N>CPU count)

WithDegreeOfParallelism(N>CPU count) - c#

System.Threading.ThreadPool.SetMaxThreads(50, 50);
File.ReadLines().AsParallel().WithDegreeOfParallelism(100).ForAll((s)->{
/*
some code which is waiting external API call
and do not utilize CPU
*/
});
I have never got threads count more than CPU count in my system.
Can I use PLINQ and get more than one thread per CPU?

If you're calling external web API, you might be hitting the limit of concurrent simultaneous connections, which is set to 2. In the begining of your application do the following:
System.Net.ServicePointManager.DefaultConnectionLimit = 4096;
System.Net.ServicePointManager.Expect100Continue = false;
Try if that helps. If not, there might be some other bottleneck within the routine you're trying to parallelize.
Also, just like other responders said, ThreadPool decides how many threads to spin up based on load. In my experience with TPL I've seen that thread cound increases by time: longer the app runs, and heavier load gets, more threads are spun up.

PLINQ uses a hill-climbing algorithm to determine the optimum size of the thread pool which is used by the TPL. I think that if you put a lot of I/O in your tasks, seeing more threads than the cpu count is likeable.
That said, I've never seen more threads than the cpu count :) . But maybe I never had the right situation.

I tested this with the following code:
var lines = Enumerable.Range(0, 200).ToArray();
int currentThreads = 0;
int maxThreads = 0;
object l = new object();
lines.AsParallel().WithDegreeOfParallelism(100).ForAll(
s =>
{
lock (l)
{
currentThreads++;
if (currentThreads > maxThreads)
{
maxThreads = currentThreads;
Console.WriteLine(maxThreads);
}
}
Thread.Sleep(3000);
lock (l)
{
currentThreads--;
}
});
Console.WriteLine();
Console.WriteLine(maxThreads);
Basically, it records the current number of concurrently executing iterations and then saves the maximum encountered value.
The results vary quite a bit, between 15 and 25, but it's always much more than the number of CPUs my computer has (4). Increasing the sleep time increases the maximum number of concurrent threads. So it looks like the limiting factor here is the ThreadPool: it will create new threads slowly, especially when jobs are being completed relatively quickly.
If you want to increase the number of threads used, you would need to use SetMinThreads() (not SetMaxThreads()). If I set the minimum to 50, the number of threads actually used is around 60.
But having dozens of threads that do nothing but wait is quite inefficient, especially when it comes to memory consumption. You should consider using asynchronous methods instead.

PLINQ does not fit in this case.
I have found next article useful for me.
http://msdn.microsoft.com/en-us/library/hh228609(v=vs.110).aspx

Short answer: nope.
The amount of threading is simply up to the .Net Framework runtime. There is no developer control for controlling the number of threads for TPL (Task Parallel Library) usage.
EDIT
Thanks to some other feedback: it is actually possible--but not recommended--to manually control the number of threads in the ThreadPool, which PLINQ and TPL use.
It's my opinion that any parallelization problem needs to be carefully thought out, and carefully constructed and tested. There's a lot of subtlety in this.

Related

Parallel LINQ GroupBy taking long time on systems with high amount of cores

We detected a weird problem when running a parallel GroupBy on a system with high amount of cores.
We're running this on .Net Framework 4.7.2.
The (simplified) code:
public static void Main()
{
//int MAX_THREADS = Environment.ProcessorCount - 2;
//ThreadPool.SetMinThreads(1, 1);
//ThreadPool.SetMaxThreads(MAX_THREADS, MAX_THREADS);
var elements = new List<ElementInfo>();
for (int i = 0; i < 250000; i++)
elements.Add(new ElementInfo() { Name = "123", Description = "456" });
using (var cancellationTokenSrc = new CancellationTokenSource())
{
var cancellationToken = cancellationTokenSrc.Token;
var dummy = elements.AsParallel()
.WithCancellation(cancellationToken)
.Select(x => new { Name = x.Name })
.GroupBy(x => "abc")
.ToDictionary(g => g.Key, g => g.ToList());
}
}
public class ElementInfo
{
public string Name { get; set; }
public string Description { get; set; }
}
This code is running in an application that is already using about 100 threads. Running this on a "normal" pc (12 or 16 cores), it runs very fast (less than 1 second).
Running this on a PC with a high amount of cores (48), it runs very slow (20 seconds).
Taking a dump during the 20 second delay, I see the threads running this LINQ are all waiting in HashRepartitionEnumerator.MoveNext().
There's a m_barrier.Wait(), so I think it is waiting there. It seems to wait on m_barrier, which is set to the number of partitions.
My guess is the following:
The number of partitions is set to the number of cores (48 in this case).
A number of threads are started in the thread pool, but the thread pool is full, so new threads need to be started. This happens at 1 thread per second.
While the threadpool is spinning up threads, all threads already running this LINQ query, are waiting until enough threads are started.
Only when enough threads are started, the LINQ query can finish.
Uncommenting the first lines in the Main method supports this thesis: By limiting the number of threads, the desired amount of threads is never reached, so this LINQ query never finishes.
Does this seem like a bug in .Net Framework, or am I doing something wrong?
Note: the real LINQ query has a few CPU-intensive Where-clauses, which makes it ideal to run in parallel. I removed this code as it isn't needed to reproduce the issue.

Does this seem like a bug in .NET Framework, or am I doing something wrong?
Yes, it does look like a bug, but actually this behavior is by design. The Task Parallel Library depends heavily on the ThreadPool by default, and the ThreadPool is not an incredibly clever piece of software. Which is both good and bad. It's good because its behavior is predictable, and it's bad because it behaves non-optimally when stressed. The algorithm that controls its behavior¹ is basically this:
Satisfy instantly all demands for work until the number of the worker threads reaches the number specified by the ThreadPool.SetMinThreads method, which
by default is equal to Environment.ProcessorCount.
If the demand for work cannot be satisfied by the available workers, inject more threads in the pool with a frequency of one new thread per second.
This algorithm offers very few configuration options. For example you can't control the injection rate of new threads. So if the behavior of the built-in ThreadPool doesn't fit your needs, you are in a tough situation. You could consider implementing your own ThreadPool, in the form of a custom TaskScheduler, but unfortunately the PLINQ library doesn't even allow to configure the scheduler. There is no public WithTaskScheduler option available, analogous to the ParallelOptions.TaskScheduler property that can be used with the Parallel class (it's internal, due to fear of deadlocks).
Rewriting the PLINQ library from scratch on top of a custom ThreadPool is presumably not a realistic option. So the best that you can really do is to ensure that the ThreadPool has always enough threads to satisfy the demand (increase the ThreadPool.SetMinThreads), specify explicitly the MaxDegreeOfParalellism whenever you use paralellization, and be conservative regarding the degree of paralellism of each parallel operation. Definitely avoid nesting one parallel operation inside another, because this is the easiest way to saturate the ThreadPool and cause it to misbehave.
¹ As of .NET 6. The behavior of the ThreadPool could change in future .NET versions.

Threads (Tasks) Limit in WebAPI (or in general)

We're developing WebAPI which has some logic of decryption of around 200 items (can be more). Each decryption takes around 20ms.
We've tried to parallel the tasks so we'll get it done as soon as possible, but it seems we're getting some kind of a limit as the threads are getting reused by waiting for the older threads to complete (and there are only few used) - overall action takes around 1-2 seconds to complete...
What we basically want to achieve is get x amount of threads start at the same time and finish after those ~20 ms.
We tried this:
Await multiple async Task while setting max running task at a time
But it seems this only describes setting a limit while we want to release it...
Here's a snippet:
var tasks = new List<Task>();
foreach (var element in Elements)
{
var task = new Task(() =>
{
element.Value = Cipher.Decrypt((string)element.Value);
}
});
task.Start();
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
What are we missing here?
Thanks,
Nir.

I cannot recommend parallelism on ASP.NET. It will certainly impact the scalability of your service, particularly if it is public-facing. I have thought "oh, I'm smart enough to do this" a couple of times and added parallelism in an ASP.NET app, only to have to tear it right back out a week later.
However, if you really want to...
it seems we're getting some kind of a limit
Is it the limit of physical cores on your machine?
We tried this: Await multiple async Task while setting max running task at a time
That solution is specifically for asynchronous concurrent code (e.g., I/O-bound). What you want is parallel (threaded) concurrent code (e.g., CPU-bound). Completely different use cases and solutions.
What are we missing here?
Your current code is throwing a ton of simultaneous tasks at the thread pool, which will attempt to handle them as best as it can. You can make this more efficient by using a higher-level abstraction, e.g., Parallel:
Parallel.ForEach(Elements, element =>
{
element.Value = Cipher.Decrypt((string)element.Value);
});
Parallel is more intelligent in terms of its partitioning and (re-)use of threads (i.e., not exceeding number of cores). So you should see some speedup.
However, I would expect it only to be a minor speedup. You are likely being limited by your number of physical cores.

Asuming no hyper threading:
If it takes 20ms for 1 item , then you can look at it as if it takes 1 core 20ms. If you want 200 items to complete in 20 ms, then you need 200 cores all for you. If you don't have that many, it just can't be done...
Under normal surcumstances, as many Task Will be scheduled parallel as optimal for you system

Threads not Garbage collected / ThreadPool threads / C#/.NET

In my C#/.NET 3.5 program I am using Threadpool threads ( delegate+BeginInvoke/EndInvoke) to parallelize and speed up some file loading. SystemInternals tool ProcessExplorer shows that number of threads in process is increasing over time, while I would expect to stay the same. Looks like some Threads/Threads handles stay hanging around for no reason.
Interestingly enough, I can not find pattern how threads grow and seems that happen sporadically, without repeatable pattern each time I start application. I spend some time analyzing and here are some observations:
1) code looks like this:
ArrayList IAsyncResult_s = new ArrayList();
AsyncProcessing thread1 = processRasterLayer;
... ArrayList filesToRender....
foreach (string FileName in filesToRender)
{
string fileName2 = FileName;
GeoImage partialImage1;
IAsyncResult asyncResult = thread1.BeginInvoke(
fileName2, .....,
out partialImage1, ..., null, null);
IAsyncResult_s.Add(asyncResult);
asyncResult = null;
}
.................
//block and render all
foreach (IAsyncResult asyncResult in IAsyncResult_s)
{
GeoImage partialImage1;
thread1.EndInvoke(
out partialImage1, , asyncResult);
//render image.. some calls to render partial image here
partialImage1.Dispose();
partialImage1 = null;
}
IAsyncResult_s.Clear();
IAsyncResult_s = null;
thread1 = null;
2) Number of Process Threads
My trace shows that during execution inside loop, ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads); gives numbers like 493, 1000.
At the end of loops , , ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads); gives numbers 500, 1000. So, number of available thread returns to same
Number of process threads reported by SystemInternals ProcessExplorer and API System.Diagnostics.Process.GetCurrentProcess().Threads.Count is the 16 before loops, and around 21 after loops.
If I call againg those loops, number of threads in process grows, but not by fixed nubmer each time, but grows 1-4 each time I repeat above code, so grows like 16->21->22->26->31...
3)Forced garbage collection didn’t htelp
I tried to froce garbage collection to get rid of those extra threads, but that didn’t removed them from process.
4)Profling tools
I was using RedGates Memory and Performace profilers, but hasen’t found obvious reason. I saw several extra threas and their object (ThreadContext etc) hanging, but saw no object holding those threads in memory. I am prety sure those extra threads were involved into loops work above, since I added thread name inside calls, and they still have that name I gave them.
5) Intelitrace
Intelitrace debuging showed also extra threads hanging. They still have names I gave them. But interestingly, it also showed that same thread that is hanging now, was used by above loop in the past, but also same thread was executing some timer related evens from timers form my code.
6) Locating issue
So, When I disable above loops that process filse Asynchroniously, and load files sequentialy, I do not have extra threads, and number of threads in my application is constant and and around 16.
7) Regarding SetMaxThreads :Here how it looks on my machine (XP, .NET 3.5):
Code like this:
ThreadPool.GetAvailableThreads(out AvailableWorkerThreads, out AvailableCompletionPortThreads);
ThreadPool.GetMaxThreads(out MaxWorkerThreads, out MaxCompletionPortThreads);
ThreadPool.GetMinThreads(out MinWorkerThreads, out MinCompletionPortThreads);
Gives result:
MinWorkerThreads:2 MaxWorkerThreads:500 MinCompletionPortThreads:2 MaxCompletionPortThreads:1000 AvailableWorkerThreads:500 AvailableCompletionPortThreads:1000
My app is using maybe 8 worker threads at the same time. I see no problem with SetMaxThreads.
8)
Functionally, I have no problems so far with this solution above. But somehow, if tools report that number of threads in my app is growing, it looks like “resource leak” of some kind, and I would like to address it. It looks like some thread handles are hanging around for no reason.
9) Here is one interesting article. It sasy that thread resources are cleaned once EndInvoke is called. I am doing so in my code. Article sasy: ..”. Because EndInvoke cleans up after the spawned thread, you must make sure that an EndInvoke is called for each BeginInvoke.” “If the thread pool thread has exited, EndInvoke does the following: It cleans up the exited thread's loose ends and disposes of its resources.” See: http://en.csharp-online.net/Asynchronous_Programming%E2%80%94BeginInvoke_EndInvoke
10) Another interesting article. Author says he had thread handle leaks because he was creating controls from non-gui thread. It is pretty elaborate article, see: http://msmvps.com/blogs/senthil/archive/2008/05/29/the-case-of-the-leaking-thread-handles.aspx
11) Another interesting article. It talks about ThreadPool.SetMinThreads property. It seems that it is not ThreadPool.SetMaxThreads but ThreadPool.SetMinThreads that enables useful control over ThradPool. This article is an eye-opener for me, and made me think about how ThreadPool works and performance problems it might cause. Article is: http_://www.dotnetperls.com/threadpool-setminthreads . Anoter similar one is : http_://www.codeproject.com/Articles/3813/NET-s-ThreadPool-Class-Behind-The-Scenes
12) Another interesting article. It is talking about throttling issue with ThreadPool. Article mentions ThreadPool limit of 2 new threads per second increase. See http_://social. msdn. microsoft. com/forums/en-US/clr/thread/3325cb32-371b-4f3e-965f-6ca88538dc3e/
13) So, in maybe 30 tests I saw only 2 times that number of threads allocated would shrink. But, it did happen. I saw once thread number going like 16->....->31->61-> ->30->16. So, it went back to 16. It doesn’t happen often, and it is not about time waited, it was like big activity in process, followed by a period of constant low level activity.
14) ThreadPool.SetMinThreads Method documentation. It talks about 2 new threads per second limit for threadpool. It is not clear if setting this property would remove that limit. http_://msdn.microsoft. com/en-ca/library/system. threading.threadpool.setminthreads(v=vs.90).aspx

So the answer is: there's no leak here. This is how the thread pool works. It keeps around threads that finished working so you don't have to pay the price of thread creation next time you need one. If you have many concurrent work items then the number of threads in the pool will increase but they'll max out at MaxWorkerThreads. (And it has nothing to do with the garbage collector.)
See this article for more info:
http://msdn.microsoft.com/en-us/library/0ka9477y.aspx

i would consider a consumer producer pattern. the idea behind a threadpool is to recycle threads, not create hundreds of new. in best case you have for each cpu one thread, and queue the work. this will be sure faster as you avoid useless context switches and waits for creating new threads, as far as i remember the net threadpool waits about one second until a new thread is created, to give other threads a chance to get recycled.

C# Multi-Threading - Limiting the amount of concurrent threads

I have question on controlling the amount of concurrent threads I want running. Let me explain with what I currently do: For example
var myItems = getItems(); // is just some generic list
// cycle through the mails, picking 10 at a time
int index = 0;
int itemsToTake = myItems.Count >= 10 ? 10 : myItems.Count;
while (index < myItems.Count)
{
var itemRange = myItems.GetRange(index, itemsToTake);
AutoResetEvent[] handles = new AutoResetEvent[itemsToTake];
for (int i = 0; i < itemRange.Count; i++)
{
var item = itemRange[i];
handles[i] = new AutoResetEvent(false);
// set up the thread
ThreadPool.QueueUserWorkItem(processItems, new Item_Thread(handles[i], item));
}
// wait for all the threads to finish
WaitHandle.WaitAll(handles);
// update the index
index += itemsToTake;
// make sure that the next batch of items to get is within range
itemsToTake = (itemsToTake + index < myItems.Count) ? itemsToTake : myItems.Count -index;
This is a path that I currently take. However I do not like it at all. I know I can 'manage' the thread pool itself, but I have heard it is not advisable to do so. So what is the alternative? The semaphore class?
Thanks.

Instead of using ThreadPool directly, you might also consider using TPL or PLINQ. For example, with PLINQ you could do something like this:
getItems().AsParallel()
.WithDegreeOfParallelism(numberOfThreadsYouWant)
.ForAll(item => process(item));
or using Parallel:
var options = new ParallelOptions {MaxDegreeOfParallelism = numberOfThreadsYouWant};
Parallel.ForEach(getItems, options, item => process(item));
Make sure that specifying the degree of parallelism does actually improve performance of your application. TPL and PLINQ use ThreadPool by default, which does a very good job of managing the number of threads that are running. In .NET 4, ThreadPool implements algorithms that add more processing threads only if that improves performance.

Don't use THE treadpool, get another one (just look for google, there are half a dozen implementations out) and manage that yourself.
Managing THE treadpool is not advisable as a lot of internal workings may go ther, managing your OWN threadpool instance is totally ok.

It looks like you can control the maximum number of threads using ThreadPool.SetMaxThreads, although I haven't tested this.

Assuming the question is; "How do I limit the number of worker threads?" The the answer would be use a producer-consumer queue where you control the number of worker threads. Just queue your items and let it handle workers.
Here is a generic implementation you could use.

you can use ThreadPool.SetMaxThreads Method
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.setmaxthreads.aspx

In the documentation, there is a mention of SetMaxThreads ...
public static bool SetMaxThreads (
int workerThreads,
int completionPortThreads
)
Sets the number of requests to the thread pool that can be active concurrently. All requests above that number remain queued until thread pool threads become available.
However:
You cannot set the number of worker threads or the number of I/O completion threads to a number smaller than the number of processors in the computer.
But I guess you are anyways better served by using a non-singleton thread pool.

There is no reason to deal with hybrid thread synchronization constructs (such is AutoResetEvent) and the ThreadPool.
You can use a class that can act as the coordinator responsible for executing all of your code asynchronously.
Wrap using a Task or the APM pattern what the "Item_Thread" does. Then use the AsyncCoordinator class by Jeffrey Richter (can be found at the code from the book CLR via C# 3rd Edition).

Thread.Sleep for less than 1 millisecond

I want to call thread sleep with less than 1 millisecond.
I read that neither thread.Sleep nor Windows-OS support that.
What's the solution for that?
For all those who wonder why I need this:
I'm doing a stress test, and want to know how many messages my module can handle per second.
So my code is:
// Set the relative part of Second hat will be allocated for each message
//For example: 5 messages - every message will get 200 miliseconds
var quantum = 1000 / numOfMessages;
for (var i = 0; i < numOfMessages; i++)
{
_bus.Publish(new MyMessage());
if (rate != 0)
Thread.Sleep(quantum);
}
I'll be glad to get your opinion on that.

You can't do this. A single sleep call will typically block for far longer than a millisecond (it's OS and system dependent, but in my experience, Thread.Sleep(1) tends to block for somewhere between 12-15ms).
Windows, in general, is not designed as a real-time operating system. This type of control is typically impossible to achieve on normal (desktop/server) versions of Windows.
The closest you can get is typically to spin and eat CPU cycles until you've achieved the wait time you want (measured with a high performance counter). This, however, is pretty awful - you'll eat up an entire CPU, and even then, you'll likely get preempted by the OS at times and effectively "sleep" for longer than 1ms...

The code below will most definitely offer a more precise way of blocking, rather than calling Thread.Sleep(x); (although this method will block the thread, not put it to sleep). Below we are using the StopWatch class to measure how long we need to keep looping and block the calling thread.
using System.Diagnostics;
private static void NOP(double durationSeconds)
{
var durationTicks = Math.Round(durationSeconds * Stopwatch.Frequency);
var sw = Stopwatch.StartNew();
while (sw.ElapsedTicks < durationTicks)
{
}
}
Example usage,
private static void Main()
{
NOP(5); // Wait 5 seconds.
Console.WriteLine("Hello World!");
Console.ReadLine();
}

Why?
Usually there are a very limited number of CPUs and cores on one machine - you get just a small number if independent execution units.
From the other hands there are a number of processes and many more threads. Each thread requires some processor time, that is assigned internally by Windows core processes. Usually Windows blocks all threads and gives a certain amount of CPU core time to particular threads, then it switches the context to other threads.
When you call Thread.Sleep no matter how small you kill the whole time span Windows gave to the thread, as there is no reason to simply wait for it and the context is switched straight away. It can take a few ms when Windows gives your thread some CPU next time.
What to use?
Alternatively, you can spin your CPU, spinning is not a terrible thing to do and can be very useful. It is for example used in System.Collections.Concurrent namespace a lot with non-blocking collections, e.g.:
SpinWait sw = new SpinWait();
sw.SpinOnce();

Most of the legitimate reasons for using Thread.Sleep(1) or Thread.Sleep(0) involve fairly advanced thread synchronization techniques. Like Reed said, you will not get the desired resolution using conventional techniques. I do not know for sure what it is you are trying to accomplish, but I think I can assume that you want to cause an action to occur at 1 millisecond intervals. If that is the case then take a look at multimedia timers. They can provide resolution down to 1ms. Unfortunately, there is no API built into the .NET Framework (that I am aware of) that taps into this Windows feature. But you can use the interop layer to call directly into the Win32 APIs. There are even examples of doing this in C# out there.

In the good old days, you would use the "QueryPerformanceTimer" API of Win32, when sub milisecond resolution was needed.
There seems to be more info on the subject over on Code-Project: http://www.codeproject.com/KB/cs/highperformancetimercshar.aspx
This won't allow you to "Sleep()" with the same resolution as pointed out by Reed Copsey.
Edit:
As pointed out by Reed Copsey and Brian Gideon the QueryPerfomanceTimer has been replaced by Stopwatch in .NET

I was looking for the same thing as the OP, and managed to find an answer that works for me. I'm surprised that none of the other answers mentioned this.
When you call Thread.Sleep(), you can use one of two overloads: An int with the number of milliseconds, or a TimeSpan.
A TimeSpan's Constructor, in turn, has a number of overloads. One of them is a single long denoting the number of ticks the TimeSpan represents. One tick is a lot less than 1ms. In fact, another part of TimeSpan's docs gave an example of 10000 ticks happening in 1ms.
Therefore, I think the closest answer to the question is that if you want Thread.Sleep for less than 1ms, you would create a TimeSpan with less than 1ms worth of ticks, then pass that to Thread.Sleep().

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.