What is the minimum wait time to use ManualResetEventSlim instead of ManualResetEvent?

Since .NET 4 I can use the ManualResetEventSlim class, which spins briefly before blocking, in order to save time when the blocked interval is short (no context switch occurs).
I'd like to use a benchmark to measure how short this interval is, so I know, more or less, the amount of wait time below which ManualResetEventSlim is preferable to a classic ManualResetEvent.
I know this measure is CPU dependent and that it is impossible to know the spin time a priori, but I'd like an order of magnitude.
I wrote a benchmark class to find the minimum sleep, in milliseconds, that makes ManualResetEventSlim perform better than ManualResetEvent.
using System.Threading;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

public class ManualResetEventTest
{
    [Params(0, 1, 10)]
    public int MillisecondsSleep;

    [Benchmark]
    public void ManualResetEventSlim()
    {
        using var mres = new ManualResetEventSlim(false);
        var t = Task.Run(() =>
        {
            mres.Wait();
        });
        Thread.Sleep(MillisecondsSleep);
        mres.Set();
        t.Wait();
    }

    [Benchmark]
    public void ManualResetEvent()
    {
        using var mres = new ManualResetEvent(false);
        var t = Task.Run(() =>
        {
            mres.WaitOne();
        });
        Thread.Sleep(MillisecondsSleep);
        mres.Set();
        t.Wait();
    }
}
And the result is the following (benchmark results table omitted):
As you can see, I found improved performance only with Thread.Sleep(0). Furthermore, I see a mean time of about 15 ms with both the 1 ms and the 10 ms sleep.
Am I missing something?
Is it true that only with a 0 ms wait is it better to use a ManualResetEventSlim instead of a ManualResetEvent?

From the excellent C# 9.0 in a Nutshell book:
Waiting or signaling an AutoResetEvent or ManualResetEvent takes about one microsecond (assuming no blocking).
ManualResetEventSlim and CountdownEvent can be up to 50 times faster in short-wait scenarios because of their nonreliance on the OS and judicious use of spinning constructs. In most scenarios, however, the overhead of the signaling classes themselves doesn't create a bottleneck; thus, it is rarely a consideration.
Hopefully that's enough to give you a rough order of magnitude.
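For context, ManualResetEventSlim also exposes an optional spinCount constructor argument that bounds the spin phase before it falls back to a kernel wait. A minimal sketch (the spin count of 100 is an arbitrary illustrative value, not a tuned setting):

using System;
using System.Threading;
using System.Threading.Tasks;

class SlimEventSketch
{
    static void Main()
    {
        // Spin up to ~100 iterations before blocking on a kernel handle.
        using var gate = new ManualResetEventSlim(false, spinCount: 100);

        var worker = Task.Run(() =>
        {
            gate.Wait(); // spins briefly, then blocks if not yet signaled
            Console.WriteLine("Signaled.");
        });

        gate.Set();      // cheap if the waiter is still in its spin phase
        worker.Wait();
    }
}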

Related

Parallel LINQ GroupBy taking long time on systems with high amount of cores

We detected a weird problem when running a parallel GroupBy on a system with a high number of cores.
We're running this on .Net Framework 4.7.2.
The (simplified) code:
public static void Main()
{
    //int MAX_THREADS = Environment.ProcessorCount - 2;
    //ThreadPool.SetMinThreads(1, 1);
    //ThreadPool.SetMaxThreads(MAX_THREADS, MAX_THREADS);

    var elements = new List<ElementInfo>();
    for (int i = 0; i < 250000; i++)
        elements.Add(new ElementInfo() { Name = "123", Description = "456" });

    using (var cancellationTokenSrc = new CancellationTokenSource())
    {
        var cancellationToken = cancellationTokenSrc.Token;
        var dummy = elements.AsParallel()
            .WithCancellation(cancellationToken)
            .Select(x => new { Name = x.Name })
            .GroupBy(x => "abc")
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}

public class ElementInfo
{
    public string Name { get; set; }
    public string Description { get; set; }
}
This code is running in an application that is already using about 100 threads. Running this on a "normal" PC (12 or 16 cores), it runs very fast (less than 1 second).
Running it on a PC with a high number of cores (48), it is very slow (about 20 seconds).
Taking a dump during the 20 second delay, I see the threads running this LINQ are all waiting in HashRepartitionEnumerator.MoveNext().
There's a m_barrier.Wait(), so I think it is waiting there. It seems to wait on m_barrier, which is set to the number of partitions.
My guess is the following:
The number of partitions is set to the number of cores (48 in this case).
A number of threads are started in the thread pool, but the thread pool is full, so new threads need to be started. This happens at 1 thread per second.
While the thread pool is spinning up threads, all threads already running this LINQ query are waiting until enough threads have started.
Only when enough threads are started, the LINQ query can finish.
Uncommenting the first lines in the Main method supports this thesis: By limiting the number of threads, the desired amount of threads is never reached, so this LINQ query never finishes.
Does this seem like a bug in .Net Framework, or am I doing something wrong?
Note: the real LINQ query has a few CPU-intensive Where-clauses, which makes it ideal to run in parallel. I removed this code as it isn't needed to reproduce the issue.
Does this seem like a bug in .NET Framework, or am I doing something wrong?
Yes, it does look like a bug, but actually this behavior is by design. The Task Parallel Library depends heavily on the ThreadPool by default, and the ThreadPool is not an incredibly clever piece of software. Which is both good and bad. It's good because its behavior is predictable, and it's bad because it behaves non-optimally when stressed. The algorithm that controls its behavior¹ is basically this:
Instantly satisfy all demands for work until the number of worker threads reaches the number specified by the ThreadPool.SetMinThreads method, which by default is equal to Environment.ProcessorCount.
If the demand for work cannot be satisfied by the available workers, inject more threads in the pool with a frequency of one new thread per second.
This algorithm offers very few configuration options. For example, you can't control the injection rate of new threads. So if the behavior of the built-in ThreadPool doesn't fit your needs, you are in a tough situation. You could consider implementing your own ThreadPool, in the form of a custom TaskScheduler, but unfortunately the PLINQ library doesn't even allow you to configure the scheduler. There is no public WithTaskScheduler option available, analogous to the ParallelOptions.TaskScheduler property that can be used with the Parallel class (it's internal, due to fear of deadlocks).
Rewriting the PLINQ library from scratch on top of a custom ThreadPool is presumably not a realistic option. So the best you can really do is ensure that the ThreadPool always has enough threads to satisfy the demand (increase them with ThreadPool.SetMinThreads), specify the MaxDegreeOfParallelism explicitly whenever you use parallelization, and be conservative regarding the degree of parallelism of each parallel operation. Definitely avoid nesting one parallel operation inside another, because this is the easiest way to saturate the ThreadPool and cause it to misbehave.
¹ As of .NET 6. The behavior of the ThreadPool could change in future .NET versions.
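A minimal sketch of those two mitigations, using the question's variables (the numbers are illustrative, not tuned):

// Raise the thread-injection floor so PLINQ's partitions get worker
// threads immediately instead of one per second.
ThreadPool.GetMinThreads(out _, out int minIo);
ThreadPool.SetMinThreads(Environment.ProcessorCount * 2, minIo);

// And/or cap the query's degree of parallelism explicitly:
var dummy = elements.AsParallel()
    .WithDegreeOfParallelism(8) // conservative, explicit DOP
    .WithCancellation(cancellationToken)
    .Select(x => new { x.Name })
    .GroupBy(x => "abc")
    .ToDictionary(g => g.Key, g => g.ToList());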

Controlling number of tasks being executed at an instance at runtime [duplicate]

I have the following code:
var factory = new TaskFactory();
for (int i = 0; i < 100; i++)
{
    var i1 = i;
    factory.StartNew(() => foo(i1));
}

static void foo(int i)
{
    Thread.Sleep(1000);
    Console.WriteLine($"foo{i} - on thread {Thread.CurrentThread.ManagedThreadId}");
}
I can see it only does 4 threads at a time (based on observation). My questions:
What determines the number of threads used at a time?
How can I retrieve this number?
How can I change this number?
P.S. My box has 4 cores.
P.P.S. I needed to have a specific number of tasks (and no more) that are concurrently processed by the TPL and ended up with the following code:
private static int count = 0; // keep track of how many concurrent tasks are running

private static void SemaphoreImplementation()
{
    var s = new Semaphore(20, 20); // allow 20 tasks at a time
    for (int i = 0; i < 1000; i++)
    {
        var i1 = i;
        Task.Factory.StartNew(() =>
        {
            try
            {
                s.WaitOne();
                Interlocked.Increment(ref count);
                foo(i1);
            }
            finally
            {
                s.Release();
                Interlocked.Decrement(ref count);
            }
        }, TaskCreationOptions.LongRunning);
    }
}

static void foo(int i)
{
    Thread.Sleep(100);
    Console.WriteLine($"foo{i:00} - on thread " +
        $"{Thread.CurrentThread.ManagedThreadId:00}. Executing concurrently: {count}");
}
When you are using a Task in .NET, you are telling the TPL to schedule a piece of work (via TaskScheduler) to be executed on the ThreadPool. Note that the work will be scheduled at its earliest opportunity and however the scheduler sees fit. This means that the TaskScheduler will decide how many threads will be used to run n number of tasks and which task is executed on which thread.
The TPL is very well tuned and continues to adjust its algorithm as it executes your tasks. So, in most cases, it tries to minimize contention. This means that if you are running 100 tasks and only have 4 cores (which you can get using Environment.ProcessorCount), it would not make sense to execute more than 4 threads at any given time; otherwise it would need to do more context switching. There are times when you want to explicitly override this behaviour, for example when you need to wait for some sort of IO to finish, but that is a whole different story.
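As an aside, the relevant numbers can be inspected directly (a small illustration I'm adding; these are standard ThreadPool APIs):

Console.WriteLine(Environment.ProcessorCount);               // logical cores
ThreadPool.GetMinThreads(out int minWorker, out int minIo);  // injection floor
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);  // pool ceiling
Console.WriteLine($"workers: {minWorker}..{maxWorker}, IOCP: {minIo}..{maxIo}");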
In summary, trust the TPL. But if you are adamant about spawning a thread per task (not always a good idea!), you can use:
Task.Factory.StartNew(
    () => { /* your piece of work */ },
    TaskCreationOptions.LongRunning);
This tells the default TaskScheduler to explicitly spawn a new thread for that piece of work.
You can also use your own Scheduler and pass it in to the TaskFactory. You can find a whole bunch of Schedulers HERE.
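For instance, one built-in way to cap concurrency without writing a scheduler from scratch (a sketch I'm adding, not from the linked list) is ConcurrentExclusiveSchedulerPair, whose ConcurrentScheduler enforces a maximum concurrency level:

// Cap concurrency at 4 via a scheduler, instead of relying on pool heuristics.
var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, maxConcurrencyLevel: 4);
var factory = new TaskFactory(pair.ConcurrentScheduler);
for (int i = 0; i < 100; i++)
{
    var i1 = i;
    factory.StartNew(() => foo(i1)); // at most 4 foo calls run at once
}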
Note that another alternative would be to use PLINQ, which by default analyses your query and decides whether parallelizing it would yield any benefit. In the case of blocking IO, where you are certain that starting multiple threads will result in better execution, you can force parallelism with WithExecutionMode(ParallelExecutionMode.ForceParallelism). You can then use WithDegreeOfParallelism to give hints on how many threads to use, but remember there is no guarantee you will get that many threads. As MSDN says:
Sets the degree of parallelism to use in a query. Degree of parallelism is the maximum number of concurrently executing tasks that will be used to process the query.
Finally, I highly recommend having a read of THIS great series of articles on Threading and TPL.
If you increase the number of tasks to, for example, 1,000,000, you will see a lot more threads spawned over time. The TPL tends to inject one every 500 ms.
The TPL thread pool does not understand IO-bound workloads (a Thread.Sleep blocks just like IO would). It's not a good idea to rely on the TPL to pick the right degree of parallelism in these cases. The TPL is completely clueless here: it injects more threads based on vague guesses about throughput, and also to avoid deadlocks.
Here, the TPL policy clearly is not useful because the more threads you add the more throughput you get. Each thread can process one item per second in this contrived case. The TPL has no idea about that. It makes no sense to limit the thread count to the number of cores.
What determines the number of threads used at a time?
Barely documented TPL heuristics. They frequently go wrong. In particular, they will spawn an unlimited number of threads over time in this case. Use Task Manager to see for yourself: let this run for an hour and you'll have thousands of threads.
How can I retrieve this number? How can I change this number?
You can retrieve some of these numbers but that's not the right way to go. If you need a guaranteed DOP you can use AsParallel().WithDegreeOfParallelism(...) or a custom task scheduler. You also can manually start LongRunning tasks. Do not mess with process global settings.
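For example, a guaranteed degree of parallelism with PLINQ might look like this (a sketch assuming the foo from the question):

// PLINQ partitions the work across exactly 20 worker tasks.
Enumerable.Range(0, 1000)
    .AsParallel()
    .WithDegreeOfParallelism(20)
    .ForAll(i => foo(i));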
I would suggest using SemaphoreSlim, because it doesn't use a Windows kernel object (so it can be used in Linux C# microservices) and it has a property, SemaphoreSlim.CurrentCount, that tells how many slots remain, so you don't need the Interlocked.Increment or Interlocked.Decrement counter. Note that the per-iteration copy i1 is still required: the lambda captures the loop variable itself, not its value at that iteration, so without the copy every task could observe a mutated i. I also use Task.Run instead of Task.Factory.StartNew here, since StartNew does not unwrap async delegates and TaskCreationOptions.LongRunning is pointless once the method awaits:
private static void SemaphoreImplementation()
{
    var maxTasksCount = 20; // allow 20 tasks at a time
    var semaphoreSlim = new SemaphoreSlim(maxTasksCount, maxTasksCount);
    for (int i = 0; i < 1000; i++)
    {
        var i1 = i; // copy: the lambda would otherwise capture the loop variable
        Task.Run(async () =>
        {
            try
            {
                await semaphoreSlim.WaitAsync();
                // CurrentCount tells how many slots remain; the difference
                // is the number of tasks currently running.
                var count = maxTasksCount - semaphoreSlim.CurrentCount;
                await foo(i1, count);
            }
            finally
            {
                semaphoreSlim.Release();
            }
        });
    }
}

static async Task foo(int i, int count)
{
    await Task.Delay(100);
    Console.WriteLine($"foo{i:00} - on thread " +
        $"{Thread.CurrentThread.ManagedThreadId:00}. Executing concurrently: {count}");
}

Thread Safe Method with a Stop Watch

I'm trying to determine whether the code I'm using is thread safe. I'm basically calling a method several times from different threads, and capturing the time it takes for certain calls within the method to complete.
Here is an example of what I am doing.
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

namespace ThreadTest
{
    class Program
    {
        static BlockingCollection<TimeSpan> Timer1 =
            new BlockingCollection<TimeSpan>(new ConcurrentBag<TimeSpan>());

        static TimeSpan CaptureTime(Action action)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }

        static void ThreadFunction()
        {
            TimeSpan timer1 = CaptureTime(() =>
            {
                // Do Some Work
            });
            Timer1.Add(timer1);
        }

        static void Main(string[] args)
        {
            var tasks = new Task[50];
            for (int i = 0; i < 50; i++)
            {
                var task = new Task(ThreadFunction);
                tasks[i] = task;
                task.Start();
            }
            Task.WaitAll(tasks); // otherwise Main may exit before the tasks finish
        }
    }
}
And what I'm trying to determine is whether or not the TimeSpan values returned by the CaptureTime method can be trusted.
Thank you to anyone who can enlighten me.
Use of Stopwatch here is not the problem. See this recent answer. Since you are in a single thread when you use the Stopwatch, it will work fine.
But I'm not sure this approach is really going to be very useful. Are you trying to create your own profiler? Why not just use existing profiling tools?
When you spin up 50 instances of the same operation, they're bound to fight for the same CPU resources. Also, a new Task might or might not spin up a new thread. Even then, the amount of switching involved would make the results less-than-meaningful. Unless you are specifically trying to observe parallel behavior, I would avoid this approach.
The better way would be to run the action 50 times sequentially, time the whole thing, then divide by 50. (Assuming this is a short-running task.)
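A minimal sketch of that approach (DoSomeWork is a hypothetical stand-in for the action being measured):

const int N = 50;
var sw = Stopwatch.StartNew();
for (int i = 0; i < N; i++)
{
    DoSomeWork(); // hypothetical: the action being measured
}
sw.Stop();
Console.WriteLine($"Average: {sw.Elapsed.TotalMilliseconds / N:F3} ms per call");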
The use of BlockingCollection<TimeSpan>(new ConcurrentBag<TimeSpan>()) is also very weird. Since you are just adding to the list, and it is static and pre-created, you could just use List<TimeSpan>. See the notes on Thread Safety in the List<T> documentation here.
Ignore that. I misunderstood the context of the docs. Your code is just fine, and is indeed thread-safe. Thanks to Jim and Alexi for clearing that up.
They can be 'trusted' alright, but that does not mean they will be very accurate.
It depends on many factors, but basically you would want to measure a large number of calls to action() (on the same thread) and average them, especially when a single call takes a relatively short time (<= 1 ms).
You will still have to deal with external factors; Windows is not a real-time OS.

SpinWait vs Sleep waiting. Which one to use?

Is it efficient to call
SpinWait.SpinUntil(() => myPredicate(), 10000)
with a timeout of 10,000 ms, or is it more efficient to poll for the same condition with Thread.Sleep? For example, something along the lines of the following SleepWait function:
public bool SleepWait(int timeOut)
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    while (!myPredicate() && stopwatch.ElapsedMilliseconds < timeOut)
    {
        Thread.Sleep(50);
    }
    return myPredicate();
}
I'm concerned that all the yielding SpinWait does may not be a good usage pattern for timeouts over 1 second. Is this a valid assumption?
Which approach do you prefer and why? Is there another even better approach?
Update - Becoming more specific:
Is there a way to make BlockingCollection pulse a sleeping thread when it reaches bounded capacity? I'd rather avoid busy waits altogether, as Marc Gravell suggests.
In .NET 4, SpinWait performs CPU-intensive spinning for 10 iterations before yielding. But it does not return to the caller immediately after each of those cycles; instead, it calls Thread.SpinWait to spin via the CLR (essentially the OS) for a set time period. This time period is initially a few tens of nanoseconds, but doubles with each iteration until the 10 iterations are complete. This ensures some predictability in the total time spent in the spinning (CPU-intensive) phase, which the system can tune according to conditions (number of cores, etc.). If SpinWait remains in the spin-yielding phase for too long, it will periodically sleep to allow other threads to proceed (see J. Albahari's blog for more information). This process is guaranteed to keep a core busy.
So, SpinWait limits the CPU-intensive spinning to a set number of iterations, after which it yields its time slice on every spin (by actually calling Thread.Yield and Thread.Sleep), lowering its resource consumption. It will also detect whether the user is running on a single-core machine and yield on every cycle if that is the case.
With Thread.Sleep the thread is blocked, but blocking will not be as expensive as the above in terms of CPU.
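To see that two-phase behavior directly, here is a small illustration using the SpinWait struct (my sketch, not from the quoted material):

static volatile bool s_done; // flipped by another thread

static void WaitHot()
{
    var spinner = new SpinWait();
    while (!s_done)
    {
        // Hot spins for roughly the first 10 iterations, then starts
        // yielding (Thread.Yield / Thread.Sleep) as described above.
        spinner.SpinOnce();
    }
}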
The best approach is to have some mechanism that actively detects the thing becoming true (rather than passively polling for it having become true); this could be any kind of wait handle, a Task with Wait, or an event that you can subscribe to in order to unstick yourself. Of course, if you do that kind of "wait until something happens", it is still not as efficient as simply having the next bit of work run as a callback, meaning you don't need to use a thread to wait. Task has ContinueWith for this, or you can just do the work in an event handler when the event fires. The event is probably the simplest approach, depending on the context. Task, however, already provides almost everything you are talking about here, including both "wait with timeout" and "callback" mechanisms.
And yes, spinning for 10 seconds is not great. If you want to use something like your current code, and if you have reason to expect a short delay, but need to allow for a longer one - maybe SpinWait for (say) 20ms, and use Sleep for the rest?
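That hybrid might look like this (my illustration, assuming the myPredicate from the question):

public static bool HybridWait(Func<bool> predicate, int timeoutMs)
{
    // Fast path: spin briefly in case the condition flips almost immediately.
    if (SpinWait.SpinUntil(predicate, 20))
        return true;

    // Slow path: cheap coarse polling for the remainder of the timeout.
    var sw = Stopwatch.StartNew();
    while (sw.ElapsedMilliseconds < timeoutMs - 20)
    {
        if (predicate())
            return true;
        Thread.Sleep(50);
    }
    return predicate();
}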
Re the comment; here's how I'd hook an "is it full" mechanism:
private readonly object syncLock = new object();

public bool WaitUntilFull(int timeout)
{
    if (CollectionIsFull) return true; // I'm assuming we can call this safely
    lock (syncLock)
    {
        if (CollectionIsFull) return true;
        return Monitor.Wait(syncLock, timeout);
    }
}
with, in the "put back into the collection" code:
if (CollectionIsFull)
{
    lock (syncLock)
    {
        if (CollectionIsFull) // double-check with the lock
        {
            Monitor.PulseAll(syncLock);
        }
    }
}
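An alternative way to get the same signal-instead-of-poll effect (a sketch, not part of the original answer) is a TaskCompletionSource, which gives you the timed wait for free:

var becameFull = new TaskCompletionSource<bool>();

// Producer side, when the collection reaches capacity:
becameFull.TrySetResult(true);

// Consumer side: block with a timeout, no polling or spinning:
bool full = becameFull.Task.Wait(10000);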

When is Parallel.Invoke useful?

I'm just diving into learning about the Parallel class in the 4.0 Framework and am trying to understand when it is useful. At first, after reviewing some of the documentation, I tried to execute two loops, one using Parallel.Invoke and one sequentially, like so:
static void Main()
{
    DateTime start = DateTime.Now;
    Parallel.Invoke(BasicAction, BasicAction2);
    DateTime end = DateTime.Now;
    var parallel = end.Subtract(start).TotalSeconds;

    start = DateTime.Now;
    BasicAction();
    BasicAction2();
    end = DateTime.Now;
    var sequential = end.Subtract(start).TotalSeconds;

    Console.WriteLine("Parallel:{0}", parallel.ToString());
    Console.WriteLine("Sequential:{0}", sequential.ToString());
    Console.Read();
}

static void BasicAction()
{
    for (int i = 0; i < 10000; i++)
    {
        Console.WriteLine("Method=BasicAction, Thread={0}, i={1}", Thread.CurrentThread.ManagedThreadId, i.ToString());
    }
}

static void BasicAction2()
{
    for (int i = 0; i < 10000; i++)
    {
        Console.WriteLine("Method=BasicAction2, Thread={0}, i={1}", Thread.CurrentThread.ManagedThreadId, i.ToString());
    }
}
There is no noticeable difference in time of execution here, or am I missing the point? Is it more useful for asynchronous invocations of web services or...?
EDIT: I replaced the DateTime timing with Stopwatch and replaced the console writes with a simple operation.
UPDATE - Big time difference now: thanks for clearing up the problems I had when I involved the Console.
static void Main()
{
    Stopwatch s = new Stopwatch();
    s.Start();
    Parallel.Invoke(BasicAction, BasicAction2);
    s.Stop();
    var parallel = s.ElapsedMilliseconds;

    s.Reset();
    s.Start();
    BasicAction();
    BasicAction2();
    s.Stop();
    var sequential = s.ElapsedMilliseconds;

    Console.WriteLine("Parallel:{0}", parallel.ToString());
    Console.WriteLine("Sequential:{0}", sequential.ToString());
    Console.Read();
}

static void BasicAction()
{
    Thread.Sleep(100);
}

static void BasicAction2()
{
    Thread.Sleep(100);
}
The test you are doing is nonsensical; you are testing to see if something that you cannot perform in parallel is faster when you perform it in parallel.
Console.WriteLine handles synchronization for you, so it will always act as though it is running on a single thread.
From here:
...call the SetIn, SetOut, or SetError method, respectively. I/O operations using these streams are synchronized, which means multiple threads can read from, or write to, the streams.
Any advantage that the parallel version gains from running on multiple threads is lost through the marshaling done by the console. In fact I wouldn't be surprised to see that all the thread switching actually means that the parallel run would be slower.
Try doing something else in the actions (a simple Thread.Sleep would do) that can be processed by multiple threads concurrently and you should see a large difference in the run times. Large enough that the inaccuracy of using DateTime as your timing mechanism will not matter too much.
It's not a matter of execution time. The output to the console is determined by how the actions are scheduled to run. To get an accurate execution time, you should be using Stopwatch. At any rate, you are using Console.WriteLine, so it will appear as though everything is in one thread of execution. Anything you have tried to attain by using Parallel.Invoke is lost by the nature of Console.WriteLine.
On something simple like that the run times will be the same. What Parallel.Invoke is doing is running the two methods at the same time.
In the first case you'll have lines spat out to the console in a mixed up order.
Method=BasicAction2, Thread=6, i=9776
Method=BasicAction, Thread=10, i=9985
// <snip>
Method=BasicAction, Thread=10, i=9999
Method=BasicAction2, Thread=6, i=9777
In the second case you'll have all the BasicAction lines before the BasicAction2 lines.
What this shows you is that the two methods are running at the same time.
In the ideal case (if the number of delegates equals the number of parallel threads and there are enough CPU cores), the total duration of the operations becomes MAX(AllDurations) instead of SUM(AllDurations), where AllDurations is the list of each delegate's execution time, for example {1 sec, 10 sec, 20 sec, 5 sec}. In less ideal cases it moves in that direction.
It's useful when you don't care about the order in which the delegates are invoked, but you do care that execution blocks until every delegate has completed. So yes, it suits situations where you need to gather data from various sources before you can proceed (they can be web services or other kinds of sources).
Parallel.For can be used much more often, I think. For Parallel.Invoke it's pretty much required that you have different tasks, each taking a substantial time to execute; and if you have no idea of the possible range of execution times (which is true for web services), Invoke will shine the most.
Maybe your static constructor needs to build two independent dictionaries for your type to use; you can invoke the methods that fill them in parallel using Invoke() and cut the time roughly in half if they both take about the same time, for example. That scenario is sketched below.
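A sketch of that scenario (BuildFirstDictionary and BuildSecondDictionary are hypothetical loaders):

Dictionary<string, int> first = null, second = null;
Parallel.Invoke(
    () => first = BuildFirstDictionary(),    // hypothetical loader
    () => second = BuildSecondDictionary()); // hypothetical loader
// Both dictionaries are ready here; total time is roughly MAX of the two loads.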
