Need to understand the usage of SemaphoreSlim - C#

Here is the code I have but I don't understand what SemaphoreSlim is doing.
async Task WorkerMainAsync()
{
    SemaphoreSlim ss = new SemaphoreSlim(10);
    List<Task> trackedTasks = new List<Task>();
    while (DoMore())
    {
        await ss.WaitAsync();
        trackedTasks.Add(Task.Run(() =>
        {
            DoPollingThenWorkAsync();
            ss.Release();
        }));
    }
    await Task.WhenAll(trackedTasks);
}

void DoPollingThenWorkAsync()
{
    var msg = Poll();
    if (msg != null)
    {
        Thread.Sleep(2000); // process the long running CPU-bound job
    }
}
What do await ss.WaitAsync(); and ss.Release(); do?
I guess that if I run 50 threads at a time and write code like SemaphoreSlim ss = new SemaphoreSlim(10); then it will be forced to run only 10 active threads at a time.
When one of the 10 threads completes, another thread will start. If I am not right, then help me understand with a sample situation.
Why is await needed along with ss.WaitAsync()? What does ss.WaitAsync() do?

In the kindergarten around the corner they use a SemaphoreSlim to control how many kids can play in the PE room.
They painted on the floor, outside of the room, 5 pairs of footprints.
As the kids arrive, they leave their shoes on a free pair of footprints and enter the room.
Once they are done playing they come out, collect their shoes and "release" a slot for another kid.
If a kid arrives and there are no footprints left, they go play elsewhere or just stay around for a while and check every now and then (i.e., there is no FIFO ordering of waiters).
When a teacher is around, she "releases" an extra row of 5 footprints on the other side of the corridor such that 5 more kids can play in the room at the same time.
It also has the same "pitfalls" as SemaphoreSlim...
If a kid finishes playing and leaves the room without collecting the shoes (does not trigger the "release") then the slot remains blocked, even though there is theoretically an empty slot. The kid usually gets told off, though.
Sometimes one or two sneaky kids hide their shoes elsewhere and enter the room, even if all the footprints are already taken (i.e., the SemaphoreSlim does not "really" control how many kids are in the room).
This does not usually end well, since the overcrowding of the room tends to end in kids crying and the teacher fully closing the room.
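In API terms, the analogy maps onto SemaphoreSlim roughly like this (a sketch, not code from the question):

var room = new SemaphoreSlim(5);   // 5 pairs of footprints outside the room

await room.WaitAsync();            // a kid waits for, then takes, a free pair of footprints
// ... play in the PE room ...
room.Release();                    // collect the shoes, free the slot for another kid

room.Release(5);                   // the teacher "releases" an extra row of 5 footprints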

I guess that if I run 50 threads at a time, then code like SemaphoreSlim ss = new SemaphoreSlim(10); will force only 10 threads to be active at a time.
That is correct; the use of the semaphore ensures that there won't be more than 10 workers doing this work at the same time.
Calling WaitAsync on the semaphore returns a task that completes when the caller has been granted "access" to the semaphore, i.e., when a slot becomes free. Awaiting that task lets the method continue only when it is "allowed" to do so. Using the asynchronous version, rather than calling Wait, matters for two reasons: it keeps the method truly asynchronous instead of blocking a thread, and it sidesteps the fact that an async method can execute its code across several threads (because of the callbacks), so the thread affinity that traditional semaphores rely on can be a problem.
A side note: DoPollingThenWorkAsync shouldn't have the Async suffix because it's not actually asynchronous; it's synchronous. Just call it DoPollingThenWork to reduce confusion for readers.

Although I accept this question really relates to a countdown-lock scenario, I thought it worth sharing this link I discovered for those wishing to use a SemaphoreSlim as a simple asynchronous lock. It allows you to use the using statement, which can make the code neater and safer.
http://www.tomdupont.net/2016/03/how-to-release-semaphore-with-using.html
I did swap _isDisposed = true and _semaphore.Release() around in its Dispose, though, in case it somehow got called multiple times.
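For reference, a minimal sketch of the pattern that post describes (the class and member names here are hypothetical, and the linked code differs in details):

public sealed class SemaphoreReleaser : IDisposable
{
    private readonly SemaphoreSlim _semaphore;
    private bool _isDisposed;

    private SemaphoreReleaser(SemaphoreSlim semaphore)
    {
        _semaphore = semaphore;
    }

    public static async Task<SemaphoreReleaser> AcquireAsync(SemaphoreSlim semaphore)
    {
        await semaphore.WaitAsync();
        return new SemaphoreReleaser(semaphore);
    }

    public void Dispose()
    {
        if (_isDisposed) return;
        _isDisposed = true;      // set the flag before releasing, per the note above
        _semaphore.Release();
    }
}

// Usage: the semaphore is released when the using block exits.
// using (await SemaphoreReleaser.AcquireAsync(ss)) { /* guarded code */ }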
Also, it is important to note that SemaphoreSlim is not a reentrant lock: if the same thread calls WaitAsync multiple times, the semaphore's count is decremented every time. In short, SemaphoreSlim is not thread-aware.
Regarding the question's code quality, it is better to put the Release inside the finally of a try-finally to ensure the semaphore always gets released.
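Applied to the question's code, a minimal sketch of that advice (DoMore and Poll come from the question; the try-finally and the rename are the suggested fixes):

async Task WorkerMainAsync()
{
    SemaphoreSlim ss = new SemaphoreSlim(10);
    List<Task> trackedTasks = new List<Task>();
    while (DoMore())
    {
        await ss.WaitAsync();
        trackedTasks.Add(Task.Run(() =>
        {
            try
            {
                DoPollingThenWork(); // renamed: the method is synchronous
            }
            finally
            {
                ss.Release();        // always release, even if the work throws
            }
        }));
    }
    await Task.WhenAll(trackedTasks);
}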

Related

Does SemaphoreSlim act as a gatekeeper or as a batchkeeper? [duplicate]


Why do threads continue to run after cancel has been called?

Consider this simple example code:
var cts = new CancellationTokenSource();
var items = Enumerable.Range(1, 20);
var results = items.AsParallel().WithCancellation(cts.Token).Select(i =>
{
    double result = Math.Log10(i);
    return result;
});

try
{
    foreach (var result in results)
    {
        if (result > 1)
            cts.Cancel();
        Console.WriteLine($"result = {result}");
    }
}
catch (OperationCanceledException e)
{
    if (cts.IsCancellationRequested)
        Console.WriteLine($"Canceled");
}
The foreach loop prints each result from the parallel query, calling Cancel() once a result exceeds 1.
This code output is something like:
result = 0.9030899869919435
result = 0.8450980400142568
result = 0.7781512503836436
result = 0
result = 0.6020599913279624
result = 0.47712125471966244
result = 0.3010299956639812
result = 0.6989700043360189
result = 0.9542425094393249
result = 1
result = 1.0413926851582251 <-- This is normal
result = 1.2041199826559248 <-- Why it prints this value (and below)
result = 1.0791812460476249
result = 1.2304489213782739
result = 1.1139433523068367
result = 1.255272505103306
result = 1.146128035678238
result = 1.2787536009528289
result = 1.1760912590556813
result = 1.3010299956639813
Canceled
My question is: why does it continue printing values over 1? I expected that calling Cancel() on the token would terminate the processing.
Update 1
mike-s's answer suggested:
It's also useful to check a cancellation token inside a loop (as a means to abort the loop) or before a long operation.
I've tried adding a check
foreach (var result in results)
{
    if (result > 1)
        cts.Cancel();
    if (!cts.IsCancellationRequested) // <-- check the cancellation token before printing
        Console.WriteLine($"result = {result}");
}
It still produces the same output.
My question is why it continue printing values over 1?
Imagine you hired a hundred pilots to fly a hundred planes from a hundred airports. A bunch of them take off, and then you send a message saying "cancel all the flights". Well, there are a bunch of planes on the runway at takeoff speed when you send that message, and the message arrives after they are in the air. Those flights will not be cancelled!
You are discovering the most important thing to know about multithreaded programming. You have to reason as though every possible ordering of things happening might occur. That includes messages arriving later than you think they should.
In particular, your problem is a result of your abuse of the parallelization mechanisms, which are designed to parallelize long work. You've created a bunch of tasks that take less time to run than it takes to send the message stopping them. It should not be a surprise in that case that some of the tasks complete after they've been told to stop.
I expected that calling Cancel() on the token would terminate the process.
Your expectation is completely, totally wrong. Stop expecting that, since that expectation in no way conforms to reality. A cancellation token is a request to cancel an operation as soon as it is convenient to do so. It's not terminating a thread or a process.
However, even if you did terminate the threads, you would still observe this behaviour. Thread termination is an event like any other, and that event is not instantaneous. It takes time to execute, and other threads can continue their work while that thread termination is executing.
what do you mean by "convenient" in "a request to cancel an operation as soon as it is convenient to do so"?
Let's take a step back.
If the work to be done is extremely short, then there is no need to represent it as a task. Just do the work! In general if work takes less than about 30ms, just do the work.
Therefore, let's assume that every task takes a long time.
Now, why might a task take a long time? There are generally two reasons:
We're waiting for another system to complete some task. We're waiting for a network packet or a disk read or some such thing.
We have a huge amount of computation, and the CPU is saturated.
Suppose we are in the first situation. Does parallelizing help? No. If you are waiting for a package in the mail, hiring one, two, ten or a hundred people to wait does not make the package come faster.
But that does help for the second case; if we have an extra CPU in the machine we can dedicate two CPUs to solve the problem in about half the time.
Therefore we can assume that if we are parallelizing a task, it is because the CPU is doing a lot of work.
Great. Now, what is the nature of "CPU does a lot of work?" It almost always involves a loop somewhere.
So then, how do we cancel a task? We do not cancel a task by terminating the thread. We ask the task to cancel itself. A well-designed task will take a cancellation token, and in its loop will check to see if the cancellation token is indicating that the task is cancelled. Cancellation is cooperative. The task has to cooperate and decide when it checks to see if it is cancelled.
Notice that checking to see if you are cancelled is work, and that is work that takes time away from the real task. If you spend half your time checking to see if you are cancelled, your task takes twice as long as it could. And remember, the point of parallelizing the task is to make it take half as long, so doubling the amount of time it takes to do the task is a non-starter.
Therefore most tasks do not check every time through the loop if they are cancelled. A well-designed task will check every few milliseconds, not every few nanoseconds.
That's what I mean by "a cancellation is a request to stop when it is convenient". The task, if it was written correctly, should know what a good time to check for cancellation is so that it balances responsiveness against performance.
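A minimal sketch of such a cooperative check (the loop body and the check interval are illustrative assumptions, not recommendations):

// Hypothetical CPU-bound work that checks its token every 100,000 iterations,
// balancing responsiveness against the overhead of checking.
void CrunchNumbers(CancellationToken token)
{
    for (long i = 0; i < 1_000_000_000; i++)
    {
        if (i % 100_000 == 0)
            token.ThrowIfCancellationRequested();
        // ... one small unit of real work here ...
    }
}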
Calling Cancel() merely signals the cancellation token; it only affects the places in the code that check the token (such as calls to cts.IsCancellationRequested). Framework calls will often check the cancellation token and abort. It's also useful to check a cancellation token inside a loop (as a means to abort the loop) or before a long operation.
The cancellation token does not forcibly terminate a thread or process. There are other APIs for that, such as Environment.Exit.
Following up on Eric's excellent answer ... "a thread or process" and "a unit of work" usually should not be the same thing. Creating a thread to carry out one unit of work and then die is like shooting flaming arrows into the air: you can't control it, can't predict it, and those arrows start interfering with each other. The system becomes choked with so much work that it can't work on anything. (A condition called "thrashing.")
A much better strategy is modeled after a fast-food restaurant: a small number of workers, each with an assigned task, taking work-requests from a queue and delivering the finished sandwiches to another. At any instant, any queue might contain more or fewer entries. You don't see any of the workers falling down, dead. During lunch rush-hour, more workers are busy but at the same tasks. During a slow period they remain at their posts, patiently waiting for the next order to arrive. Any particular work-request might be flagged as "cancelled," and the workers notice this and respond accordingly. No part of the restaurant is "over-committed," and the entire operation is able to consistently produce a predictable number of sandwiches per hour, according to management control.

C# Restarting a thread with a different parameter

I want to start a background thread on some user event, in which I wait/sleep 10 seconds to do something if a variable changes between the time it was passed in and the time it is checked. However, during that 10 seconds, the same user event can repeat, and I want to interrupt & reset the thread to use the new variable and start back at 10 seconds.
For example,
private static int index = 0;
private static Thread myThread = null;

if (myThread != null && myThread.IsAlive)
{
    // need to 'restart' the thread with updated index
    /* Suspend? Resume? */
}
else
{
    // create a new thread and start countdown
    myThread = new Thread(() => some_Thread(index));
    myThread.Start();
}
I read that Suspend() and Resume() are deprecated, and I've read up on some posts about Auto/ManualResetEvent, but they're not exactly what I'm looking for. It's probably something closer to Abort() followed by Start() on a new thread, but apparently that's unwise.
So, any suggestions on how to achieve this with one static thread handle? Again, the 10-second 'sleep' has to be interruptible, and thereafter the thread must be discardable or restartable. Thanks!
I want to start a background thread on some user event,
You are doing what we at SO call an "XY problem". You have a completely wrong idea about how to solve a problem and you are asking questions about how to make that wrong way work. Instead, concentrate on the user-focused problem you really have and ask about that.
in which I wait/sleep 10 seconds to do something if a variable changes between the time it was passed in and the time it is checked.
Don't do any of this stuff. If you're making a thread whose job it is to sleep, odds are good that you are doing something very, very wrong. Threads are expensive; only make a thread if you're going to be scheduling a CPU to service that thread.
When you are considering making a thread, ask yourself "would I hire a worker to do this task?" Ten seconds of computer time is ten billion nanoseconds; that's like hiring a worker and paying them to sleep for centuries. You'd never do that; you'd just put "do this later" on your to-do list, and come back to it later. If it gets cancelled, you'd take it off your to-do list.
What you want to do instead is make zero extra threads. Make a cancellable asynchronous workflow that awaits a Task.Delay before it does the work that must be done ten seconds later. If the user event happens during the delay then cancel the workflow and start a new workflow.
If the work that follows the delay is CPU intensive, then schedule a worker thread and await the result. If it is not -- if it is CPU work that comes back in say 30 ms or less -- then just run the work on the main thread. If it is IO gated, then use the asynchronous version of the IO API to stay on the main thread. You want to be making as few threads as you can get away with here.
Be careful. Even though everything is still on one thread, there are still race conditions that are possible in cancellable workflows like this. You still need to consider all possible interleavings of the non-dependent parts of your asynchronous workflows.
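A minimal sketch of that workflow (the 10-second delay comes from the question; the names and surrounding structure are assumptions):

private static CancellationTokenSource cts;

// Called on each user event: cancels any pending wait and starts a new one.
public static async void OnUserEvent(int index)
{
    cts?.Cancel();                       // abandon the previous 10-second wait
    cts = new CancellationTokenSource();
    var token = cts.Token;
    try
    {
        await Task.Delay(TimeSpan.FromSeconds(10), token);
        DoWork(index);                   // hypothetical: runs only if no newer event arrived
    }
    catch (TaskCanceledException)
    {
        // superseded by a newer user event; nothing to do
    }
}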

Possible Race condition with ManualResetEvent

Problem:
I am trying to dispatch 6 threads from the ThreadPool to work on individual tasks. Each task's ManualResetEvent is stored in an array of ManualResetEvents; the thread number corresponds to the index in the array.
Once I have initiated these 6 threads, I move on and wait for the threads to complete. The waiting is done in the main thread.
Sometimes my waiting logic doesn't return even after a long time (two days, in one case I have seen). Here is the code sample for the thread-wait logic:
foreach (ManualResetEvent whandle in eventList)
{
    try
    {
        whandle.WaitOne();
    }
    catch (Exception) { }
}
As per the documentation of WaitOne, it is a synchronous call that does not return until the event has been signalled.
Sometimes my threads have a small amount of work and may even return before I reach the wait logic. Will WaitOne() still return if the Set() call happened in the past?
Is this correct logic for waiting for all the threads to finish?
I'm not directly answering this question. Here is what you should do:
Start tasks using Task.Factory.StartNew and use Task.WaitAll(Task[]) to wait for them. You do not have to deal with events that way. Exceptions will nicely propagate to the "forking" thread. You don't need the old ThreadPool API anymore.
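A minimal sketch of that suggestion (DoTask is a hypothetical stand-in for the per-thread work):

// Start the six work items as tasks and wait for all of them; any exception
// propagates out of Task.WaitAll wrapped in an AggregateException.
var tasks = new Task[6];
for (int i = 0; i < tasks.Length; i++)
{
    int taskNumber = i; // capture the loop variable for the closure
    tasks[i] = Task.Factory.StartNew(() => DoTask(taskNumber));
}
Task.WaitAll(tasks);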
Hope this helps.
(Note: I think your best bet is Parallel.Invoke() - see later in this answer.)
What you are doing will normally work fine, so the problem is likely to be that one of your threads is blocking for some reason.
You should be able to debug this readily enough - you can attach the debugger and break into the program and then look at the call stack to see which thread(s) are blocked. Be prepared for some head-scratching if you discover a race condition though!
Another thing to be aware of is that you can't do the following:
myEvent.Set();
myEvent.Reset();
with nothing (or very little) between the .Set() and the .Reset(). If you do that when several threads are waiting on myEvent, some of them will miss the event being set! (This effect is not well documented on MSDN.)
By the way, you shouldn't ignore exceptions - always log them in some way, at the very least.
(This section doesn't answer the question, but it may provide some helpful information)
I also want to mention an alternative way to wait for the threads. Since you have a set of ManualResetEvents, you can copy them to a plain array and pass it to WaitHandle.WaitAll().
Your code could look a little like this:
WaitHandle.WaitAll(eventList.ToArray());
Another approach to waiting for all threads to finish is to use a CountdownEvent. It becomes signalled when a countdown reaches zero; you start the count at the number of threads, and each thread signals it when it exits. There's an example here.
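A minimal sketch of the CountdownEvent approach (the per-thread work is elided):

// One CountdownEvent replaces the whole array of ManualResetEvents.
var countdown = new CountdownEvent(6); // initial count = number of threads
for (int i = 0; i < 6; i++)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try
        {
            // ... do the per-thread work here ...
        }
        finally
        {
            countdown.Signal(); // decrement the count, even if the work throws
        }
    });
}
countdown.Wait(); // returns once all six threads have signalled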
Parallel.Invoke()
If your threads do not return values, and all you want to do is launch them and then have the launching thread wait for them to exit, then I think Parallel.Invoke() will be the best way of all. It avoids your having to handle the synchronization yourself.
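For example (the worker methods are hypothetical):

// Parallel.Invoke runs the delegates in parallel and returns
// only when every one of them has completed.
Parallel.Invoke(
    () => DoWorkA(),
    () => DoWorkB(),
    () => DoWorkC());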
(Otherwise, as svick says in the comments above, use Task rather than the old thread classes.)

C# thread pool limiting threads

Alright...I've given the site a fair search and have read over many posts about this topic. I found this question: Code for a simple thread pool in C# especially helpful.
However, as it always seems, what I need varies slightly.
I have looked over the MSDN example and adapted it to my needs somewhat. The example I refer to is here: http://msdn.microsoft.com/en-us/library/3dasc8as(VS.80,printer).aspx
My issue is this: I have a fairly simple piece of code that loads a web page via the HttpWebRequest and WebResponse classes and reads the result via a Stream. I fire off this method in a thread, as it will need to be executed many times. The method itself is pretty short, but the number of times it needs to be fired (with varied data each time) can be anywhere from 1 to 200.
Everything I've read seems to indicate that the ThreadPool class is the prime candidate. Here is where things get tricky: I might need to fire off this thing, say, 100 times, but I can only have at most 3 threads running (for this particular task).
I've tried setting the MaxThreads on the ThreadPool via:
ThreadPool.SetMaxThreads(3, 3);
I'm not entirely convinced this approach is working. Furthermore, I don't want to clobber other web sites or programs running on the same system. So, by limiting the number of threads on the ThreadPool, can I be certain that this pertains to my code and my threads only?
The MSDN example uses the event drive approach and calls WaitHandle.WaitAll(doneEvents); which is how I'm doing this.
So the heart of my question is this: how does one ensure or specify a maximum number of threads that can run for their code, while having the code keep starting new threads as the previous ones finish, up until some arbitrary point? Am I tackling this the right way?
Sincerely,
Jason
Okay, I've added a semaphore approach and completely removed the ThreadPool code. It seems simple enough. I got my info from: http://www.albahari.com/threading/part2.aspx
It's this example that showed me how:
[text below here is a copy/paste from the site]
A Semaphore with a capacity of one is similar to a Mutex or lock, except that the Semaphore has no "owner" – it's thread-agnostic. Any thread can call Release on a Semaphore, while with Mutex and lock, only the thread that obtained the resource can release it.
In the following example, ten threads execute a loop with a Sleep statement in the middle. A Semaphore ensures that not more than three threads can execute that Sleep statement at once:
class SemaphoreTest
{
    static Semaphore s = new Semaphore(3, 3); // Available=3; Capacity=3

    static void Main()
    {
        for (int i = 0; i < 10; i++)
            new Thread(Go).Start();
    }

    static void Go()
    {
        while (true)
        {
            s.WaitOne();
            Thread.Sleep(100); // Only 3 threads can get here at once
            s.Release();
        }
    }
}
Note: if you are limiting this to "3" just so you don't overwhelm the machine running your app, I'd make sure this is a problem first. The threadpool is supposed to manage this for you. On the other hand, if you don't want to overwhelm some other resource, then read on!
You can't manage the size of the threadpool (or really much of anything about it).
In this case, I'd use a semaphore to manage access to your resource. In your case, your resource is running the web scrape, or calculating some report, etc.
To do this, in your static class, create a semaphore object:
static System.Threading.Semaphore S = new System.Threading.Semaphore(3, 3);
Then, in each thread, wait on that one shared semaphore (not a new Semaphore per thread, which would defeat the purpose):
try
{
    // wait your turn (decrement)
    S.WaitOne();
    // do your thing
}
finally
{
    // release so others can go (increment)
    S.Release();
}
Each thread will block on S.WaitOne() until it is given the signal to proceed. Once S has been decremented 3 times, all further threads will block until one of the current holders increments the counter.
This solution isn't perfect.
If you want something a little cleaner and more efficient, I'd recommend going with a blocking-queue approach, wherein you enqueue the work you want performed into a global blocking queue object.
Meanwhile, you have three threads (which you created yourself, not from the threadpool) popping work out of the queue to perform. This isn't that tricky to set up, and it is very fast and simple; see the sketch after the links below.
Examples:
Best threading queue example / best practice
Best method to get objects from a BlockingQueue in a concurrent program?
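A minimal sketch of that producer/consumer approach using BlockingCollection (the queue element type and FetchPage are illustrative assumptions):

// Shared queue plus exactly three dedicated consumer threads,
// matching the desired limit of 3 concurrent requests.
var queue = new System.Collections.Concurrent.BlockingCollection<string>();

for (int i = 0; i < 3; i++)
{
    new Thread(() =>
    {
        foreach (var url in queue.GetConsumingEnumerable())
            FetchPage(url); // hypothetical method doing the web request
    }) { IsBackground = true }.Start();
}

// Producer side: enqueue as many work items as needed (1 to 200).
foreach (var url in urlsToFetch)
    queue.Add(url);
queue.CompleteAdding(); // lets the consumers drain the queue and exit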
It's a static class like any other, which means that anything you do with it affects every other thread in the current process. It doesn't affect other processes.
I consider this one of the larger design flaws in .NET, however. Who came up with the brilliant idea of making the thread pool static? As your example shows, we often want a thread pool dedicated to our task, without having it interfere with unrelated tasks elsewhere in the system.
