Semaphore - What is the use of initial count? - c#

http://msdn.microsoft.com/en-us/library/system.threading.semaphoreslim.aspx
To create a semaphore, I need to provide an initial count and a maximum count. MSDN states that the initial count is:
"The initial number of requests for the semaphore that can be granted concurrently."
while the maximum count is:
"The maximum number of requests for the semaphore that can be granted concurrently."
I can understand that the maximum count is the maximum number of threads that can access a resource concurrently, but what is the use of the initial count?
If I create a semaphore with an initial count of 0 and a maximum count of 2, none of my thread pool threads can access the resource. If I set the initial count to 1 and the maximum count to 2, only one thread pool thread can access the resource. Only when I set both the initial count and the maximum count to 2 can two threads access the resource concurrently. So what is the significance of the initial count?
SemaphoreSlim semaphoreSlim = new SemaphoreSlim(0, 2); // all thread pool threads wait
SemaphoreSlim semaphoreSlim = new SemaphoreSlim(1, 2); // only one thread has access to the resource at a time
SemaphoreSlim semaphoreSlim = new SemaphoreSlim(2, 2); // two thread pool threads can access the resource concurrently

So, I am really confused about the significance of initial count?
One important point that may help here: Wait decrements the semaphore's count and Release increments it.
initialCount is the number of resource accesses that will be allowed immediately. In other words, it is the number of times Wait can be called without blocking immediately after the semaphore is instantiated.
maximumCount is the highest count the semaphore can reach. It is the number of times Release can be called without throwing an exception, assuming an initialCount of zero. If initialCount is set to the same value as maximumCount, then calling Release immediately after the semaphore is instantiated will throw an exception.
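A minimal sketch of those rules (the variable name is ours, not from the question):
var sem = new SemaphoreSlim(initialCount: 1, maxCount: 2);

sem.Wait();       // does not block: the count goes 1 -> 0
sem.Release();    // the count goes 0 -> 1
sem.Release();    // the count goes 1 -> 2 (the maximum)
// sem.Release(); // would now throw SemaphoreFullException: the count may not exceed maxCount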

Yes, when the initial count is set to 0, all threads will wait while you increment the CurrentCount property. You can do that with Release() or Release(Int32).
Release(...) increments the semaphore's count.
Wait(...) decrements it.
You cannot increment the count (the CurrentCount property) above the maximum count you set at initialization.
For example:
SemaphoreSlim s = new SemaphoreSlim(0, 2); // s.CurrentCount == 0
s.Release(2); // s.CurrentCount == 2
...
s.Wait(); // OK. s.CurrentCount == 1
...
s.Wait(); // OK. s.CurrentCount == 0
...
s.Wait(); // blocks until one of the other threads calls Release()

How many threads do you want to be able to access resource at once? Set your initial count to that number. If that number is never going to increase throughout the life of the program, set your max count to that number too. That way, if you have a programming error in how you release the resource, your program will crash and let you know.
(There are two constructors: one that takes only an initial value, and one that additionally takes the max count. Use whichever is appropriate.)
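For example (a sketch; the variable names are illustrative):
// Allow 4 concurrent accesses; maxCount defaults to Int32.MaxValue.
var tolerant = new SemaphoreSlim(4);

// Allow 4 concurrent accesses and cap the count at 4, so a stray
// extra Release() throws instead of silently widening the limit.
var strict = new SemaphoreSlim(4, 4);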

Normally, when the SemaphoreSlim is used as a throttler, both initialCount and maxCount have the same value:
var semaphore = new SemaphoreSlim(maximumConcurrency, maximumConcurrency);
...and the semaphore is used with this pattern:
await semaphore.WaitAsync(); // or semaphore.Wait();
try
{
    // Invoke the operation that must be throttled
}
finally
{
    semaphore.Release();
}
The initialCount configures the maximum-concurrency policy, and the maxCount ensures that this policy is not violated. If you omit the second argument (the maxCount), your code will work just as well, provided there are no bugs in it. If there is a bug, and a WaitAsync can be followed by more than one Release, then the maxCount helps detect this bug before it ends up in the released version of your program. The bug surfaces as a SemaphoreFullException, hopefully during testing of a pre-release version, so you can track it down and eliminate it before it does any real harm (before it causes a violation of the maximum-concurrency policy in a production environment).
The default value of the maxCount argument, in case you omit it, is Int32.MaxValue (source code).
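Here is a runnable sketch of the whole pattern; maximumConcurrency, ThrottledWorkAsync, and the Task.Delay stand in for a real throttled operation:
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int maximumConcurrency = 2;
var semaphore = new SemaphoreSlim(maximumConcurrency, maximumConcurrency);

async Task ThrottledWorkAsync(int id)
{
    await semaphore.WaitAsync();
    try
    {
        Console.WriteLine($"operation {id} running");
        await Task.Delay(500); // at most 2 of these run at any moment
    }
    finally
    {
        semaphore.Release();
    }
}

await Task.WhenAll(Enumerable.Range(0, 6).Select(ThrottledWorkAsync));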

If you want no thread to access your resource for some time, you pass an initial count of 0, and when you want to grant access to all of them right after creating the semaphore, you pass an initial count equal to the maximum count. For example:
hSemaphore = CreateSemaphoreA(NULL, 0, MAX_COUNT, NULL);
// Do something here
// No threads can access your resource
ReleaseSemaphore(hSemaphore, MAX_COUNT, 0);
// All threads can access the resource now
As quoted in MSDN Documentation- "Another use of ReleaseSemaphore is during an application's initialization. The application can create a semaphore with an initial count of zero. This sets the semaphore's state to nonsignaled and blocks all threads from accessing the protected resource. When the application finishes its initialization, it uses ReleaseSemaphore to increase the count to its maximum value, to permit normal access to the protected resource."
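A rough C# analogue of the same gating with SemaphoreSlim (MAX_COUNT is an assumed constant):
const int MAX_COUNT = 4;
var gate = new SemaphoreSlim(0, MAX_COUNT); // count 0: every Wait() blocks

// ... perform application initialization; no thread gets past gate.Wait() ...

gate.Release(MAX_COUNT); // raise the count to its maximum: normal access begins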

This way, the thread that creates the semaphore can claim some of the resources from the start.

Think of it like this:
initialCount is the "degree of parallelism" (number of threads that can enter)
maxCount ensures that you don't Release more than you should
For example, say you want a concurrency degree of 1 (only one operation at a time), but due to some bug in your code you release the semaphore twice. Now you have a concurrency of two!
But if you set maxCount, the semaphore will not allow this and will throw an exception instead.

maxCount is the number of concurrent threads that you're going to be allowing.
However, when you start the throttling, you may already know there are a few active threads, so you'd want to tell it "hey, I want to have 6 concurrent threads, but I already have 4, so I want you to only allow 2 more for now", so you'd set initialCount to 2 and maxCount to 6.
The limitation with initialCount in SemaphoreSlim is that it cannot be a negative number, so you can't say "hey, I want up to 6 concurrent threads, but I currently have 10, so let 5 get released before you allow another one in" (that would mean an initialCount of -4). For that you'd need a third-party package like SemaphoreSlimThrottling (note that I am the author of SemaphoreSlimThrottling).
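Under the scenario described above, the construction would look like this:
// Policy: at most 6 concurrent threads; 4 are already running elsewhere,
// so only 2 more may enter until some of those 4 release.
var throttler = new SemaphoreSlim(initialCount: 2, maxCount: 6);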

As MSDN explains it under the Remarks section:
If initialCount is less than maximumCount, the effect is the same as if the current thread had called WaitOne (maximumCount minus initialCount) times. If you do not want to reserve any entries for the thread that creates the semaphore, use the same number for maximumCount and initialCount.
So if the initial count is 0 and the max is 2, it is as if WaitOne had been called twice by the main thread, so we have reached capacity (the semaphore's count is now 0) and no thread can enter the semaphore. Similarly, if the initial count is 1 and the max is 2, WaitOne has effectively been called once, and only one thread can enter before we reach capacity again, and so on.
If 0 is used for the initial count, we can always call Release(2) to raise the semaphore's count to the max, allowing the maximum number of threads to acquire the resource.
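A small sketch of the equivalence the Remarks describe:
// These two semaphores end up in the same state:
var a = new SemaphoreSlim(0, 2); // a.CurrentCount == 0

var b = new SemaphoreSlim(2, 2);
b.Wait(); // 2 -> 1
b.Wait(); // 1 -> 0; b.CurrentCount == 0, same as a

a.Release(2); // and a can be brought up to its max of 2 at any time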

Semaphores can be used to protect a pool of resources. We use resource pools to reuse things that are expensive to create, such as database connections.
So the initial count refers to the number of available resources in the pool at the start of some process. When you read the initialCount in code, you should be thinking in terms of how much up-front effort you are putting into creating this pool of resources.
I am really confused about the significance of initial count?
Initial count = up-front cost.
As such, depending on the usage profile of your application, this value can have a dramatic effect on the performance of your application. It's not just some arbitrary number.
You should think carefully about what you are creating, how expensive the items are to create, and how many you need right away. You should literally be able to graph the optimal value for this parameter, and you should likely make it configurable so you can adapt the performance of the process to the time at which it is being executed.
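As a sketch of that reading (the pool type and member names here are hypothetical):
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

class ResourcePool<T>
{
    private readonly ConcurrentBag<T> _items;
    private readonly SemaphoreSlim _available;

    // initialCount = resources built up front (the up-front cost);
    // maxCount = the most the pool will ever hold. Growing the pool
    // toward maxCount on demand is omitted from this sketch.
    public ResourcePool(IReadOnlyCollection<T> preCreated, int maxCount)
    {
        _items = new ConcurrentBag<T>(preCreated);
        _available = new SemaphoreSlim(preCreated.Count, maxCount);
    }

    public T Rent()
    {
        _available.Wait(); // blocks while the pool is empty
        _items.TryTake(out var item);
        return item;
    }

    public void Return(T item)
    {
        _items.Add(item);
        _available.Release(); // throws if we overfill past maxCount
    }
}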

Related

Interlocked.Increment Method by a Certain Number Interval

We have a concurrent, multithreaded program.
How would I make a number increase by an interval of +5 every time? Does Interlocked.Increment have an overload for an interval? I don't see it listed.
Microsoft Interlocked.Increment Method
// Attempt to make it increase by 5
private int NumberTest;
for (int i = 1; i <= 5; i++)
{
    NumberTest = Interlocked.Increment(ref NumberTest);
}
This is another question it's based off:
C# Creating global number which increase by 1
I think you want Interlocked.Add:
Adds two integers and replaces the first integer with the sum, as an atomic operation.
int num = 0;
Interlocked.Add(ref num, 5);
Console.WriteLine(num);
Adding (i.e. +=) is not and cannot be an atomic operation (as you know). Unfortunately, there is no way to achieve this without enforcing a full fence; on the bright side, these are fairly optimized at a low level. However, there are several other ways you can ensure integrity (especially since this is just an add):
Use Interlocked.Add (the sanest solution).
Apply an exclusive lock (or Monitor.Enter) outside the for loop.
Use an AutoResetEvent to make the threads do the task one by one (meh, sigh).
Create a temp int in each thread and, once finished, add the temp onto the sum under an exclusive lock or similar.
Use ReaderWriterLockSlim.
Parallel.For with per-thread accumulation merged into the sum with Interlocked, same idea as the temp-int option (a sketch follows).
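A sketch of that last option, using the thread-local overload of Parallel.For (the numbers are illustrative):
using System;
using System.Threading;
using System.Threading.Tasks;

int sum = 0;
Parallel.For(0, 1000,
    localInit: () => 0,                           // one private subtotal per worker thread
    body: (i, state, local) => local + 5,         // no shared state touched per iteration
    localFinally: local => Interlocked.Add(ref sum, local)); // merge once per worker
Console.WriteLine(sum); // 5000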

Use and reset an int that is constantly incremented by another thread

I have a Task that counts the number of packets it receives from some source.
Every 250 ms a timer fires, reads the count, and outputs it to the user. Right after that I need to set the count back to 0.
My concern is that between reading and displaying the count, but BEFORE I set count = 0, count gets incremented by the other thread, so I end up losing counts by zeroing it out.
I am new to threading, so I have been looking at multiple options.
I looked into using Interlocked, but as far as I know it only gives me arithmetic operations; I don't have the option to actually set the variable to a value.
I was also looking at ReaderWriterLockSlim. What I need is the most efficient / lowest-overhead way to accomplish this, since there is a lot of data coming across.
You want Exchange:
int currentCount = System.Threading.Interlocked.Exchange(ref count, 0);
As per the docs:
Sets a 32-bit signed integer to a specified value and returns the original value, as an atomic operation.
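A minimal sketch of the whole read-and-reset pattern (the period and names are illustrative):
using System;
using System.Threading;

int count = 0;

// Producer side, called once per received packet:
void OnPacket() => Interlocked.Increment(ref count);

// Every 250 ms: atomically grab the current value and reset it to 0 in one
// step, so increments that land between the read and the reset are never lost.
using var timer = new Timer(_ =>
{
    int current = Interlocked.Exchange(ref count, 0);
    Console.WriteLine($"packets in the last 250 ms: {current}");
}, null, dueTime: 250, period: 250);

for (int i = 0; i < 1000; i++) OnPacket(); // simulate packet arrivals
Thread.Sleep(600);                         // let the timer fire a couple of times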

Different thread affinities in Parallel.For Iterations

I need one iteration of a parallel for loop to use 7 cores (or stay away from 1 core) but another iteration to use all 8 cores, and I tried the code below:
Parallel.For(0, 2, i =>
{
    if (i == 0)
        Process.GetCurrentProcess().ProcessorAffinity = (System.IntPtr)255;
    if (i == 1)
        Process.GetCurrentProcess().ProcessorAffinity = (System.IntPtr)254;
    Thread.Sleep(25); // to make sure both set their affinities
    Console.WriteLine(Process.GetCurrentProcess().ProcessorAffinity);
});
This outputs 255 for both iterations. So either the Parallel.For loop is using a single thread for both, or setting the affinity in one iteration also sets it for the other. Another problem: this is a latency-sensitive application, and all this affinity setting adds 1 to 15 milliseconds of latency.
Do I have to use threads explicitly, and should I set affinities only once?
Edit: I tried the threaded version; the same thing happens. Even with two explicit threads, both write 255 to the console. Now it seems this setting applies to a process, not a thread.
The OpenCL context uses the maximum number of cores for kernel execution on the CPU in one iteration. Other iterations use 1-2 cores to copy buffers and send commands to devices. When the CPU is used by OpenCL, it uses all cores, and the devices cannot get enough time to copy buffers. Device fission seems harder than solving this issue, I think.
Different thread affinities in Parallel.For Iterations
The question is misleading, as it is based on the assumption that the Parallel API implies multiple threads. The Parallel API does refer to data-parallel processing, but it doesn't provide any guarantee of invoking multiple threads, especially for the code provided above, where there's hardly any work per thread.
For the Parallel API, you can set the Max degree of Parallelism, as follows:
ParallelOptions parallelOption = new ParallelOptions();
parallelOption.MaxDegreeOfParallelism = Environment.ProcessorCount;
Parallel.For(0, 20, parallelOption, i =>
{
    // loop body
});
But that never guarantees the number of threads that will be invoked for the parallel processing, since the threads are taken from the ThreadPool and the CLR decides at run time, based on the amount of work to be processed, whether more than one thread is required.
In the same Parallel loop, try printing Thread.CurrentThread.ManagedThreadId; this will give a clear idea of the number of threads being invoked in the Parallel loop.
Do I have to use threads explicitly, and should I set affinities only once?
Edit: I tried the threaded version; the same thing happens. Even with two explicit threads, both write 255 to the console. Now it seems this setting applies to a process, not a thread.
Can you post the code for the multiple-threads version? You could try something like this:
Thread[] threadArray = new Thread[2];
threadArray[0] = new Thread(<ThreadDelegate>);
threadArray[1] = new Thread(<ThreadDelegate>);
threadArray[0].ProcessorAffinity = <Set Processor Affinity>;
threadArray[1].ProcessorAffinity = <Set Processor Affinity>;
Assuming you assign the affinities correctly, you can print them and find different values; see ProcessThread.ProcessorAffinity.
On another note, as you can see in the link above, the value is a processor bitmask that you can set in hexadecimal; I'm not sure what the values 254 and 255 are meant to denote. Do you really have a server with that many processors?
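For reference, a hedged sketch of setting affinity through ProcessThread (Windows-only; there is no supported mapping from a managed thread to a ProcessThread, so this applies the mask to every OS thread of the process):
using System;
using System.Diagnostics;

// 0xFF (255) is a bitmask for processors 0-7; 0xFE (254) excludes processor 0.
foreach (ProcessThread pt in Process.GetCurrentProcess().Threads)
{
    pt.ProcessorAffinity = (IntPtr)0xFE; // keep these OS threads off CPU 0
}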
EDIT:
Try the following edit to your program (based on the fact that two thread IDs are getting printed): by the time both threads are in the picture, they might both see the same value of i, so use a local variable to avoid the closure issue.
Parallel.For(0, 2, i =>
{
    int local = i;
    if (local == 0)
        Process.GetCurrentProcess().ProcessorAffinity = (System.IntPtr)255;
    if (local == 1)
        Process.GetCurrentProcess().ProcessorAffinity = (System.IntPtr)254;
    Thread.Sleep(25); // to make sure both set their affinities
    Console.WriteLine(Process.GetCurrentProcess().ProcessorAffinity);
});
EDIT 2: (This would mostly not work, as both threads might increment the counter before the actual logic executes.)
int local = -1;
Parallel.For(0, 2, i =>
{
    Interlocked.Increment(ref local);
    if (local == 0)
        Process.GetCurrentProcess().ProcessorAffinity = (System.IntPtr)255;
    if (local == 1)
        Process.GetCurrentProcess().ProcessorAffinity = (System.IntPtr)254;
    Thread.Sleep(25); // to make sure both set their affinities
    Console.WriteLine(Process.GetCurrentProcess().ProcessorAffinity);
});

Increase Number of running thread in Parallel.For

I have just made a multithreading sample using this link, like below:
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, (i, state) =>
{
    count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
It gives me 15 threads before the Parallel.For and only 17 threads after it, so Parallel.For occupies only 2 threads.
Then I created another sample using this link, like below:
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 10 };
Console.WriteLine("MaxDegreeOfParallelism : {0}", Environment.ProcessorCount * 10);
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, options, (i, state) =>
{
    count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
In the above code I have set MaxDegreeOfParallelism to 40, but Parallel.For is still using the same number of threads.
So how can I increase the number of running threads for Parallel.For?
I am facing a problem where some numbers get skipped inside the Parallel.For when I perform some heavy and complex functionality inside it. So I want to increase the maximum number of threads to work around the skipping issue.
What you're saying amounts to: "My car is shaking when I drive too fast. I'm trying to fix this by driving even faster." That doesn't make any sense. What you need is to fix the car, not change the speed.
How exactly to do that depends on what you are actually doing in the loop. The code you showed is obviously a placeholder, but even that is wrong. So I think what you should do first is learn about thread safety.
Using a lock is one option, and it's the easiest one to get correct. But it's also hard to make efficient: you need to hold the lock for only a short time in each iteration.
There are other options for achieving thread safety, including Interlocked, the overloads of Parallel.For that use thread-local data, and approaches other than Parallel.For(), like PLINQ or TPL Dataflow.
After you've made sure your code is thread-safe, only then is it time to worry about things like the number of threads. And regarding that, I think there are two things to note:
For CPU-bound computations, it doesn't make sense to use more threads than the number of cores your CPU has. Using more threads than that will usually lead to slower code, since switching between threads has some overhead.
I don't think you can measure the number of threads used by Parallel.For() like that. Parallel.For() uses the thread pool, and it's quite possible that there already are some threads in the pool before the loop begins (see the sketch below).
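A sketch of a more direct measurement: record each distinct worker thread ID inside the loop, instead of sampling Process.Threads.Count before and after.
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var workerIds = new ConcurrentDictionary<int, bool>();
Parallel.For(0, 50000, i =>
{
    // Record which thread-pool thread ran this iteration.
    workerIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
});
Console.WriteLine($"Distinct threads used by Parallel.For: {workerIds.Count}");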
Parallel loops use hardware CPU cores. If your CPU has 2 cores, that is the maximum degree of parallelism you can get on your machine.
Taken from MSDN:
What to Expect
By default, the degree of parallelism (that is, how many iterations run at the same time in hardware) depends on the number of available cores. In typical scenarios, the more cores you have, the faster your loop executes, until you reach the point of diminishing returns that Amdahl's Law predicts. How much faster depends on the kind of work your loop does.
Further reading:
Threading vs Parallelism, how do they differ?
Threading vs. Parallel Processing
Parallel loops will give you wrong results for summation operations without locks, because the result of each iteration depends on the single variable 'count', and the value of 'count' in a parallel loop is not predictable. However, using locks in parallel loops does not achieve actual parallelism, so you should test parallel loops with something other than summation.

Thread synchronization. Why exactly this lock isn't enough to synchronize threads [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Threads synchronization. How exactly lock makes access to memory 'correct'?
This question is inspired by this one.
We have the following test class:
class Test
{
    private static object ms_Lock = new object();
    private static int ms_Sum = 0;

    public static void Main()
    {
        Parallel.Invoke(HalfJob, HalfJob);
        Console.WriteLine(ms_Sum);
        Console.ReadLine();
    }

    private static void HalfJob()
    {
        for (int i = 0; i < 50000000; i++)
        {
            lock (ms_Lock) { } // empty lock
            ms_Sum += 1;
        }
    }
}
The actual result is very close to the expected value of 100,000,000 (50,000,000 x 2, since 2 loops run at the same time), with a difference of around 200-600 (an error of approximately 0.0004% on my machine, which is very low). No other way of synchronizing provides this kind of approximation (with anything else it's either a much bigger error percentage or 100% correct).
We currently understand that this level of precision is because the program runs in the following way:
Time runs left to right, and the 2 threads are represented by two rows,
where
the black box represents the process of acquiring, holding, and releasing the lock plus the addition operation (to scale on my PC, the lock takes approximately 20 times longer than the add), and
the white box represents the period that consists of trying to acquire the lock and then waiting for it to become available.
Also, the lock provides a full memory fence.
So the question now is: if the above schema represents what is going on, what is the cause of such a big error? (It is big because the schema looks like a very strong synchronization scheme.) We could understand a difference of 1-10 at the boundaries, but that is clearly not the only source of the error. We cannot see when writes to ms_Sum could happen at the same time to cause the error.
EDIT: many people like to jump to quick conclusions. I know what synchronization is, and that the above construct is not a real or even close-to-good way to synchronize threads if we need a correct result. Have some faith in the poster, or maybe read the linked answer first. I don't need a way to synchronize 2 threads to perform additions in parallel; I am exploring this extravagant, and yet efficient compared to any approximate alternative, synchronization construct (it does synchronize to some extent, so it's not meaningless as suggested).
lock (ms_Lock) { } is a meaningless construct: lock guarantees exclusive execution only of the code inside it, and here there is none.
Let me explain why this empty lock decreases (but doesn't eliminate!) the chance of data corruption. Let's simplify the threading model a bit:
A thread executes one line of code per time slice.
Thread scheduling is done in strict round-robin manner (A-B-A-B).
Monitor.Enter/Exit takes significantly longer to execute than arithmetic. (Let's say 3 times longer; I've stuffed the code with Nops that mean "the previous line is still executing".)
The real += takes 3 steps; I broke it down into atomic ones.
The left columns show which line each thread (A and B) executes in a given time slice; the right column is the program (according to my model).
Scenario 1, with the empty lock. The program, one numbered line per model step:

1  SomeOperation();
2  SomeOperation();
3  Monitor.Enter(ms_Lock);
4  Nop();   // Enter still executing
5  Nop();   // Enter still executing
6  Monitor.Exit(ms_Lock);
7  Nop();   // Exit still executing
8  Nop();   // Exit still executing
9  int temp = ms_Sum;
10 temp++;
11 ms_Sum = temp;

The trace, one column per time slice ('-' means B is blocked in Monitor.Enter while A holds the lock):

A: 1   2   3   4   5   6   7   8   9   10  11
B:   1   2   3   -   -   -   -   -   4   5   6 ...

Scenario 2, without the lock:

1 SomeOperation();
2 SomeOperation();
3 int temp = ms_Sum;
4 temp++;
5 ms_Sum = temp;

A: 1   2   3   4   5
B:   1   2   3   4   5
As you can see, in the first scenario thread B simply can't catch up with thread A, and A has enough time to finish executing ms_Sum += 1;. In the second scenario, ms_Sum += 1; is interleaved and causes constant data corruption. In reality thread scheduling is stochastic, but it still means that thread A has a better chance of finishing the increment before the other thread gets there.
This is a very tight loop with not much going on inside it, so ms_Sum += 1 has a reasonable chance of being executed at "just the wrong moment" by the parallel threads.
Why would you ever write code like this in practice?
Why not:
lock (ms_Lock)
{
    ms_Sum += 1;
}
or just:
Interlocked.Increment(ref ms_Sum);
?
--- EDIT ---
Some comments on why you would see the error despite the memory-barrier aspect of the lock. Imagine the following scenario:
Thread A enters the lock, leaves the lock and then is pre-empted by the OS scheduler.
Thread B enters and leaves the lock (possibly once, possibly more than once, possibly millions of times).
At that point the thread A is scheduled again.
Both A and B hit the ms_Sum += 1 at the same time, resulting in some increments being lost (because increment = load + add + store).
As noted, the statement
lock(ms_Lock) {}
will cause a full memory barrier. In short, this means the value of ms_Sum will be flushed among all caches and become "visible" to all threads.
However, ms_Sum += 1 is still not atomic, as it is just shorthand for ms_Sum = ms_Sum + 1: a read, an operation, and an assignment. In this construct there is still a race condition: the count in ms_Sum can come out slightly lower than expected. I would also expect the difference to be larger without the memory barrier.
Here is a hypothetical situation of why it might be lower (A and B represent threads and a and b represent thread-local registers):
A: read ms_Sum -> a
B: read ms_Sum -> b
A: write a + 1 -> ms_Sum
B: write b + 1 -> ms_Sum // change from A "discarded"
This depends on a very particular interleaving order and on factors such as thread execution granularity and the relative time spent in that non-atomic region. I suspect the lock itself reduces (but does not eliminate) the chance of the interleaving above, because each thread must wait its turn to get through it. The time spent in the lock relative to the increment may also play a part.
Happy coding.
As others have noted, use the critical region established by the lock or one of the atomic increments provided to make it truly thread-safe.
As noted: lock (ms_Lock) { } locks an empty block and so does nothing. You still have a race condition on ms_Sum += 1;. You need:
lock (ms_Lock)
{
    ms_Sum += 1;
}
[Edited to note:]
Unless you properly serialize access to ms_Sum, you have a race condition. Your code, as written, does the following (assuming the optimizer doesn't just throw away the useless lock statement):
Acquire lock
Release lock
Get value of ms_Sum
Increment value of ms_Sum
Store value of ms_Sum
Each thread may be suspended at any point, even in mid-instruction. Unless it is specifically documented as being atomic, it's a fair bet that any machine instruction that takes more than 1 clock cycle to execute may be interrupted mid-execution.
So let's assume that your lock is actually serializing the two threads. There is still nothing in place to prevent one thread from getting suspended (and thus giving priority to the other) whilst it is somewhere in the middle of executing the last three steps.
So the first thread comes in, locks, releases, gets the value of ms_Sum, and is then suspended. The second thread comes in, locks, releases, gets the [same] value of ms_Sum, increments it, stores the new value back in ms_Sum, and then gets suspended. The first thread increments its now-outdated value and stores it.
There's your race condition.
The += operator is not atomic: first it reads, then it writes the new value. In the meantime, between the read and the write, thread A can be switched out in favor of thread B without having written the value; thread B then doesn't see the new value, because thread A never assigned it, and when execution returns to thread A, its write discards all the work thread B did.
