Threading: Locking under the hood in C#

What happens when we use the lock statement? I am aware that the runtime makes use of the Monitor.Enter and Monitor.Exit methods. But what really happens under the hood?
Why can only reference types be used for locking?
Even though the object used for the locking is modified, how does it still provide thread safety?
In the current sample, we are modifying the object that is used for locking. Ideally, this is not the preferred way of doing it; best practice is to use a dedicated, privately scoped variable.
static List<string> stringList = new List<string>();

static void AddItems(object o)
{
    for (int i = 0; i < 100; i++)
    {
        lock (stringList)
        {
            Thread.Sleep(20);
            stringList.Add(string.Format("Thread-{0},No-{1}", Thread.CurrentThread.ManagedThreadId, i));
        }
    }
    string[] listArray = null;
    lock (stringList)
        listArray = stringList.ToArray();
    foreach (string s in listArray)
    {
        Console.WriteLine(s);
    }
}

What happens under the hood is approximately this:
Imagine the object type has a hidden field in it.
Monitor.Enter() and Monitor.Exit() use that field to communicate with each other.
Every reference type inherits that field from object.
Of course, the type of that field is something special: It’s a synchronization lock that works in a thread-safe manner. In reality, of course, it is not really a field in the CLR sense, but a special feature of the CLR that uses a chunk of memory within each object’s memory to implement that synchronization lock. (The exact implementation is described in “Safe Thread Synchronization” in the MSDN Magazine.)
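The lock statement itself is compiler sugar over the Monitor calls mentioned in the question. Here is a rough sketch of the expansion the C# compiler has produced since C# 4 (the temporary names lockObj and lockTaken are illustrative, and the exact emitted code may differ):

```csharp
using System;
using System.Threading;

object gate = new object();
int counter = 0;

// Roughly what 'lock (gate) { counter++; }' expands to.
object lockObj = gate;          // the lock expression is evaluated only once
bool lockTaken = false;
try
{
    Monitor.Enter(lockObj, ref lockTaken);  // sets lockTaken only once the lock is held
    counter++;
}
finally
{
    if (lockTaken) Monitor.Exit(lockObj);   // always released, even if the body throws
}

bool stillHeld = Monitor.IsEntered(gate);   // False: released in the finally block
Console.WriteLine($"{counter} {stillHeld}"); // 1 False
```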
How come it still provides thread safety? I think what you mean is: why doesn’t it break thread safety for objects that are thread safe? The answer is easy: because you can have objects that are partly thread safe and partly not. You could have an object with two methods, and using one of them is thread safe while the other isn’t. Monitor.Enter() is thread safe irrespective of what the rest of the object does.
Why can only reference types be used for locking? Because only reference types actually have this special magic in their memory chunk. Value types are literally just the value itself: a 32-bit integer in the case of int; a concatenation of all the fields in the case of a custom struct. The C# lock statement rejects a value type at compile time, but you can pass one into Monitor.Enter() and it won’t complain. It won’t work, though, because the value is boxed, i.e., wrapped in a new object of a reference type. When you call Monitor.Exit(), the value is boxed again, so Exit tries to release the lock on a different object reference and throws a SynchronizationLockException.
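A minimal sketch of the boxing problem described above. The C# lock statement refuses an int outright (error CS0185), but Monitor.Enter and Monitor.Exit take an object parameter, so the calls below compile and then fail at run time:

```csharp
using System;
using System.Threading;

int counter = 0;
bool exitFailed = false;

// Each call below silently boxes 'counter' into a brand-new object.
Monitor.Enter(counter);          // locks box #1 (which is then unreachable, so it stays locked)
try
{
    Monitor.Exit(counter);       // tries to release box #2, which this thread never locked
}
catch (SynchronizationLockException)
{
    exitFailed = true;
}

Console.WriteLine(exitFailed);   // True
```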
Regarding your code sample: I see nothing wrong with it. All your access to the stringList variable is wrapped in a lock, and you never assign to the stringList field itself except during initialisation. Nothing can go wrong with this; it is thread safe. (Of course, something could go wrong if some other code accesses the field without locking it. If you were to make the field public, there is a very good chance of that happening accidentally. There is no need to use a dedicated private lock variable unless you cannot otherwise ensure that the variable won’t be accessed by code you don’t control.)
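For comparison, here is a sketch of the dedicated-lock-object variant of the question's code. The names listGate and items and the thread count are hypothetical, and the Thread.Sleep is dropped so the example runs quickly. Every access to the list, including the final snapshot, goes through the same gate object that no outside code can see (a local here; in a class it would be a private readonly field):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// A lock object that nothing outside this code can reach.
object listGate = new object();
var items = new List<string>();

void AddItems(int threadNo)
{
    for (int i = 0; i < 100; i++)
    {
        lock (listGate) // every access to 'items' takes the same lock
        {
            items.Add($"Thread-{threadNo},No-{i}");
        }
    }
}

var threads = new Thread[4];
for (int t = 0; t < threads.Length; t++)
{
    int id = t; // copy the loop variable so each thread gets its own number
    threads[t] = new Thread(() => AddItems(id));
    threads[t].Start();
}
foreach (var th in threads) th.Join();

// Take the snapshot under the same lock, then use it outside the lock.
string[] snapshot = Array.Empty<string>();
lock (listGate)
{
    snapshot = items.ToArray();
}
Console.WriteLine(snapshot.Length); // 400
```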

But what really happens under the hood?
Refer to this MSDN article for an in-depth description.
Essentially, each CLR object that gets allocated has an associated field that holds a sync block index. This index points into a pool of sync blocks that the CLR maintains. A sync block holds the same information as a critical section which gets used during synchronization. Initially, an object's sync block index is meaningless (uninitialized). When you lock on the object, however, it gets a valid index into the pool.
Why can only reference types be used for locking?
You need a reference type since value types don't have the associated sync block index field (less overhead).
Even though the object used for the locking is modified, how does it still provide thread safety?
Locking on a CLR object and then modifying it is akin to having a C++ object with a CRITICAL_SECTION member that's used for locking while that same object is modified. There are no thread safety issues there.
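The lock belongs to the object, not to the variable or to the object's contents. A small sketch that mutates a locked List&lt;string&gt; and then reassigns the variable while still inside the lock, using Monitor.IsEntered to check which object is actually held:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

var list = new List<string>();
bool heldAfterAdd = false;
bool heldOnNewObject = false;

lock (list)  // locks the object 'list' refers to right now
{
    list.Add("a");                               // mutating the locked object is harmless
    heldAfterAdd = Monitor.IsEntered(list);      // True: still the same object
    list = new List<string>();                   // compiler warning CS0728 flags this reassignment
    heldOnNewObject = Monitor.IsEntered(list);   // False: nobody locked the new object
}   // the compiler kept the original reference, so the original object is released here

Console.WriteLine($"{heldAfterAdd} {heldOnNewObject}"); // True False
```

Mutating the locked object changes nothing about the lock; reassigning the variable means later lock statements on it would lock a different object.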
In the current sample, we are modifying the object that is used for locking. Ideally, this is not the preferred way of doing it; best practice is to use a dedicated, privately scoped variable.
Correct, this situation is also described in the article. If you're not using a privately scoped variable that is in complete control of the owning class, then you can run into deadlock issues when two separate classes decide to lock on the same referenced object (e.g. if for some reason your stringList gets passed to another class that then decides to lock on it as well). This is unlikely, but if you use a strictly-controlled, privately scoped variable that never gets passed around, you will avoid such deadlocks altogether.

Related

What is the importance of clearing the collection before disposing?

What is the difference between the following code snippets?
Code 1:
if (RecordCollections != null)
{
    RecordCollections.Clear();
    RecordCollections = null;
}
Code 2:
RecordCollections = null;
The code is inside a Dispose method. Is there any advantage to calling the Clear method before setting the collection to null? Is it needed at all?
Is there any advantage to calling the Clear method before setting the collection to null?
Impossible to say without a good Minimal, Complete, and Verifiable code example.
That said, neither of those code snippets looks very useful to me. The first one would certainly be pointless if all the Clear() method does is empty the collection. Of course, if it actually went through and, e.g., called Dispose() on each collection member, that would be different. But that would be a very unusual collection implementation.
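Neither snippet frees memory by itself. The one observable difference between them shows up only when some other object still references the same collection: Clear() empties the shared collection object, while assigning null only drops one reference. A sketch with a plain List&lt;string&gt; standing in for RecordCollections (otherHolder is a hypothetical second owner):

```csharp
using System;
using System.Collections.Generic;

List<string>? records = new List<string> { "r1", "r2" };
List<string> otherHolder = records;   // hypothetical second owner of the same collection

// Code 2: only this reference goes away; the collection itself is untouched.
records = null;
int countAfterNull = otherHolder.Count;   // 2

// Code 1: Clear() empties the shared collection object, which every holder sees.
records = otherHolder;
records.Clear();
records = null;
int countAfterClear = otherHolder.Count;  // 0

Console.WriteLine($"{countAfterNull} {countAfterClear}"); // 2 0
```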
Even the second has very little value, and is contrary to normal IDisposable semantics. IDisposable is supposed to be just for managing unmanaged resources. It gets used sometimes for other things, but that's its main purpose. In particular, one typically only calls Dispose() just before discarding an object. If the object itself is going to be discarded, then any references it holds to other objects (such as a collection) will no longer be reachable, and so setting them to null doesn't have any useful effect.
In fact, in some cases setting a variable to null can actually extend the lifetime of an object. The runtime is sophisticated enough to recognize that a variable is no longer used, and if it holds the last remaining reference to an object, the object can become eligible for garbage collection at that point, even if the scope of the variable extends further. By setting the variable to null, the variable itself is used later in the program, and so the runtime can't treat it as unreachable until that point, later than it otherwise would have.
This last point applies most commonly to local variables, not fields in an object, but it's theoretically possible for the runtime to optimize more broadly. Setting to null variables that are themselves about to go away is a bad habit to get into.
Dispose refers to a mechanism for explicitly cleaning up unmanaged resources, since those cannot be cleaned up by the standard garbage collector. IDisposable is mostly implemented by classes that use an unmanaged API internally, such as a database connection.
Standard practice is:
1. For a class that directly holds an unmanaged resource, also implement a finalizer. Dispose is an explicit call, and if the caller forgets it, finalization takes care of the cleanup, though the object then needs two GC cycles to be reclaimed.
2. If a class holds, in a field, an object that itself implements IDisposable, the class's Dispose should call Dispose on that field.
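The second rule is the one the BCL itself follows. As an illustration (using MemoryStream and StreamWriter rather than a custom class), a StreamWriter created over a stream owns that stream by default, so disposing the writer also disposes the stream it holds:

```csharp
using System;
using System.IO;

var stream = new MemoryStream();
var writer = new StreamWriter(stream);  // by default the writer owns 'stream'
writer.Write("hello");

writer.Dispose();                       // flushes, then disposes the stream it holds

Console.WriteLine(stream.CanRead);      // False: the inner stream was disposed too
```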
Regarding the code provided:
if (RecordCollections != null)
{
    RecordCollections.Clear();
    RecordCollections = null;
}
Or
RecordCollections = null;
Since this only concerns managed memory, it is of little use: the GC does the main job and doesn't need it. In my view, though, explicitly nulling class variables is an acceptable practice, because it makes the developer wary of each and every allocation. In general, prefer method-local variables unless state needs to be shared across method calls; that way, misuse of object allocation can be much better controlled.
As for the difference: whether the collection is cleared and then set to null, or just set to null, the memory remains allocated until the GC runs, which is non-deterministic. In my view it is not very clear how the GC identifies objects that are no longer reachable across the various generations. Especially for the higher generations (promoted objects), if an object is explicitly set to null, the GC may have to spend less or no time tracing its roots/references; however, I have found no documentation that explains this aspect of the implementation.

Volatile equivalent for non-primitive objects in C#

I think I'm missing something big here.
What I'm trying to do:
I have an object that is known to multiple threads, which may read or manipulate it. I want access to the object to block: when one thread calls obj.setProperty(T type), I want every other thread to wait until the property is set. How do I do this? I know that there is volatile for primitive types, but how does this translate to non-primitive types?
Use the lock statement in the property getter and setter.
Also, you don't understand what volatile does. Volatile is to prevent blocking, not to cause blocking.
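A minimal sketch of that advice. To keep it self-contained, the property's get and set accessors are modeled as two hypothetical local functions, GetName and SetName, that take the same lock. A reader that arrives while a slow set is in progress blocks until the set completes and then sees the new value:

```csharp
using System;
using System.Threading;

object gate = new object();
string name = "initial";

// Stand-ins for a property's get and set accessors: both take the same lock.
string GetName()
{
    lock (gate) { return name; }
}

void SetName(string value, ManualResetEventSlim signal)
{
    lock (gate)
    {
        signal.Set();       // let the reader know the writer now holds the lock
        Thread.Sleep(200);  // simulate a slow update
        name = value;
    }
}

using var entered = new ManualResetEventSlim(false);
var writer = new Thread(() => SetName("updated", entered));
writer.Start();

entered.Wait();             // the writer is inside the lock at this point
string seen = GetName();    // blocks until SetName releases the lock
writer.Join();

Console.WriteLine(seen);    // updated
```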

Volatile vs VolatileRead/Write?

I can't find any example of VolatileRead/VolatileWrite (I tried...), but still:
When should I use volatile vs VolatileRead?
AFAIK the whole purpose of volatile is to create half fences so:
For a READ operation, reads/writes (on other threads) that come AFTER the current operation won't move before the fence; hence we read the latest value.
Question #1
So why do I need VolatileRead? It seems that volatile already does the work.
Plus, in C# all writes are volatile (unlike, say, in Java), regardless of whether you write to a volatile or a non-volatile field, so I ask: why do I need VolatileWrite?
Question #2
This is the implementation of VolatileRead:
[MethodImpl(MethodImplOptions.NoInlining)]
public static int VolatileRead(ref int address)
{
    int num = address;
    MemoryBarrier();
    return num;
}
Why is the line int num = address; there? They already have the address argument, which is clearly holding the value.
You should never use Thread.VolatileRead/Write(). It was a design mistake in .NET 1.1: it uses a full memory barrier. This was corrected later, but these methods could no longer be changed, so a new way to do it had to be added, provided by the System.Threading.Volatile class (introduced in .NET Framework 4.5). The jitter is aware of this class and replaces its methods at JIT time with a version that's suitable for the specific processor type.
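As a sketch of the recommended replacement, here is a flag-based hand-off using Volatile.Write (release) on the producer side and Volatile.Read (acquire) on the consumer side instead of Thread.VolatileWrite/VolatileRead. The payload and ready names are hypothetical:

```csharp
using System;
using System.Threading;

int payload = 0;
bool ready = false;

var producer = new Thread(() =>
{
    payload = 42;                     // ordinary write
    Volatile.Write(ref ready, true);  // release: the payload write cannot move below this
});
producer.Start();

// Acquire: reads after this call cannot move above it, so once 'ready' is seen
// as true, the payload written before it is visible too.
while (!Volatile.Read(ref ready))
{
    Thread.SpinWait(10);
}
int observed = payload;
producer.Join();

Console.WriteLine(observed); // 42
```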
The comments in the source code for the Volatile class as available through the Reference Source tells the tale (edited to fit):
// Methods for accessing memory with volatile semantics. These are preferred over
// Thread.VolatileRead and Thread.VolatileWrite, as these are implemented more
// efficiently.
//
// (We cannot change the implementations of Thread.VolatileRead/VolatileWrite
// without breaking code that relies on their overly-strong ordering guarantees.)
//
// The actual implementations of these methods are typically supplied by the VM at
// JIT-time, because C# does not allow us to express a volatile read/write from/to
// a byref arg. See getILIntrinsicImplementationForVolatile() in jitinterface.cpp.
And yes, you'll have trouble finding examples of its usage. The Reference Source is an excellent guide with megabytes of carefully written, tested and battle-scarred C# code that deals with threading. The number of times it uses VolatileRead/Write: zero.
Frankly, the .NET memory models are a mess, with conflicting assumptions made by the CLR memory model and the C# memory model, and with new rules added for ARM cores just recently. The weird semantics of the volatile keyword, which means different things on different architectures, is some evidence of that. That said, for a processor with a weak memory model you can typically assume that what the C# language spec says is accurate.
Do note that Joe Duffy has given up all hope and flatly discourages all use of it. It is in general very unwise to assume that you can do better than the primitives provided by the language and framework. The Remarks section of the Volatile class brings the point home:
Under normal circumstances, the C# lock statement, the Visual Basic SyncLock statement, and the Monitor class provide the easiest and least error-prone way of synchronizing access to data, and the Lazy class provides a simple way to write lazy initialization code without directly using double-checked locking.
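For the lazy-initialization case those Remarks mention, a sketch with Lazy&lt;T&gt;: many threads read Value concurrently, and in ExecutionAndPublication mode the factory runs only once, with no hand-written double-checked locking (factoryCalls is a hypothetical counter added only to observe this):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int factoryCalls = 0;

// ExecutionAndPublication (also the default for new Lazy<T>(factory))
// guarantees the factory runs at most once, even under concurrent access.
var lazy = new Lazy<object>(() =>
{
    Interlocked.Increment(ref factoryCalls);
    return new object();
}, LazyThreadSafetyMode.ExecutionAndPublication);

var results = new object[8];
Parallel.For(0, results.Length, i => results[i] = lazy.Value);

Console.WriteLine(factoryCalls); // 1
```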
When you need more fine-grained control over the way fences are applied to the code, you can use the static Thread.VolatileRead or Thread.VolatileWrite methods.
Declaring a variable volatile means that the compiler doesn't cache its value and always reads the field value, and when a write is performed, the compiler writes the assigned value immediately.
The two methods Thread.VolatileRead and Thread.VolatileWrite give you finer-grained control without declaring the variable volatile: you decide when to perform a volatile read and when a volatile write, without being bound to uncached reads and immediate writes on every access, as you are when you declare the variable volatile. In short, you have more control and more freedom.
VolatileRead() reads the latest version of a memory address, and VolatileWrite() writes to the address, making the address available to all threads. Using both VolatileRead() and VolatileWrite() consistently on a variable has the same effect as marking it as volatile.
Take a look at this blog post, which explains the difference by example.
Why is the line int num = address; there? They already have the address argument, which is clearly holding the value.
It is a defensive copy: the integer value is copied to a local variable so that something outside cannot change the value while we are inside the method.
Notes
Since Visual Basic has no volatile keyword, your only choice there is to use the VolatileRead() and VolatileWrite() static methods consistently to achieve the same effect as the volatile keyword in C#.
Why is the line int num = address; there? They already have the address argument, which is clearly holding the value.
address is not an int. It is a ref int, a managed pointer to the caller's variable (so it really is an address). The code dereferences it and copies the value to a local so that the barrier occurs after the dereference.
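A single-threaded sketch of why the copy matters: address is an alias for the caller's variable, so every read of it is a fresh dereference. The local function ReadTwice is hypothetical; it changes the aliased variable between two reads to make the difference visible:

```csharp
using System;

int shared = 1;

// 'address' is an alias for the caller's variable, not a copy of its value.
int ReadTwice(ref int address)
{
    int num = address;          // one dereference: snapshot the current value
    shared = 2;                 // simulate another writer changing the aliased variable
    return num * 10 + address;  // this second dereference sees the new value
}

int result = ReadTwice(ref shared);
Console.WriteLine(result); // 12
```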
To elaborate on aleroot's answer:
Volatile.Read and Volatile.Write have the same effect as the volatile modifier, as Royi Namir argued, but you can apply them selectively.
For example, if you declare a field with the volatile modifier, then every access to that field, whether a read or a write, gets volatile semantics (the value cannot be cached in a register, and ordering constraints apply). That is not free; it is not required in most cases and is an unnecessary performance hit if the field is read many times.
Think of a scenario where a private singleton variable is declared as volatile and returned from a property getter. Once it is initialized, you no longer need volatile semantics to read it, so you can use Volatile.Read / Volatile.Write only until the instance is created; once it is created, all reads can be done as for a normal field. Otherwise it would be a big performance hit.
Volatile.Read and Volatile.Write, by contrast, can be used on an as-needed basis.
The best approach is to declare the field without the volatile modifier and use Volatile.Read or Volatile.Write where needed.
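A sketch of that approach applied to lazily creating a single instance, using double-checked locking (initGate, instance, creations and GetInstance are hypothetical names). The field is not marked volatile; Volatile.Read is used only on the unlocked fast path and Volatile.Write only when publishing the new object:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

object initGate = new object();
object? instance = null;   // deliberately NOT declared volatile
int creations = 0;

object GetInstance()
{
    // Fast path: one acquire read, no lock once the instance is published.
    object? current = Volatile.Read(ref instance);
    if (current != null) return current;

    lock (initGate)
    {
        current = instance;          // re-check; the lock already orders this read
        if (current == null)
        {
            current = new object();
            Interlocked.Increment(ref creations);
            Volatile.Write(ref instance, current); // release: publish the finished object
        }
        return current;
    }
}

var seen = new object[16];
Parallel.For(0, seen.Length, i => seen[i] = GetInstance());
Console.WriteLine(creations); // 1
```

For most code, Lazy&lt;T&gt; or a static readonly field is simpler and less error-prone than hand-written double-checked locking.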

How does reassigning a disposable object variable work?

In C# when reassigning a disposable object variable with a new object, how does it work in the memory? Will the memory space occupied by the old object simply be overwritten by the new object? Or do I still have to call Dispose() to release the resources it uses?
DisposableThing thing;
thing = new DisposableThing();
//....do stuff
//thing.Dispose();
thing = new DisposableThing();
In this case you have one slot / reference and two instances of an IDisposable object. Both of these instances must be disposed independently. The compiler doesn't insert any magic for IDisposable. It will simply change the instance the reference points to.
A good pattern would be the following:
using (thing = new DisposableThing())
{
    // ... do stuff
}
thing = new DisposableThing();
Ideally, the second use of thing should also be done within a using block.
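A sketch showing that reassignment alone disposes nothing. MemoryStream stands in for the question's DisposableThing (an assumption: any IDisposable behaves the same way with respect to assignment), and a disposed MemoryStream reports CanRead as false:

```csharp
using System;
using System.IO;

// MemoryStream stands in for the question's DisposableThing.
var thing = new MemoryStream();
var first = thing;                   // keep a second reference to the first instance

thing = new MemoryStream();          // plain assignment: nothing is disposed here
bool firstStillOpen = first.CanRead; // True: the first instance is still alive and open

first.Dispose();                     // each instance needs its own Dispose call
thing.Dispose();

Console.WriteLine($"{firstStillOpen} {first.CanRead} {thing.CanRead}"); // True False False
```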
It works the same as any other assignment: = does not care about IDisposable or do any magic.
The object initially assigned to the variable will have to have Dispose invoked manually (or better yet, with using) as required. However, it may not always be correct to call Dispose at this point: who owns the object and controls the lifetime?
Will the memory space occupied by the old object simply be overwritten by the new object?
Does not apply. Variables "name" objects. An object is itself and a variable is a variable, not the object or a "location in memory". (See Eric Lippert's comment below: the preceding is a high-level view of variables, while Eric's comment reflects variables more precisely in C# implementation, spirit, and terminology.)
Variables only affect object lifetimes insofar as they can* keep an object from being reclaimed (and thus prevent the finalizer from [possibly eventually] running). However, variables do not control an object's semantic lifetime (an object may be disposed even when it is not reclaimable), nor do they eliminate the need to invoke Dispose as required.
Happy coding.
When dealing with disposable objects that extend beyond simple scopes -- the objects may be assigned to many different variables during their lifetime! -- I find it best to define who "takes control" which I annotate in the documentation. If the object lifetime is nicely limited to a scope then using works well.
*A local variable itself is not necessarily sufficient to keep an object strongly reachable as it is possible that the variable/assignment is aggressively optimized out if not used later from the same scope. This may plague objects like timers.
It is possible, and common, for multiple references to the same IDisposable object to exist. There is nothing wrong with overwriting or destroying all but one of those references, provided that one reference will have Dispose called on it before it is destroyed.
Note that the purpose of IDisposable.Dispose isn't actually to destroy an object, but rather to let an object which has previously asked other entities to do something on its behalf (e.g. reserve a GDI handle, grant exclusive use of a serial port, etc.), notify those entities that they no longer need to keep doing so. If nothing tells the outside entities that they no longer need to keep doing something on behalf of the IDisposable object, they'll keep doing it--even if the object they're doing it for no longer exists.

Is the following code a rock solid way of creating objects from a singleton?

I've stumbled across this code in production and I think it may be causing us problems.
internal static readonly MyObject Instance = new MyObject();
Accessing the Instance field twice returns two objects with the same hash code. Is it possible that these objects are different?
My knowledge of the CLI says that they are the same because the hash codes are the same.
Can anyone clarify please?
The field will only be initialized once, so you'll always get the same object. It's perfectly safe.
Of course, you have to be careful when using static objects from multiple threads. If the object is not thread-safe, you should lock it before accessing it from different threads.
Yes it is safe - the simplest safe singleton implementation.
As a further point on comparing the hash-code to infer "they're the same object"; since we're talking about reference-types here (singleton being meaningless for value-types), the best way to check if two references point to the same object is:
bool isSame = object.ReferenceEquals(first, second);
which isn't dependent on the GetHashCode()/Equals/== implementations (it looks at the reference itself).
It is a guarantee provided by the CLR that this will work properly, even when the class is used by multiple threads. This is specified in Ecma 335, Partition II, section 10.5.3.3:
There are similar, but more complex, problems when type initialization takes place in a multi-threaded system. In these cases, for example, two separate threads might start attempting to access static variables of separate types (A and B) and then each would have to wait for the other to complete initialization.
A rough outline of an algorithm to ensure points 1 and 2 above is as follows:
1. At class load-time (hence prior to initialization time) store zero or null into all static fields of the type.
2. If the type is initialized, you are done.
2.1. If the type is not yet initialized, try to take an initialization lock.
2.2. If successful, record this thread as responsible for initializing the type and proceed to step 2.3.
2.2.1. If not successful, see whether this thread or any thread waiting for this thread to complete already holds the lock.
2.2.2. If so, return since blocking would create a deadlock. This thread will now see an incompletely initialized state for the type, but no deadlock will arise.
2.2.3. If not, block until the type is initialized then return.
2.3. Initialize the base class type and then all interfaces implemented by this type.
2.4. Execute the type initialization code for this type.
2.5. Mark the type as initialized, release the initialization lock, awaken any threads waiting for this type to be initialized, and return.
To be clear, that's the algorithm they propose for a CLR implementation, not your code.
Other answers have commented on how rock-solid this is. Here's some more on your reference to hash codes:
The hash codes being the same implies that the two objects might be considered "equal" - a different concept to "the same". All a hash code really tells you is that, if two objects have different hash codes, they are definitely not "equal" - and therefore by implication definitely not "the same". Equality is defined by the overriding of the .Equals() method, and the contract imposed is that if two objects are considered equal by this method, then they must return the same value from their .GetHashCode() methods. Two variables are "the same" if their references are equal - i.e. they point to the same object in memory.
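A quick sketch of the three concepts using strings, which override both GetHashCode() and ==: two separately created strings with the same contents share a hash code and compare equal, yet they are different objects, and only ReferenceEquals tells them apart:

```csharp
using System;

// new string('a', 3) allocates a fresh object each time, so these are two separate instances.
string first = new string('a', 3);
string second = new string('a', 3);

bool sameHash = first.GetHashCode() == second.GetHashCode(); // string hashes its contents
bool equal = first == second;                                // string overloads == to compare contents
bool sameObject = object.ReferenceEquals(first, second);     // compares the references themselves

Console.WriteLine($"{sameHash} {equal} {sameObject}"); // True True False
```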
It's static, meaning it belongs to the class, and it's readonly, so it cannot be changed after initialization; so yes, you will get the same object.
It will work just fine in this case, but if you mark the field with the [ThreadStatic] attribute, inline initialization won't work (the initializer runs only on the thread that initializes the type), and you'll have to use something else, like lazy initialization. In that case you don't have to worry about whether the operations using the singleton are thread-safe, as the singleton is per thread.
Regards...
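If a per-thread instance is really what you want, one alternative to a [ThreadStatic] field with hand-written lazy initialization is ThreadLocal&lt;T&gt;, which runs its factory once per thread (a swapped-in technique, not what the answer above shows):

```csharp
using System;
using System.Threading;

// The factory runs lazily, once for each thread that reads Value.
using var perThread = new ThreadLocal<object>(() => new object());

object mainA = perThread.Value;
object mainB = perThread.Value;

object? other = null;
var worker = new Thread(() => other = perThread.Value);
worker.Start();
worker.Join();

Console.WriteLine(object.ReferenceEquals(mainA, mainB)); // True: same thread, same instance
Console.WriteLine(object.ReferenceEquals(mainA, other)); // False: the worker got its own
```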
You might be interested in the fact that the laziness of the initialization can vary.
Jon Skeet suggests adding an empty static constructor if you care about when the instance is actually initialized.
To avoid misrepresenting it, here is the link to his article on the Singleton pattern.
Your question refers to the fourth (and suggested) singleton pattern implementation discussed in his article.
Singleton: singleton implementation
Inside the article you find a link to a discussion on beforefieldinit and laziness of the initialization.
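Your assumption that they are the same because the hash codes are the same is incorrect: equal hash codes do not prove that two references point to the same object. A type can override GetHashCode() to compute the hash from its fields, and even the default implementation does not guarantee unique values.
Assuming you didn't overload the == operator for MyObject, you can do a simple == comparison, which for a class is by default a comparison by reference: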
MyObject a = MyObject.Instance;
MyObject b = MyObject.Instance;
Console.WriteLine(a == b);
This will output True, by the way, because your singleton implementation is sort of correct. A static readonly field is guaranteed to be assigned only once. However, semantically it would be more correct to implement a property with only a get-accessor and use a private static field as a backing store.
