Some newbie questions about multi-threading in .NET which I think will help reinforce some concepts I'm trying to absorb. I've read a fair amount of multi-threading material (including the Albahari ebook), but I feel I just need confirmation on a few questions to help drive these concepts home.
A lock scope protects a shared region of code. Suppose a thread is executing a method that increments a simple integer variable x in a loop; this won't protect code elsewhere that might also alter variable x, e.g. in another method running on another thread.
Since these are two different regions of code potentially affecting the same variable, do we solve this by locking both regions of code using the same lock variable for both lock scopes around variable x? If you locked both regions of code with different lock variables, this would not protect the variable, correct?
To further this example, using the same lock variable: what would happen if, for some reason, the code in one method went into an infinite loop and never relinquished the lock? How could the second region of code in the other method detect this?
How does the choice of lock variable influence the behavior of the lock? I've read numerous posts on this subject already but can never seem to find a definitive answer: in some instances people explicitly create an object variable specifically for this purpose, other times people use lock(this), and there've also been times I've seen people lock on a type object.
How do the different choices of lock variable influence the behavior/scope of the lock, and in what scenarios would it make sense to use one over the other?
Suppose you have a hashtable wrapped in a class exposing Add, Remove, Get and some sort of Calculate method (say each object represents a quantity and the method sums the values), and all of these methods are locked. However, once a reference to an object in that collection is handed out to other code and passed around the application, that object (not the hashtable) is now outside the lock scope surrounding the methods of that class. How could you then protect access to, and updates of, the actual objects taken from the hashtable, which could interfere with the Calculate method?
Appreciate any heuristics provided that would help reinforce these concepts for me - thanks!
1) Yes
2) That's a deadlock
3) The parts of your code you want to block are an implementation detail of your class. Exposing the lock object by using lock(this) or lock(this.GetType()) is asking for trouble since now external code can lock the same object and block your code unintentionally or maliciously. The lock object should be private.
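A minimal sketch of the private-lock-object pattern (the Counter class and its members are hypothetical names for illustration):

public class Counter
{
    // Private, so no external code can take the same lock and block
    // these methods unintentionally or maliciously.
    private readonly object _sync = new object();
    private int _count;

    public void Increment()
    {
        lock (_sync) { _count++; }
    }

    public int Read()
    {
        lock (_sync) { return _count; }
    }
}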
4) It isn't very clear what you mean; you certainly wouldn't want to expose the Hashtable directly. Just keep it as a private field of the class, encapsulating it.
However, the odds that you can safely expose your class to client code using threads go down very rapidly with the number of public methods and properties you expose. You'll quickly get to a point where only the client code can properly take a lock. Fine-grained locking creates lots of opportunities for threading races when the client code holds on to property values. Say you return a Count property value: by the time the client uses it, like in a for loop, the Count may already have changed. Only the most careful design can avoid these traps; a serious headache.
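A hypothetical sketch of that Count race, using the old non-generic ArrayList (its synchronized wrapper locks each call individually, which is exactly the fine-grained style being warned about):

using System.Collections;

ArrayList list = ArrayList.Synchronized(new ArrayList());

// RACY: each call is individually locked, but Count can change
// between the check and the indexer access on another thread.
for (int i = 0; i < list.Count; i++)
{
    object item = list[i]; // may throw if another thread removed items
}

// Only the client code can make the whole traversal atomic:
lock (list.SyncRoot)
{
    for (int i = 0; i < list.Count; i++)
    {
        object item = list[i];
    }
}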
Furthermore, fine-grained locking is very inefficient since it is inevitably done in the innermost parts of your code. Locks are not that expensive, roughly 100 CPU cycles, but it quickly adds up. The effort is especially wasted if the class object isn't actually used on multiple threads.
You then have no option but to declare your class thread-unsafe, so the client code needs to use it in a thread-safe manner. That is also the core reason that so many .NET classes are not thread-safe. It is the biggest reason that threading is so hard to get right: the programmer least likely to do it correctly is responsible for doing the most difficult thing.
1)
You are correct. You must use the same lock object to protect two distinct areas of code that, for example, increment the variable x.
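A minimal sketch of that, with hypothetical names:

static readonly object _xLock = new object();
static int x;

static void IncrementInLoop()
{
    for (int i = 0; i < 1000; i++)
    {
        lock (_xLock) { x++; }   // region 1
    }
}

static void ResetElsewhere()
{
    lock (_xLock) { x = 0; }     // region 2: same lock object, so safe
}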
2)
This is known as a deadlock and is one of the difficulties with multithreaded programming. There are algorithms which can be used to prevent deadlocks, such as the Banker's Algorithm.
3)
Some languages make locking easy; in .NET, for example, you can just create an object and use it as the shared lock. This is good for synchronising code within a given process. lock(this) just applies the lock to the object in question, but try to avoid it: create a private object and use that instead, since lock(this) can lead to deadlock situations. The lock underneath is probably just a wrapper around a critical section. If you want to protect a resource across different processes, you need the much heavier named Mutex; this requires a kernel object and is expensive, so do not use one unless you must.
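For the cross-process case, a sketch with a hypothetical mutex name:

using System.Threading;

// "Global\MyAppCache" is a hypothetical name visible to all processes.
using (var mutex = new Mutex(false, @"Global\MyAppCache"))
{
    mutex.WaitOne();           // blocks until no other process holds it
    try
    {
        // ... touch the shared cross-process resource ...
    }
    finally
    {
        mutex.ReleaseMutex();
    }
}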
4) You need to make sure locking is applied to those objects as well. But presumably, when other code calls methods on such a reference, it calls the methods which employ synchronisation.
Related
I've been wondering recently how lock (or more specifically: Monitor) works internally in .NET with regard to the objects that are locked. Specifically, I'm wondering what the overhead is, whether there are 'global' (process-wide) locks used, whether it's possible to create more of those global locks if that's the case (for groups of monitors), and what happens to the objects that are passed to lock (they don't seem to introduce any extra memory overhead).
To clarify what I'm not asking about: I'm not asking here about what a Monitor is (I made one myself at University some time ago). I'm also not asking how to use lock, Monitor, how they compile to a try/finally, etc; I'm pretty well aware of that (and there are other SO questions related to that). This is about the inner workings of Monitor.Enter and Monitor.Exit.
For example, consider this code executed by ten threads:
for (int i=0; i<1000; ++i)
{
lock (myArray[i])
{
// ...
}
}
Is it bad to lock a thousand objects instead of one? What is the impact in terms of performance / memory pressure?
The underlying monitor creates a wait queue. Is it possible to have more than one wait queue and how would I create that?
Monitor.Enter is not a normal .NET method (it can't be decompiled with ILSpy or similar). The method is implemented internally by the CLR, so strictly speaking there is no single answer for .NET, as different runtimes can have different implementations.
All objects in .NET have an object header containing a pointer to the type of the object, but also a SyncBlock index into a SyncTableEntry. Normally that index is zero/not used, but when you lock on the object it will contain an index into the SyncTableEntry, which then contains the reference to the actual lock object.
So locking thousands of objects will indeed create a lot of locks, which is overhead.
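One way to avoid inflating the sync-block table, assuming the elements don't actually need independent locks, is a single coarser gate. A sketch, trading concurrency for fewer locks (names are hypothetical):

static readonly object _gate = new object();

static void Work(object[] myArray)
{
    for (int i = 0; i < 1000; ++i)
    {
        lock (_gate)   // one sync block instead of up to a thousand
        {
            // ... work on myArray[i] ...
        }
    }
}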
The information I found was in this MSDN article: http://msdn.microsoft.com/en-us/magazine/cc163791.aspx
Here's a good place to read about monitors, memory barriers etc.
In a Windows services project (C#, .NET platform), I need a suggestion.
In the project I have a class named Cache, in which I keep some data that I need frequently. There is a thread that updates the cache every 30 minutes, while multiple other threads use the cache data.
In the Cache class there are getter and setter functions, used by the user threads and the cache-updater thread respectively. No one uses data objects like tables directly, because they are private members.
From the above context, do you think I should use locking functionality in the cache class?
The effects of not using locks when writing to a shared memory location (like a cache) really depend on the application. If the code were used in banking software, the results could be catastrophic.
As a rule of thumb: when multiple threads access the same location, even if only one thread writes and all the others read, you should use locks (around the write operation and the reads). What can happen otherwise is that one thread starts reading data and gets swapped out by the updater thread, so it can end up using a mixture of old and new data. Whether that really has an impact depends on the application and how sensitive it is.
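A minimal sketch of that rule applied to the cache (class and member names are hypothetical):

public class CacheEntry { /* whatever data the cache holds */ }

public class Cache
{
    private readonly object _sync = new object();
    private CacheEntry _data;

    // User threads: readers take the lock too, so they never see
    // a half-written update.
    public CacheEntry Get()
    {
        lock (_sync) { return _data; }
    }

    // Updater thread, every 30 minutes.
    public void Set(CacheEntry fresh)
    {
        lock (_sync) { _data = fresh; }
    }
}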
Key point: if you don't lock the reads, there's a chance a read won't see the changes. A lock forces your reading code to get values from main memory rather than from a CPU cache or register. To avoid actually locking, you could use Thread.MemoryBarrier(), which does the same job without the overhead of actually taking a lock.
Minor points: using a lock prevents a read from getting half old data and half new data; if you are reading more than one field, I recommend it. If you are really clever, you can keep all the data in an immutable object and return that object to anyone calling the getter, avoiding the need for a lock. (When new data comes in, you create a new immutable object, then replace the old one with the new in one go. Use a lock for the actual write or, if you're still feeling really clever, make the field referencing the object volatile.)
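A sketch of that immutable-snapshot idea, with hypothetical names; readers get a consistent view without taking a lock:

public sealed class Snapshot
{
    public readonly int Count;
    public readonly decimal Total;
    public Snapshot(int count, decimal total) { Count = count; Total = total; }
}

public class SnapshotCache
{
    // volatile: readers always see the latest published reference.
    private volatile Snapshot _current = new Snapshot(0, 0m);

    public Snapshot Get() { return _current; }  // consistent view, no lock

    public void Update(int count, decimal total)
    {
        _current = new Snapshot(count, total);  // single atomic reference write
    }
}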
Also: when your getter is called, remember it's running on many other threads. There's a tendency to think that once something is running the Cache class's code it's all on the same thread, and it's not.
I will try to make this question as generic as possible, but I will give a brief introduction to my actual problem -
I am trying to implement a concurrent skip list for a priority queue. Each 'node' has a value and an array of 'forward' nodes, where node.forward[i] represents the next node on the i-th level of the skip list. For write access (i.e. insertions and deletions) I use a SpinLock (still to be determined whether that is the best lock to use).
My question is essentially, when I need a read access for a traversal,
node = node.forward[i]
What kind of thread safety do I need around something like this? If another thread is modifying node.forward[i] at exactly the same time that I read (with no current locking mechanism for read), what can happen here?
My initial thought is to have a ReaderWriterLockSlim on the getter and setter of the indexer for Forward. Will there be too much unnecessary locking in this scenario?
Edit: or would it be best to use an Interlocked.Exchange for all of my reads instead?
If another thread is modifying node.forward[i] at exactly the same time that I read (with no current locking mechanism for read), what can happen here?
It really depends on the implementation. It's possible to use Interlocked.Exchange when setting "forward" in a way that prevents the references from ever being invalid (it makes the "set" atomic), but there is no guarantee of which reference you'd get on a read. With a naive implementation, however, anything can happen, including getting bad data.
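A sketch of that atomic publish, with hypothetical Node/Forward names:

using System.Threading;

public class Node
{
    public int Value;
    public Node[] Forward;
}

public static class SkipListOps
{
    // Writer: publish the new link atomically; readers then see either
    // the old node or the new one, never a torn reference.
    public static void SetForward(Node node, int level, Node next)
    {
        Interlocked.Exchange(ref node.Forward[level], next);
    }

    // Reader: a plain read returns old or new, with no guarantee which.
    public static Node ReadForward(Node node, int level)
    {
        return node.Forward[level];
    }
}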
My initial thought is to have a ReaderWriterLockSlim on the getter and setter of the indexer for Forward.
This is likely to be a good place to start. It will be fairly easy to make a properly synchronized collection using a ReaderWriterLockSlim, and functional correctness is always the first priority.
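A sketch of what that indexer might look like (SkipNode and its members are hypothetical names):

using System.Threading;

public class SkipNode
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly SkipNode[] _forward;

    public SkipNode(int levels) { _forward = new SkipNode[levels]; }

    public SkipNode this[int level]
    {
        get
        {
            _lock.EnterReadLock();            // many readers at once
            try { return _forward[level]; }
            finally { _lock.ExitReadLock(); }
        }
        set
        {
            _lock.EnterWriteLock();           // writers are exclusive
            try { _forward[level] = value; }
            finally { _lock.ExitWriteLock(); }
        }
    }
}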
Will there be too much unnecessary locking in this scenario?
There's no way to know without seeing how you implement it and, more importantly, how it's going to be used. Depending on your usage, you can profile and look for optimization opportunities at that point if necessary.
On a side note: you might want to reconsider using node.Forward[i] as opposed to more of a "linked list" approach here. Any access to Forward[i] is likely to require a fair bit of synchronization to iterate i steps through the skip list, all of which needs to be protected against concurrent writes anywhere between node and the element i steps beyond it. If you only ever look ahead one step, you can (potentially) reduce the amount of synchronization required.
When I said atomic, I meant that a set of instructions will execute without any context switching to another thread in the same process (other kinds of switches have to happen, of course). The only solution I came up with is to suspend all threads except the currently executing one before the atomic part and resume them after it. Is there a more elegant way?
The reason I want to do this is to collect a coherent state of objects running on multiple threads. However, their code cannot be changed (it's already compiled), so I cannot insert mutexes, semaphores, etc. into it. The atomic operation is, of course, the state collection (i.e. copying some variables).
There are some atomic operations in the Interlocked class but it only provides a few very simple operations. It can't be used to create an entire atomic block of code.
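For reference, the kind of single-operation atomicity Interlocked does give you:

using System.Threading;

int counter = 0;
Interlocked.Increment(ref counter);              // atomic ++counter
Interlocked.Add(ref counter, 5);                 // atomic counter += 5
Interlocked.CompareExchange(ref counter, 1, 0);  // set to 1 only if currently 0
// ...but there is no way to make an arbitrary block of instructions atomic.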
I'd advise using locking carefully to make sure that your code will still work even if the context changes.
Well, you can use locks, but you can't exactly prevent context switching. If your threads lock on the same object, though, the waiting threads obviously won't be running, so there's no context switching into them since there's nothing to run.
You might want to look at this page too.
No. You can surround a block of code with a Monitor to make it thread-safe, but you cannot make general code snippets atomic.
object lck = new object();
lock(lck)
{
// thread safe code goes in here
}
No; that would go against the whole idea of multitasking.
Unless you mean very simple operations like incrementing, which are not the subject of your question.
It is possible to obtain a global state from a shared memory composed of a collection (array) of atomic single-writer/multi-reader registers. The solution is simple but not trivial. You can read the algorithm published in the paper "Atomic Snapshots of Shared Memory", or read chapter 4 of The Art of Multiprocessor Programming, where you can get ideas for an implementation in Java. Once you are familiar with the idea, you should be able to port it to any other language.
I discovered ThreadStaticAttribute, and I have a lot of questions about it:
All my previous thread-dependent static information was implemented as a static dictionary whose key is Thread; when I wanted to access it, I used Thread.CurrentThread, and that works. But this requires maintenance, because if a thread dies I have to delete the corresponding entry from the dictionary. I also need to consider thread safety and a lot of other matters.
By using ThreadStaticAttribute, all these matters seem to be solved, but I need to be sure of it. My questions are: do I need to somehow delete the instance held by a ThreadStatic-marked field before the thread dies? Where is the information for that field held? Is it in the Thread object instance, or something like that, so that when it is no longer used the garbage collector automatically discards it? Are there performance penalties? Which ones? Is it faster than using a keyed collection like I was doing?
I need clarification on how ThreadStaticAttribute works.
No, you do not need to delete instances of values held in a field tagged with ThreadStatic. The garbage collector will automatically collect them once both the thread and the object are no longer reachable from rooted objects.
The only exception is if the value implements IDisposable and you want to actively dispose of it. In general this is a hard problem to solve, for a number of reasons. It's much simpler not to put values which implement IDisposable in a ThreadStatic field.
As to where this field is actually stored: it's somewhat irrelevant. All you need to be concerned about is that it behaves like any other object in .NET. The only two behavioral differences are:
The field will reference a different value per accessing thread.
The initializer for the field will only run once (in practice, it's a bad idea to have one).
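A small sketch contrasting [ThreadStatic] with ThreadLocal&lt;T&gt;, which runs its factory once per thread and so sidesteps the initializer pitfall (names are hypothetical):

using System.Threading;

static class PerThread
{
    // Each thread sees its own copy. An "= 42" initializer here would
    // run only on the first thread to touch the type, so avoid it.
    [ThreadStatic]
    private static int _counter;

    // ThreadLocal<T> runs the factory once per thread instead.
    private static readonly ThreadLocal<int> _counter2 =
        new ThreadLocal<int>(() => 42);

    public static void Touch()
    {
        _counter++;                 // this thread's copy only
        int v = _counter2.Value;    // lazily initialized to 42 per thread
    }
}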
Marking a static member variable as [ThreadStatic] tells the runtime to allocate it in thread-local storage rather than in the global memory area. Thus, each thread will have its own copy (guaranteed to be initialized to the default value for that type, e.g. null, 0, false; do not use in-line initializers, as they will only initialize it for one thread).
So, when the thread goes away, so does its memory area, releasing the reference. Of course, if it's something that needs more immediate disposal (open file streams, etc.) rather than waiting for background garbage collection, you might want to make sure you do that before the thread exits.
There could be a limit to the amount of [ThreadStatic] space available, but it should be sufficient for sane uses. It should be somewhat faster than accessing a keyed collection (and more easily thread-safe), and I think it's comparable to accessing a normal static variable.
Correction: I have since heard that accessing ThreadStatic variables is somewhat slower than accessing normal static variables. I'm not sure whether it is actually faster than accessing a keyed collection, but it does avoid the orphan issue (which was your question) and the locking for thread safety that would complicate a keyed-collection approach.