Another locking question - c#

I'm trying to get my multithreading understanding locked down. I'm doing my best to teach myself, but some of these issues need clarification.
I've gone through three iterations with a piece of code, experimenting with locking.
In this code, the only thing that needs locking is this.managerThreadPriority.
First, the simple, procedural approach, with minimalistic locking.
var managerThread = new Thread
(
    new ThreadStart(this.ManagerThreadEntryPoint)
);
lock (this.locker)
{
    managerThread.Priority = this.managerThreadPriority;
}
managerThread.Name = string.Format("Manager Thread ({0})", managerThread.GetHashCode());
managerThread.Start();
Next, a single statement to create and launch a new thread, but the lock is scoped too broadly, covering the creation and launching of the thread as well. The compiler doesn't magically know that the lock can be released as soon as this.managerThreadPriority has been read.
This kind of naive locking should be avoided, I would assume.
lock (this.locker)
{
    new Thread
    (
        new ThreadStart(this.ManagerThreadEntryPoint)
    )
    {
        Priority = this.managerThreadPriority,
        Name = string.Format("Manager Thread ({0})", GetHashCode())
    }
    .Start();
}
Last, a single statement to create and launch a new thread, with an "embedded" lock around only the shared field.
new Thread
(
    new ThreadStart(this.ManagerThreadEntryPoint)
)
{
    Priority = new Func<ThreadPriority>(() =>
    {
        lock (this.locker)
        {
            return this.managerThreadPriority;
        }
    })(),
    Name = string.Format("Manager Thread ({0})", GetHashCode())
}
.Start();
Care to comment about the scoping of lock statements? For example, if I need to use a field in an if statement and that field needs to be locked, should I avoid locking the entire if statement? E.g.
bool isDumb;
lock (this.locker) isDumb = this.FieldAccessibleByMultipleThreads;
if (isDumb) ...
Vs.
lock (this.locker)
{
    if (this.FieldAccessibleByMultipleThreads) ...
}

1) Before you even start the other thread, you don't have to worry about shared access to it at all.
2) Yes, you should lock all access to shared mutable data. (If it's immutable, no locking is required.)
3) Don't use GetHashCode() to indicate a thread ID. Use Thread.ManagedThreadId. I know, there are books which recommend Thread.GetHashCode() - but look at the docs.
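To illustrate point 3, here is a minimal sketch of naming a thread with Thread.ManagedThreadId instead of GetHashCode() (the factory method and its name are made up for illustration):

```csharp
using System.Threading;

static class ThreadNaming
{
    // Builds a worker thread named after its ManagedThreadId, which the
    // runtime guarantees to be unique among live managed threads -
    // unlike GetHashCode(), which makes no uniqueness guarantee.
    public static Thread CreateNamedWorker(ThreadStart entryPoint)
    {
        var worker = new Thread(entryPoint);
        worker.Name = string.Format("Manager Thread ({0})", worker.ManagedThreadId);
        return worker;
    }
}
```

The name is assigned before Start(), so it is visible in the debugger from the moment the thread begins running.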

Care to comment about the scoping of lock statements? For example, if I
need to use a field in an if statement and that field needs to be locked,
should I avoid locking the entire if statement?
In general, it should be scoped for the portion of code that needs the resource being guarded, and no more than that. This is so it can be available for other threads to make use of it as soon as possible.
But it depends on whether the resource you are locking is part of a bigger picture that has to maintain consistency, or whether it is a standalone resource not related directly to any other.
If you have interrelated parts that need to all change in a synchronized manner, that whole set of parts needs to be locked for the duration of the whole process.
If you have an independent, single item uncoupled to anything else, then only that one item needs to be locked long enough for a portion of the process to access it.
Another way to say it is, are you protecting synchronous or asynchronous access to the resource?
Synchronous access needs to hold on to it longer in general because it cares about a bigger picture that the resource is a part of. It must maintain consistency with related resources. You may very well wrap an entire for-loop in such a case if you want to prevent interruptions until all are processed.
Asynchronous access should hold onto it as briefly as possible. Thus, the more appropriate place for the lock would be inside portions of code, such as inside a for-loop or if-statement so you can free up the individual elements right away even before other ones are processed.
Aside from these two considerations, I would add one more. Avoid nesting of locks involving two different locking objects. I have learned by experience that it is a likely source of deadlocks, particularly if other parts of the code use them. If the two objects are part of a group that needs to be treated as a single whole all the time, such nesting should be refactored out.
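The "scope the lock to only the guarded resource" advice can be sketched like this (the field and method names are hypothetical):

```csharp
using System.Threading;

class Worker
{
    private readonly object locker = new object();
    private bool fieldAccessibleByMultipleThreads; // hypothetical shared state

    // Narrow scope: copy the shared field under the lock, then release
    // the lock before doing the (possibly slow) work, so other threads
    // are not held up longer than necessary.
    public bool Process()
    {
        bool flag;
        lock (locker)
        {
            flag = fieldAccessibleByMultipleThreads;
        }
        if (flag)
        {
            // long-running work happens here with the lock released
        }
        return flag;
    }

    public void SetFlag(bool value)
    {
        lock (locker) { fieldAccessibleByMultipleThreads = value; }
    }
}
```

If the if-body itself also mutated state that must stay consistent with the flag, that would be the "bigger picture" case above, and the whole block would belong inside the lock instead.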

There is no need to lock anything before you have started any threads.
If you are only going to read a variable there's no need for locks either. It's when you mix reads and writes that you need to use mutexes and similar locking, and you need to lock in both the reading and the writing thread.

Related

C# "lock" keyword: Why is an object necessary for the syntax?

To mark code as a critical section we do this:
Object lockThis = new Object();
lock (lockThis)
{
    //Critical Section
}
Why is it necessary to have an object as a part of the lock syntax? In other words, why can't this work the same:
lock
{
    //Critical Section
}
Because you don't just lock - you lock something (you lock a lock).
The point of locking is to disallow two threads from directly competing for the same resource. Therefore, you hide that resource behind an arbitrary object. That arbitrary object acts as a lock. When one thread enters a critical section, it locks the lock and the others can't get in. When the thread finishes its work in the critical section, it unlocks and leaves the keys out for whichever thread happens to come next.
If a program has one resource that's candidate for competing accesses, it's possible that it will have other such resources as well! But often these resources are independent from each other - in other words, it may make sense for one thread to be able to lock one particular resource and another thread in the meantime to be able to lock another resource, without those two interfering.
A resource may also need to be accessed from two critical sections. Those two will need to have the same lock. If each had their own, they wouldn't be effective in keeping the resource uncontested.
Obviously, then, we don't just lock - we lock each particular resource's own lock. But the compiler can't autogenerate that arbitrary lock object silently, because it doesn't know which resources should be locked using the same lock and which ones should have their own lock. That's why you have to explicitly state which lock protects which block (or blocks) of code.
Note that the correct usage of an object as a lock requires that the object is persistent (at least as persistent as the corresponding resource). In other words, you can't create it in a local scope, store it in a local variable and throw it away when the scope exits, because this means you're not actually locking anything. If you have one persistent object acting as a lock for a given resource, only one thread can enter that section. If you create a new lock object every time someone attempts to get in, then anyone can enter at all times.
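A minimal illustration of that last point (the counter class here is made up for illustration):

```csharp
using System.Threading;

class Counter
{
    // Correct: one persistent lock object, shared by every caller.
    private readonly object locker = new object();
    private int count;

    public void IncrementCorrectly()
    {
        lock (locker) { count++; }
    }

    public void IncrementBroken()
    {
        // Broken: every call locks a brand-new object, so no two threads
        // ever contend for the same lock and nothing is actually guarded.
        var ephemeral = new object();
        lock (ephemeral) { count++; }
    }

    public int Count
    {
        get { lock (locker) { return count; } }
    }
}
```

With many threads calling IncrementCorrectly, the final count is exact; with IncrementBroken, increments can be lost because count++ is not atomic and the ephemeral lock excludes nobody.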
It needs something to use as the lock.
This way two different methods can share the same lock, so only one can be used at a time.
Object lockThis = new Object();
void read()
{
    lock (lockThis)
    {
        //Critical Section
    }
}
void write()
{
    lock (lockThis)
    {
        //Critical Section
    }
}
Well the simple answer is that that's just how the language is specified. Locks need to have some sort of object to lock on. This allows you to control what code must be locked by controlling the scope and lifecycle of this object. See lock Statement (C# Reference).
It's possible that the designers of the language could have provided an anonymous-style lock like you've suggested, and rely on the compiler to generate an appropriate object behind the scenes. But should the object that's created be a static or instance member? How would it be used if you had multiple methods that need a lock in one class? These are difficult questions, and I'm sure the designers of the language simply didn't feel the benefit of including such a construct would be worth the added complexity or confusion it would entail.

Multithreading: difference between types of locking objects

Please explain the difference between these two types of locking.
I have a List which I want to access thread-safe:
var tasks = new List<string>();
1.
var locker = new object();
lock (locker)
{
    tasks.Add("work 1");
}
2.
lock (tasks)
{
    tasks.Add("work 2");
}
My thoughts:
1. Prevents two different threads from running the locked block of code at the same time. But if another thread runs a different method that also tries to access tasks, this kind of lock won't help.
2. Blocks the List<> instance so other threads in other methods will be blocked until I unlock tasks.
Am I right or mistaken?
(2) only blocks other code that explicitly calls lock (tasks). Generally, you should only do this if you know that tasks is a private field and thus can enforce throughout your class that lock (tasks) means locking operations on the list. This can be a nice shortcut when the lock is conceptually linked with access to the collection and you don't need to worry about public exposure of the lock. You don't get this 'for free', though; it needs to be explicitly used just like locking on any other object.
They do the same thing. Any other code that tries to modify the list without locking the same object will cause potential race conditions.
A better way might be to encapsulate the list in another object that obtains a lock before doing any operations on the underlying list; then any other code can simply call methods on the wrapper object without worrying about obtaining the lock.
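One way to sketch that wrapper (the class and member names are made up for illustration):

```csharp
using System.Collections.Generic;

// Callers never touch the inner list or the lock directly,
// so they cannot forget to lock.
class SynchronizedTaskList
{
    private readonly object locker = new object();
    private readonly List<string> tasks = new List<string>();

    public void Add(string task)
    {
        lock (locker) { tasks.Add(task); }
    }

    public int Count
    {
        get { lock (locker) { return tasks.Count; } }
    }

    public string[] Snapshot()
    {
        // Return a copy so callers cannot enumerate the inner list
        // while another thread mutates it.
        lock (locker) { return tasks.ToArray(); }
    }
}
```

This is essentially what the framework's own concurrent collections do for you; for new code, a type such as System.Collections.Concurrent.ConcurrentQueue<T> may be preferable to hand-rolled locking.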

Why is specifying a synchronization object in the lock statement mandatory

I'm trying to wrap my mind around what exactly happens in the lock statement.
If I understood correctly, the lock statement is syntactic sugar and the following...
Object _lock = new Object();
lock (_lock)
{
    // Critical code section
}
...gets translated into something roughly like:
Object _lock = new Object();
Monitor.Enter(_lock);
try
{
    // Critical code section
}
finally { Monitor.Exit(_lock); }
I have used the lock statement a few times, and always created a private field _lock, as a dedicated synchronization object. I do understand why you should not lock on public variables or types.
But why does the compiler not create that instance field as well? I feel there might in fact be situations where the developer wants to specify what to lock on, but from my experience, in most cases that is of absolutely no interest, you just want that lock! So why is there no parameterless overload of lock?
lock()
{
    // First critical code section
}
lock()
{
    // Second critical code section
}
would be translated into (or similar):
[DebuggerHidden]
private readonly object _lock1 = new object();
[DebuggerHidden]
private readonly object _lock2 = new object();
Monitor.Enter(_lock1);
try
{
    // First critical code section
}
finally { Monitor.Exit(_lock1); }
Monitor.Enter(_lock2);
try
{
    // Second critical code section
}
finally { Monitor.Exit(_lock2); }
EDIT: I have obviously been unclear concerning multiple lock statements. Updated the question to contain two lock statements.
The state of the lock needs to be stored - whether or not it was entered - so that another thread that tries to enter the same lock can be blocked.
That requires a variable. Just a very simple one; a plain object is enough.
A hard requirement for such a variable is that it is created before any lock statement uses it. Trying to create it on the fly, as you propose, creates a new problem: there's now a need for a lock to safely create the variable, so that only the first thread entering the lock creates it and other threads trying to enter are blocked until it exists. Which requires a variable. Etcetera - an unsolvable chicken-and-egg problem.
There can be situations where you need two different locks that are independent of each other, meaning that when one "lockable" section of code is locked, the other should not be. That's why you have the ability to provide lock objects - you can have several of them for several independent locks.
In order for the no-variable thing to work, you'd have to either:
Have one auto-generated lock variable per lock block (what you did, which means that you can't have two different lock blocks locking on the same variable)
Use the same lock variable for all lock blocks in the same class (which means you can't have two independent things protected)
Plus, you'd also have the issue of deciding whether those should be instance-level or static.
In the end, I'm guessing the language designers didn't feel that the simplification in one specific case was worth the ambiguity introduced while reading code. Threading code (which is the reason to use locks) is already hard to write correctly and verify. Making it harder would be a not-good thing.
Allowing for an implicit lock object might encourage the use of a single lock object, which is considered bad practice. By enforcing the use of an explicit lock object, the language encourages you to name the lock something useful, such as "countIncrementLock".
A variable named thusly would not encourage developers to use the same lock object when performing a completely separate operation, such as writing to a stream of some kind.
Therefore, the object could be writing to a stream on one thread, while incrementing a counter on another thread, and neither of the threads would necessarily interfere with each other.
The only reason the language wouldn't do this is that it would look like a good practice, but in reality would be hiding a bad practice.
Edit:
Perhaps the designers of C# did not want implicit lock variables because they thought it might encourage bad behaviour.
Perhaps the designers did not think of implicit lock variables at all, because they had other more important things to think about first.
If every C# developer knew exactly what was happening when they wrote lock(), and they knew the implications, then there's no reason why it shouldn't exist, and no reason why it shouldn't work how you're suggesting.
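A sketch of the "one named lock per independent resource" idea described above (the class, field names, and the use of a StringWriter as a stand-in stream are all illustrative):

```csharp
using System.IO;

class Recorder
{
    // Two independent resources get two independently named locks,
    // so incrementing the counter never blocks stream writes and
    // vice versa.
    private readonly object countIncrementLock = new object();
    private readonly object streamWriteLock = new object();

    private int count;
    private readonly StringWriter stream = new StringWriter();

    public void Increment()
    {
        lock (countIncrementLock) { count++; }
    }

    public void Write(string line)
    {
        lock (streamWriteLock) { stream.WriteLine(line); }
    }

    public int Count
    {
        get { lock (countIncrementLock) { return count; } }
    }

    public string Log
    {
        get { lock (streamWriteLock) { return stream.ToString(); } }
    }
}
```

An implicit, compiler-generated lock would have to guess at this partitioning, which is exactly the ambiguity the answers above describe.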

What is a best approach to make a function or set of statements thread safe in C#?

What is a best approach to make a function or set of statements thread safe in C#?
Don't use shared read/write state when possible. Go with immutable types.
Take a look at the C# lock statement. Read Jon Skeet's article on multi threading in .net.
It depends on what you're trying to accomplish.
If you want to make sure that in any given time only one thread would run a specific code use lock or Monitor:
public void Func(...)
{
    lock(syncObject)
    {
        // only one thread can enter this code
    }
}
On the other hand, if you want multiple threads to run the same code but do not want them to cause race conditions by changing the same point in memory, don't write to static/shared objects that can be reached by multiple threads at the same time.
BTW - If you want to create a static object that would be shared only within a single thread use the ThreadStatic attribute (http://msdn.microsoft.com/en-us/library/system.threadstaticattribute(VS.71).aspx).
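A small sketch of ThreadStatic (the class and field names are invented for the example):

```csharp
using System.Threading;

static class PerThreadState
{
    // Each thread sees its own copy of this field, so no locking is
    // needed even though the field is static.
    [ThreadStatic]
    private static int perThreadCounter;

    // Sets the field on the calling thread, then reads it from a
    // brand-new thread. The new thread sees the type's default value
    // (0), not the caller's assignment.
    public static int ValueSeenByFreshThread()
    {
        perThreadCounter = 42;
        int seen = -1;
        var t = new Thread(() => { seen = perThreadCounter; });
        t.Start();
        t.Join();
        return seen;
    }
}
```

Note that any initializer on a [ThreadStatic] field runs only for the thread that executes the static constructor; other threads start from the default value, which is why the fresh thread above observes 0.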
Use lock statement around shared state variables. Once you ensured thread safety, run code through code profiler to find bottlenecks and optimize those places with more advanced multi-threading constructs.
The best approach will vary depending on your exact problem at hand.
The simplest approach in C# is to "lock" resources shared by multiple threads using a lock statement. This creates a block of code which can only be accessed by one thread at a time: the one which has obtained the "lock" object. For example, this property is thread safe using the lock syntax:
public class MyClass
{
    private readonly object _syncRoot = new object();
    private int _myValue;
    public int MyProperty
    {
        get
        {
            lock (_syncRoot)
            {
                return _myValue;
            }
        }
        set
        {
            lock (_syncRoot)
            {
                _myValue = value;
            }
        }
    }
}
A thread acquires the lock at the start of the block and only releases it at the end of the block. If the lock is not available, the thread will wait until it becomes available. Obviously, access to the private variable within the class is not otherwise thread-safe, so all threads must access the value through the property to be safe. Note that a dedicated private lock object is used rather than lock (this), since code outside the class could also lock on a publicly reachable instance and cause contention or deadlocks.
This is by far the simplest way for threads to have safe access to shared data, however it only touches the tip of the iceberg of techniques for threading.
Write the function in such a way that:
It does not modify its parameters in any way
It does not access any state outside of its local variables.
Otherwise, race conditions MAY occur. The code must be thoroughly examined for such conditions and appropriate thread synchronization must be implemented (locks, etc...). Writing code that does not require synchronization is the best way to make it thread-safe. Of course, this is often not possible - but should be the first option considered in most situations.
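A minimal sketch of such a function (the name and the computation are arbitrary examples):

```csharp
static class PureMath
{
    // Thread-safe with no locks at all: it reads only its parameter,
    // writes only its own locals, and does not mutate the input array.
    // Any number of threads can call it concurrently, as long as no
    // other thread mutates the array they pass in.
    public static int SumOfSquares(int[] values)
    {
        int total = 0;
        foreach (var v in values)
        {
            total += v * v;
        }
        return total;
    }
}
```

Because nothing outside the stack frame is touched, there is simply no shared state for threads to race on.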
There's a lot to understand when learning what "thread safe" means and all the issues that are introduced (synchronization, etc).
I'd recommend reading through this page in order to get a better feel for what you're asking: Threading in C#. It gives a pretty comprehensive overview of the subject, which sounds like it could be pretty helpful.
And Mehrdad's absolutely right -- go with immutable types if you can help it.

Re-entrant locks in C#

Will the following code result in a deadlock using C# on .NET?
class MyClass
{
    private object lockObj = new object();
    public void Foo()
    {
        lock (lockObj)
        {
            Bar();
        }
    }
    public void Bar()
    {
        lock (lockObj)
        {
            // Do something
        }
    }
}
No, not as long as you are locking on the same object. The recursive code effectively already has the lock and so can continue unhindered.
lock(object) {...} is shorthand for using the Monitor class. As Marc points out, Monitor allows re-entrancy, so repeated attempts to lock on an object on which the current thread already has a lock will work just fine.
If you start locking on different objects, that's when you have to be careful. Pay particular attention to:
Always acquire locks on a given set of objects in the same sequence.
Always release locks in the reverse sequence to how you acquire them.
If you break either of these rules you're pretty much guaranteed to get deadlock issues at some point.
Here is one good webpage describing thread synchronisation in .NET: http://dotnetdebug.net/2005/07/20/monitor-class-avoiding-deadlocks/
Also, lock on as few objects at a time as possible. Consider applying coarse-grained locks where possible. The idea being that if you can write your code such that there is an object graph and you can acquire locks on the root of that object graph, then do so. This means you have one lock on that root object and therefore don't have to worry so much about the sequence in which you acquire/release locks.
(One further note, your example isn't technically recursive. For it to be recursive, Bar() would have to call itself, typically as part of an iteration.)
Well, Monitor allows re-entrancy, so you can't deadlock yourself... so no: it shouldn't do
If a thread is already holding a lock, then it will not block itself. The .NET framework ensures this. You only have to make sure that two threads do not attempt to acquire the same two locks out of sequence through whatever code paths.
The same thread can acquire the same lock multiple times, but you have to make sure you release the lock the same number of times that you acquire it. Of course, as long as you are using the "lock" keyword to accomplish this, it happens automatically.
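The question's code, fleshed out into a runnable check that the nested lock completes rather than deadlocking (the Calls counter is added for observability):

```csharp
class MyClass
{
    private readonly object lockObj = new object();
    public int Calls;

    public void Foo()
    {
        lock (lockObj)
        {
            // Re-entering the same lock on the same thread is fine:
            // Monitor counts recursive acquisitions and only releases
            // the lock when the count drops back to zero.
            Bar();
        }
    }

    public void Bar()
    {
        lock (lockObj)
        {
            Calls++;
        }
    }
}
```

Calling Foo() returns normally; a non-reentrant mutex would block forever at the inner lock.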
No, this code will not deadlock.
If you really want to create a deadlock, the simplest one requires at least two resources.
Consider the dog and the bone scenario:
1. One dog has full control over one bone, so any other dog has to wait.
2. Two dogs with two bones are the minimum required to create a deadlock, when each locks its own bone and then tries to seize the other's.
...and so on: n dogs and m bones can cause more sophisticated deadlocks.
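A sketch of the two-bone scenario, written to avoid the deadlock by always taking the bones in the same order (the class and field names are invented):

```csharp
using System.Threading;

class TwoBones
{
    private readonly object boneA = new object();
    private readonly object boneB = new object();
    public int Meals;

    // Every "dog" takes the bones in the same order (A then B), so no
    // thread can end up holding one bone while waiting forever for the
    // other. If a second code path locked B then A instead, two threads
    // could each grab one bone and deadlock - the classic two-resource
    // scenario described above.
    public void Eat()
    {
        lock (boneA)
        {
            lock (boneB)
            {
                Meals++;
            }
        }
    }
}
```

Consistent lock ordering is the standard cure for this class of deadlock, as the earlier answer on acquiring locks in the same sequence points out.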
