Should you lock resources when reading values? - c#

When doing thread synchronization in C#, should I also lock an object when I read a value, or only when changing it?
For example, I have a Queue<T> object. Should I lock it only when doing the Enqueue and Dequeue, or should I also lock it when checking values like Count?

From MSDN:
A Queue<T> can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. To guarantee thread safety during enumeration, you can lock the collection during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
You should ensure no reader is active while an item is queued (a lock is probably a good idea).
Looking at Count in Reflector reveals a read from a private field. This can be okay depending on what you do with the value, but it means you shouldn't do things like this (without proper locking):
if (queue.Count > 0)
    queue.Dequeue();
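A minimal sketch of the safe version, assuming a shared syncRoot object that guards every access to the queue (the name is illustrative):

object syncRoot = new object();

lock (syncRoot)
{
    // Check and dequeue as one atomic unit, so no other thread can
    // remove the last item between the test and the call.
    if (queue.Count > 0)
        queue.Dequeue();
}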

It depends on what you want the lock for. Usually this kind of scenario calls for a reader/writer locking mechanism.
Reader/writer locking means that readers share a lock, so you can have multiple readers reading the collection simultaneously, but to write you must acquire an exclusive lock.
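A minimal sketch of that idea using ReaderWriterLockSlim (the field and method names are illustrative, not from the question):

private readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();
private readonly Queue<int> _queue = new Queue<int>();

public int GetCount()
{
    _rwLock.EnterReadLock();            // shared: many readers at once
    try { return _queue.Count; }
    finally { _rwLock.ExitReadLock(); }
}

public void Add(int item)
{
    _rwLock.EnterWriteLock();           // exclusive: blocks readers and writers
    try { _queue.Enqueue(item); }
    finally { _rwLock.ExitWriteLock(); }
}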

If you don't lock it, you may get a stale value. A race condition can occur in which a write operation changes Count but you read the value from before the change. For example, if the queue has only one item and one thread calls Dequeue, another thread may read the count, find it still 1, and call Dequeue again. The second call won't proceed until the lock is granted, but by that time the queue will actually be empty.

The CLR guarantees atomic reads for values up to the width of the processor. So if you're running on 32-bit, reading ints will be atomic; if you're running on a 64-bit machine, reading longs will be atomic. Ergo, if Count is an Int32, there's no need to lock.
This post is pertinent to your question.

Related

thread-safety of primitive concurrent read and write

Simplified illustration below: how does .NET deal with such a situation?
And if it would cause problems, would I have to lock/gate access to each and every field/property that might at times be written to and accessed from different threads?
A field somewhere
public class CrossRoads {
    public int _timeouts;
}
A background thread writer
public void TimeIsUp(CrossRoads crossRoads) {
    crossRoads._timeouts++;
}
Possibly at the same time, trying to read elsewhere
public void HowManyTimeOuts(CrossRoads crossRoads) {
    int timeOuts = crossRoads._timeouts;
}
The simple answer is that the above code can cause problems if accessed simultaneously from multiple threads.
The .NET Framework provides two solutions: interlocking and thread synchronization.
For simple data type manipulation (e.g. ints), interlocking using the Interlocked class will work correctly and is the recommended approach.
In fact, Interlocked provides specific methods (Increment and Decrement) that make this easy:
Add an IncrementCount method to your CrossRoads class:
public void IncrementCount() {
    Interlocked.Increment(ref _timeouts);
}
Then call this from your background worker:
public void TimeIsUp(CrossRoads crossRoads) {
    crossRoads.IncrementCount();
}
Reads of the value are atomic, unless it is a 64-bit value on a 32-bit OS. See the Interlocked.Read method documentation for more detail.
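As an illustration, a minimal sketch of reading a 64-bit counter safely on any platform (the field name is made up for this example):

private long _total;    // 64-bit: a plain read may tear on a 32-bit OS

public long ReadTotal()
{
    // Interlocked.Read guarantees an atomic 64-bit read.
    return Interlocked.Read(ref _total);
}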
For class objects or more complex operations, you will need to use thread synchronization locking (lock in C# or SyncLock in VB.NET).
This is accomplished by creating a static synchronization object at the level the lock is to be applied (for example, inside your class), obtaining a lock on that object, and performing (only) the necessary operations inside that lock:
private static object SynchronizationObject = new Object();

public void PerformSomeCriticalWork()
{
    lock (SynchronizationObject)
    {
        // do some critical work
    }
}
The good news is that reads and writes to ints are guaranteed to be atomic, so there are no torn values. However, ++ is not guaranteed to be safe, and a read could potentially be cached in a register. There's also the issue of instruction re-ordering.
I would use:
Interlocked.Increment(ref crossroads._timeouts);
For the write, which will ensure no increments are lost, and:
int timeouts = Interlocked.CompareExchange(ref crossroads._timeouts, 0, 0);
For the read, since this observes the same rules as the increment. Strictly speaking, volatile is probably enough for the read, but it is so poorly understood that Interlocked seems (IMO) safer. Either way, we're avoiding a lock.
Well, I'm not a C# developer, but this is how it typically works at this level:
how does .NET deal with such a situation?
Unlocked, and not guaranteed to be atomic.
Would I have to lock/gate access to each and every field/property that might at times be written to and accessed from different threads?
Yes. An alternative would be to make a lock for the object available to the clients, then tell the clients they must lock the object while using the instance. This will reduce the number of locks acquisitions, and guarantee a more consistent, predictable, state for your clients.
Forget .NET for a moment. At the machine language level, crossRoads._timeouts++ will be implemented as an INC [memory] instruction. This is known as a read-modify-write instruction. Such instructions are atomic with respect to multi-threading on a single processor* (where threading is essentially implemented by time-slicing), but they are not atomic with respect to multi-threading across multiple processors or multiple cores.
So:
If you can guarantee that only TimeIsUp() will ever modify crossRoads._timeouts, and if you can guarantee that only one thread will ever execute TimeIsUp(), then it will be safe to do this. The writing in TimeIsUp() will work fine, and the reading in HowManyTimeOuts() (and any place else) will work fine. But if you also modify crossRoads._timeouts elsewhere, or if you ever spawn one more background thread writer, you will be in trouble.
In either case, my advice would be to play it safe and lock it.
(*) They are atomic with respect to multi-threading on a single processor because context switches between threads happen on a periodic interrupt, and on the x86 architectures these instructions are atomic with respect to interrupts, meaning that if an interrupt occurs while the CPU is executing such an instruction, the interrupt will wait until the instruction completes. This does not hold true with more complex instructions, for example those with the REP prefix.
Although an int may be 'native' size to a CPU (dealing in 32 or 64 bits at a time), if you are reading and writing the same variable from different threads, you are best off locking this variable and synchronizing access.
Even when an individual read or write of an int is atomic, atomicity alone gives you no guarantee about visibility or ordering across threads.
You can also use Interlocked.Increment for your purposes here.
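To see why the plain ++ is unsafe, here is a self-contained sketch (names invented for the demo) that races an unprotected increment against Interlocked.Increment:

using System;
using System.Threading;

class LostUpdateDemo
{
    static int _unsafeCount;
    static int _safeCount;

    static void Worker()
    {
        for (int i = 0; i < 1000000; i++)
        {
            _unsafeCount++;                          // read-modify-write: not atomic across cores
            Interlocked.Increment(ref _safeCount);   // atomic, even on multiple processors
        }
    }

    static void Main()
    {
        Thread a = new Thread(Worker), b = new Thread(Worker);
        a.Start(); b.Start();
        a.Join(); b.Join();
        // _unsafeCount is usually less than 2000000 (lost updates);
        // _safeCount is always exactly 2000000.
        Console.WriteLine(_unsafeCount + " vs " + _safeCount);
    }
}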

ReaderWriterLockSlim or lock

I'm using a ConcurrentBag to store objects at run time. At some point I need to empty the bag and store its contents in a list. This is what I do:
IList<T> list = new List<T>();
lock (bag)
{
    T pixel;
    while (bag.TryTake(out pixel))
    {
        list.Add(pixel);
    }
}
My question is about the synchronization. As far as I have read, lock is faster than other synchronization methods (source: http://www.albahari.com/threading/part2.aspx).
Performance is my second concern; I'd like to know if I can use ReaderWriterLockSlim at this point. What would be the benefit of using ReaderWriterLockSlim? The reason is that I don't want this operation to block incoming requests.
If yes, should I use an upgradeable lock?
Any ideas ? Comments?
I'm not sure why you're using the lock. The whole idea behind ConcurrentBag is that it's concurrent.
Unless you're just trying to prevent some other thread from taking things from, or adding things to, the bag while you're emptying it.
Re-reading your question, I'm pretty sure you don't want to synchronize access here at all. ConcurrentBag allows multiple threads to Take and Add, without you having to do any explicit synchronization.
If you lock the bag, then no other thread can add or remove things while your code is running. Assuming, of course, that you protect every other access to the bag with a lock. And once you do that, you've completely defeated the purpose of having a lock-free concurrent data structure. Your data structure has become a poorly-performing list that's controlled by a lock.
Same thing if you use a reader-writer lock. You'd have to synchronize every access.
You don't need to add any explicit synchronization in this case. Ditch the lock.
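A minimal sketch of the same drain without the lock, relying on TryTake's own thread safety (using the question's names):

// TryTake is already thread-safe; items added concurrently by other
// threads are either drained here or remain in the bag for later.
IList<T> list = new List<T>();
T pixel;
while (bag.TryTake(out pixel))
{
    list.Add(pixel);
}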
Lock is great when threads will do a lot of operations in a row (bursty, low contention).
RWSlim is great when you have far more read locks than write locks (read-heavy, high read contention).
Lockless is great when you need multiple readers and/or writers all working at the same time (a mix of reads and writes, lots of contention).

Why lock when reading from a dictionary

I am confused by a code listing in a book I am reading, C# 3 in a Nutshell, on threading.
In the topic on thread safety in application servers, the code below is given as an example of a UserCache:
static class UserCache
{
    static Dictionary<int, User> _users = new Dictionary<int, User>();

    internal static User GetUser(int id)
    {
        User u = null;
        lock (_users)                      // Why lock this???
            if (_users.TryGetValue(id, out u))
                return u;
        u = RetrieveUser(id);              // Method to retrieve from database
        lock (_users) _users[id] = u;      // Why lock this???
        return u;
    }
}
The authors explain why the RetrieveUser method is not inside a lock: it is to avoid locking the cache for a longer period.
I am confused as to why the TryGetValue and the dictionary update are locked, since even with the above, the dictionary is updated twice if two threads call simultaneously with the same unretrieved id.
What is being achieved by locking the dictionary read?
Many thanks in advance for all your comments and insights.
The Dictionary<TKey, TValue> class is not threadsafe.
If one thread writes one key to the dictionary while a different thread reads the dictionary, it may get messed up. (For example, if the write operation triggers an array resize, or if the two keys are a hash collision)
Therefore, the code uses a lock to prevent concurrent writes.
There is a benign race condition when writing to the dictionary; it is possible, as you stated, for two threads to determine there is not a matching entry in the cache. In this case, both of them will read from the DB and then attempt to insert. Only the object inserted by the last thread is kept; the other object will be garbage collected when the first thread is done with it.
The read to the dictionary needs to be locked because another thread may be writing at the same time, and the read needs to search over a consistent structure.
Note that the ConcurrentDictionary introduced in .NET 4.0 pretty much replaces this kind of idiom.
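For reference, a minimal sketch of the same cache on .NET 4.0's ConcurrentDictionary (it keeps the same benign race: RetrieveUser may run more than once, but only one result is kept):

using System.Collections.Concurrent;

static class UserCache
{
    static readonly ConcurrentDictionary<int, User> _users =
        new ConcurrentDictionary<int, User>();

    internal static User GetUser(int id)
    {
        // GetOrAdd handles all the locking internally.
        return _users.GetOrAdd(id, RetrieveUser);
    }
}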
That's common practice for accessing any non-thread-safe structures like lists, dictionaries, shared values, etc.
And to answer the main question: by locking the read, we guarantee that the dictionary will not be changed by another thread while we are reading its value. The dictionary doesn't implement this itself, which is why it's called not thread-safe. :)
If two threads call in simultaneously and the id exists, they will both return the correct User information. The first lock is to prevent errors like SLaks described: if someone is writing to the dictionary while you are trying to read it, you'll have issues. In this scenario, the second lock will never be reached.
If two threads call in simultaneously and the id does not exist, one thread will take the lock and enter TryGetValue, which will return false and set u to a default value. This first lock is, again, to prevent the errors described by SLaks. At this point, the first thread releases the lock and the second thread enters and does the same. Both then set u to the result of RetrieveUser(id); this should be the same information. One thread then locks the dictionary and assigns _users[id] the value of u. This second lock is to prevent two threads from writing to the same memory locations simultaneously and corrupting that memory. I don't know exactly what the second thread will do when it reaches the assignment: it will either return early, ignoring the update, or overwrite the existing data from the first thread. Regardless, the dictionary will contain the same information, because both threads should have received the same data in u from RetrieveUser.
For performance, the author compared two scenarios: the one above, which is extremely rare and blocks while two threads try to write the same data; and a far more likely one where two threads call in simultaneously, one requesting data that still has to be retrieved and the other requesting data that already exists. For example, threadA and threadB call in simultaneously and threadA locks for an id that doesn't exist. There is no reason to make threadB wait on a lookup while threadA is working in RetrieveUser. This situation is far more likely than the duplicate ids described above, so for performance the author chose not to lock the whole block.

Safe to get Count value from generic collection without locking the collection?

I have two threads, a producer thread that places objects into a generic List collection and a consumer thread that pulls those objects out of the same generic List. I've got the reads and writes to the collection properly synchronized using the lock keyword, and everything is working fine.
What I want to know is if it is ok to access the Count property without first locking the collection.
JaredPar refers to the Count property in his blog as a decision procedure that can lead to race conditions, as in this example:
if (list.Count > 0)
{
    return list[0];
}
If the list has one item and that item is removed after the Count property is accessed but before the indexer, an exception will occur. I get that.
But would it be ok to use the Count property to, say, determine the initial size a completely different collection? The MSDN documentation says that instance members are not guaranteed to be thread safe, so should I just lock the collection before accessing the Count property?
I suspect it's "safe" in terms of "it's not going to cause anything to go catastrophically wrong" - but that you may get stale data. That's because I suspect it's just held in a simple variable, and that that's likely to be the case in the future. That's not the same as a guarantee though.
Personally I'd keep it simple: if you're accessing shared mutable data, only do so in a lock (using the same lock for the same data). Lock-free programming is all very well if you've got appropriate isolation in place (so you know you've got appropriate memory barriers, and you know that you'll never be modifying it in one thread while you're reading from it in another) but it sounds like that isn't the case here.
The good news is that acquiring an uncontested lock is incredibly cheap - so I'd go for the safe route if I were you. Threading is hard enough without introducing race conditions which are likely to give no significant performance benefit but at the cost of rare and unreproducible bugs.
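For the specific case in the question (using Count to size a different collection), a minimal sketch under the same lock the producer and consumer already use (the names are illustrative):

List<Item> snapshot;
lock (_listLock)
{
    // Count and contents are read under the same lock the writers use,
    // so the new collection is sized and filled from a consistent view.
    snapshot = new List<Item>(sharedList.Count);
    snapshot.AddRange(sharedList);
}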

In C# would it be better to use Queue.Synchronized or lock() for thread safety?

I have a Queue object that I need to ensure is thread-safe. Would it be better to use a lock object like this:
lock (myLockObject)
{
    // do stuff with the queue
}
Or is it recommended to use Queue.Synchronized like this:
Queue.Synchronized(myQueue).whatever_i_want_to_do();
From reading the MSDN docs it says I should use Queue.Synchronized to make it thread-safe, but then it gives an example using a lock object. From the MSDN article:
To guarantee the thread safety of the Queue, all operations must be done through this wrapper only. Enumerating through a collection is intrinsically not a thread-safe procedure. Even when a collection is synchronized, other threads can still modify the collection, which causes the enumerator to throw an exception. To guarantee thread safety during enumeration, you can either lock the collection during the entire enumeration or catch the exceptions resulting from changes made by other threads.
If calling Synchronized() doesn't ensure thread-safety what's the point of it? Am I missing something here?
Personally I always prefer locking. It means that you get to decide the granularity. If you just rely on the Synchronized wrapper, each individual operation is synchronized but if you ever need to do more than one thing (e.g. iterating over the whole collection) you need to lock anyway. In the interests of simplicity, I prefer to just have one thing to remember - lock appropriately!
EDIT: As noted in comments, if you can use higher level abstractions, that's great. And if you do use locking, be careful with it - document what you expect to be locked where, and acquire/release locks for as short a period as possible (more for correctness than performance). Avoid calling into unknown code while holding a lock, avoid nested locks etc.
In .NET 4 there's a lot more support for higher-level abstractions (including lock-free code). Either way, I still wouldn't recommend using the synchronized wrappers.
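For example, in .NET 4 a ConcurrentQueue<T> makes the check-and-remove a single atomic operation, so no external lock is needed (a minimal sketch):

using System.Collections.Concurrent;

var queue = new ConcurrentQueue<string>();
queue.Enqueue("job");

string item;
if (queue.TryDequeue(out item))   // atomic test-and-dequeue: no Count race
{
    // process item
}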
There's a major problem with the Synchronized methods in the old collection library, in that they synchronize at too low a level of granularity (per method rather than per unit-of-work).
There's a classic race condition with a synchronized queue, shown below, where you check the Count to see if it is safe to dequeue, but then the Dequeue method throws an exception indicating the queue is empty. This occurs because each individual operation is thread-safe, but the value of Count can change between when you query it and when you use the value.
object item;
if (queue.Count > 0)
{
    // at this point another thread dequeues the last item, and then
    // the next line will throw an InvalidOperationException...
    item = queue.Dequeue();
}
You can safely write this using a manual lock around the entire unit-of-work (i.e. checking the count and dequeueing the item) as follows:
object item;
lock (queue)
{
    if (queue.Count > 0)
    {
        item = queue.Dequeue();
    }
}
So as you can't safely dequeue anything from a synchronized queue, I wouldn't bother with it and would just use manual locking.
.NET 4.0 should have a whole bunch of properly implemented thread-safe collections, but that's still nearly a year away unfortunately.
There's frequently a tension between demands for 'thread safe collections' and the requirement to perform multiple operations on the collection in an atomic fashion.
So Synchronized() gives you a collection which won't smash itself up if multiple threads add items to it simultaneously, but it doesn't magically give you a collection that knows that during an enumeration, nobody else must touch it.
As well as enumeration, common operations like "is this item already in the queue? No, then I'll add it" also require synchronisation which is wider than just the queue.
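A minimal sketch of that check-then-add case, where the whole unit of work sits inside one lock (assuming the same queue object is used as the lock for all other access too):

lock (queue)
{
    // Contains and Enqueue must form one atomic unit, or another thread
    // could add the same item between the check and the write.
    if (!queue.Contains(item))
        queue.Enqueue(item);
}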
This way we don't need to lock the queue just to find out it was empty:
object item;
// The unsynchronized first check is only an optimization; the count is
// checked again inside the lock before actually dequeueing.
if (queue.Count > 0)
{
    lock (queue)
    {
        if (queue.Count > 0)
        {
            item = queue.Dequeue();
        }
    }
}
It seems clear to me that using a lock(...) {...} lock is the right answer.
To guarantee the thread safety of the Queue, all operations must be done through this wrapper only.
If other threads access the queue without using .Synchronized(), then you'll be up a creek - unless all your queue access is locked up.
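One way to keep all queue access locked up is to hide the queue and the lock behind a single wrapper, so callers can't bypass the synchronization. A minimal sketch (the class is illustrative, not from the question):

public class SafeQueue<T>
{
    private readonly object _sync = new object();
    private readonly Queue<T> _queue = new Queue<T>();

    public void Enqueue(T item)
    {
        lock (_sync) { _queue.Enqueue(item); }
    }

    // Check-and-remove as one atomic unit, avoiding the Count race.
    public bool TryDequeue(out T item)
    {
        lock (_sync)
        {
            if (_queue.Count > 0)
            {
                item = _queue.Dequeue();
                return true;
            }
            item = default(T);
            return false;
        }
    }
}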
