Why lock when reading from a dictionary - C#

I am confused by a code listing in a book I am reading, C# 3 in a Nutshell, on threading.
In the topic on Thread Safety in Application Servers, the code below is given as an example of a UserCache:
static class UserCache
{
    static Dictionary<int, User> _users = new Dictionary<int, User>();

    internal static User GetUser(int id)
    {
        User u = null;
        lock (_users)                    // Why lock this???
            if (_users.TryGetValue(id, out u))
                return u;

        u = RetrieveUser(id);            // Method to retrieve from database
        lock (_users) _users[id] = u;    // Why lock this???
        return u;
    }
}
The authors explain why the RetrieveUser method is not inside a lock: it is to avoid locking the cache for a longer period.
I am confused as to why the TryGetValue and the update of the dictionary are locked, since even with the above the dictionary is updated twice if two threads call simultaneously with the same unretrieved id.
What is being achieved by locking the dictionary read?
Many thanks in advance for all your comments and insights.

The Dictionary<TKey, TValue> class is not thread-safe.
If one thread writes a key to the dictionary while a different thread reads from it, the dictionary may get corrupted (for example, if the write operation triggers an array resize, or if the two keys land in the same hash bucket).
Therefore, the code uses a lock to prevent reads and writes from running concurrently.

There is a benign race condition when writing to the dictionary; it is possible, as you stated, for two threads to determine there is not a matching entry in the cache. In this case, both of them will read from the DB and then attempt to insert. Only the object inserted by the last thread is kept; the other object will be garbage collected when the first thread is done with it.
The read to the dictionary needs to be locked because another thread may be writing at the same time, and the read needs to search over a consistent structure.
Note that the ConcurrentDictionary introduced in .NET 4.0 pretty much replaces this kind of idiom.
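For illustration, here is a minimal sketch of the same cache rewritten on top of ConcurrentDictionary, reusing the User and RetrieveUser names from the question's listing (the stub bodies are placeholders, not the book's code):

using System.Collections.Concurrent;

static class UserCache
{
    static readonly ConcurrentDictionary<int, User> _users =
        new ConcurrentDictionary<int, User>();

    internal static User GetUser(int id)
    {
        // GetOrAdd is thread-safe, so no explicit lock is needed. The value
        // factory can still run on two threads at once for the same id,
        // mirroring the benign race described above; only one result is kept.
        return _users.GetOrAdd(id, RetrieveUser);
    }

    static User RetrieveUser(int id)
    {
        // Stand-in for the database lookup in the question's listing.
        return new User();
    }
}

class User { }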

That's a common practice for accessing any non-thread-safe structures like lists, dictionaries, and shared values.
And to answer the main question: by locking the read, we guarantee that the dictionary will not be changed by another thread while we are reading its value. The dictionary does not provide that guarantee itself, which is why it's called non-thread-safe :)

If two threads call in simultaneously and the id exists, then they will both return the correct User information. The first lock is to prevent errors like SLaks said: if someone is writing to the dictionary while you are trying to read it, you'll have issues. In this scenario, the second lock will never be reached.
If two threads call in simultaneously and the id does not exist, one thread will take the lock and enter TryGetValue; this will return false and set u to a default value. This first lock is, again, to prevent the errors described by SLaks. At this point, the first thread releases the lock and the second thread enters and does the same. Both will then set u to the information from RetrieveUser(id); this should be the same information. One thread will then lock the dictionary and assign _users[id] the value of u. This second lock is to prevent two threads from writing values to the same memory locations simultaneously and corrupting that memory. I don't know exactly what the second thread will do when it reaches the assignment: it will either return early, ignoring the update, or overwrite the existing data from the first thread. Regardless, the dictionary will contain the same information, because both threads should have received the same data in u from RetrieveUser.
For performance, the author compared two scenarios: the one above, which will be extremely rare and blocks while two threads try to write the same data; and a second, far more likely one, where two threads call in simultaneously, one requesting data that still needs to be written and one requesting data that already exists. For example, threadA and threadB call in simultaneously and threadA takes the lock for an id that doesn't exist. There is no reason to make threadB wait for a lookup while threadA is working in RetrieveUser. This situation is probably far more likely than the duplicate ids described above, so for performance the author chose not to lock the whole block.

Related

C# Threading without locking Producer or Consumer

TL;DR version of the main questions:
While working with threads, is it safe to read a list's contents with one thread while another writes to it, as long as you do not delete list contents (reorganize the order) and only read a new object after the new object has been fully added?
While an int is being updated from "Old Value" to "New Value" by one thread, is there a risk, if another thread reads this int, that the value returned is neither "Old Value" nor "New Value"?
Is it possible for a thread to "skip" a critical region if it's busy, instead of just going to sleep and waiting for the region's release?
I have two pieces of code running in separate threads and I want one to act as a producer for the other. I do not want either thread to "sleep" while waiting for access, but instead to skip forward in its internal code if the other thread is accessing the shared data.
My original plan was to share the data via the approach below (and once the counter got high enough, switch to a secondary list to avoid overflows).
Pseudocode of the flow as I originally intended it:
Producer
{
    int counterProducer;
    BufferedObject newlyProducedObject;
    List<BufferedObject> objectsProducer;

    while (true)
    {
        <Do stuff until a new product is created and added to newlyProducedObject>;
        objectsProducer.Add(newlyProducedObject);
        counterProducer++;
    }
}

Consumer
{
    int counterConsumer;
    Producer objectProducer;   // contains a reference to the Producer class
    List<BufferedObject> personalQueue;

    while (true)
    {
        <Do useful work, such as working on the personal queue, and polish nails if no personal queue>;

        // get all outstanding requests and move them to the personal queue
        while (counterConsumer < objectProducer.GetCounterProducer())
        {
            personalQueue.Add(objectProducer.GetItem(counterConsumer + 1));
            counterConsumer++;
        }
    }
}
Looking at this, everything seemed fine at first glance. I knew I would not be retrieving a half-constructed product from the queue, so the state of the list should not be a problem even if a thread switch occurs while the Producer is adding a new object. Is this assumption correct, or can there be problems here? (My guess is that since the consumer asks for a specific location in the list, new objects are only added to the end, and objects are never deleted, this will not be a problem.)
But what caught my eye was: could a similar problem occur with counterProducer being at an unknown value while counterProducer++ is executing? Could this result in the value temporarily being null or some unknown value? Will this be a potential issue?
My goal is to have neither of the two threads block while waiting for a mutex, but instead continue their loops, which is why I wrote the above first, as it involves no locking.
If the usage of the list will cause problems, my workaround will be to make a linked-list implementation and share it between the two classes, still using the counters to see if new work has been added, and keeping the last location while the consumer moves new items to its personal queue. So the producer adds new links, and the consumer reads them and deletes the previous ones. (No counter on the list itself, just external counters to know how much has been added and removed.)
Alternative pseudocode to avoid the counterProducer++ risk (need help with this):
Producer
{
    int publicCounterProducer;
    int privateCounterProducer;
    BufferedObject newlyProducedObject;
    List<BufferedObject> objectsProducer;

    while (true)
    {
        <Do stuff until a new product is created and added to newlyProducedObject>;
        objectsProducer.Add(newlyProducedObject);
        privateCounterProducer++;
        <Need help: some code that updates publicCounterProducer to privateCounterProducer if that variable is not locked, else skips ahead; the counter will get updated on the next pass. At some point the consumer must be done reading stuff, and new stuff is prepared already>
    }
}

Consumer
{
    int counterConsumer;
    Producer objectProducer;   // contains a reference to the Producer class
    List<BufferedObject> personalQueue;

    while (true)
    {
        <Do useful work, such as working on the personal queue, and polish nails if no personal queue>;

        // get all outstanding requests and move them to the personal queue
        <Need help: try to read publicCounterProducer and set readProducerCounter to it, else skip this code>
        while (counterConsumer < readProducerCounter)
        {
            personalQueue.Add(objectProducer.GetItem(counterConsumer + 1));
            counterConsumer++;
        }
    }
}
So the goal in the second part of the code, which I have not been able to figure out how to write, is to make each class avoid waiting for the other in case the other is in the "critical region" of updating publicCounterProducer. If I read the lock functionality correctly, the threads will go to sleep waiting for the release, which is not what I want. I might end up having to use it anyway, in which case the first pseudocode would do, with a "lock" just around getting the value.
Hope you can help me out with my many questions.
No, it is not safe. A context switch can occur inside .Add, after the List has added the object but before it has finished updating its internal data structure.
If it is an Int32, or if it is an Int64 and you are running in a 64-bit process, then there is no risk. But if you have any doubts, use the Interlocked class.
Yes: you can use a Semaphore, and when it is time to enter the critical region, use the WaitOne overload that takes a timeout and pass a timeout of 0. If WaitOne returns true, you successfully acquired the lock and can enter; if it returns false, you did not acquire the lock and should not enter.
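A minimal sketch of that pattern; the class and member names here are illustrative, not from the question:

using System.Threading;

class SkippingWorker
{
    // A semaphore with a single slot behaves like a lock that can be "tried".
    private readonly Semaphore _gate = new Semaphore(1, 1);

    public void TryDoCriticalWork()
    {
        // WaitOne(0) returns immediately: true if we acquired the slot,
        // false if another thread currently holds it.
        if (!_gate.WaitOne(0))
            return; // busy: skip ahead and try again on the next pass

        try
        {
            // ... critical region ...
        }
        finally
        {
            _gate.Release();
        }
    }
}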
You should really look at the System.Collections.Concurrent namespace, in particular at BlockingCollection. It has a number of Try* methods you can use to add and remove items from the collection without blocking.
While working with threads, is it safe to read a list's contents with one thread while another writes to it, as long as you do not delete list contents (reorganize the order) and only read a new object after the new object has been fully added?
No, it is not. A side-effect of adding an item to a list may be to reallocate its underlying array. Current implementations of List<T> update the internal reference before copying the old data to it, so multiple threads may observe a list of the correct size but containing no data.
While an int is being updated from "Old Value" to "New Value" by one thread, is there a risk, if another thread reads this int, that the value returned is neither "Old Value" nor "New Value"?
Nope, int updates are atomic. But if two threads are both incrementing counterProducer at once, it will go wrong. You should use Interlocked.Increment() to increment it.
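For example, a hedged sketch of the producer's counter using Interlocked (the field name comes from the question; the method names are illustrative):

using System.Threading;

class Producer
{
    private int counterProducer;

    public void OnItemProduced()
    {
        // counterProducer++ is a read-modify-write and can lose updates if
        // two threads run it at once; Interlocked.Increment is atomic.
        Interlocked.Increment(ref counterProducer);
    }

    public int GetCounterProducer()
    {
        // A plain read of a 32-bit int is atomic: it may be slightly stale
        // without a memory barrier, but it can never be a torn value.
        return counterProducer;
    }
}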
Is it possible for a thread to "skip" a critical region if it's busy, instead of just going to sleep and waiting for the region's release?
No, but you can use (for example) WaitHandle.WaitOne(int) to see if a wait succeeded, and branch accordingly. WaitHandle is implemented by several synchronization classes, such as ManualResetEvent.
Incidentally, is there a reason you are not using the built-in Producer/Consumer classes such as BlockingCollection<T>? BlockingCollection is easy to use (after you read the documentation!) and I'd recommend using it instead.
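As a rough sketch of how the question's flow could look on top of BlockingCollection<T>; BufferedObject and the member names here are illustrative stand-ins for the question's types:

using System.Collections.Concurrent;

class BufferedObject { /* stands in for the question's buffered object type */ }

class Pipeline
{
    private readonly BlockingCollection<BufferedObject> _queue =
        new BlockingCollection<BufferedObject>();

    public void Produce(BufferedObject item)
    {
        _queue.Add(item); // thread-safe; no manual counters required
    }

    public void ConsumePass()
    {
        // TryTake with a zero timeout never blocks: if nothing is ready,
        // the consumer simply carries on with its other work.
        BufferedObject item;
        while (_queue.TryTake(out item, 0))
        {
            // ... move item to the personal queue / work on it ...
        }
    }
}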

Multi-threaded access to C# dictionary

I understand that C# dictionaries are not thread-safe when it comes to adding, reading, and removing elements; however, can you access the Count property of a C# dictionary in a thread-safe way while another thread is writing to, reading from, and removing from the dictionary?
Since a property is a method call under the hood, the situation is really not as simple as it looks at first glance:
T1: accesses the Count property
T1: get_Count() call (some kind of JMP/GOTO ASM instruction)
T1: reads the variable which represents the number of items == 1
T2: adds a new item; the real count becomes 2
T1: returns 1, but really there are already two items
So if application logic relies on the Count property value, you could theoretically end up with a race condition.
It should be thread-safe in that it won't blow up, but I'm fairly sure it is not thread-safe in that it might not give you the correct count, due to other threads manipulating the dictionary.
[Edit: as usr points out, just because it's currently thread-safe at this level does not mean that it will continue to be so. You have no guarantees.]
First: this is dangerous thinking. Be very careful if you put such a thing into production. Probably you should just use a lock or a ConcurrentDictionary.
Are you comfortable with an approximate answer?
But: Reflector shows that Count just reads some field and returns it. This is unlikely ever to change, so you could reasonably take the risk in this case.
You could create the dictionary as a static readonly object and lock it when modifying it... that should deal with most problems, I would think.
Depending on what you want to achieve, you can protect the dictionary with a ReaderWriterLockSlim. This way you can have faster reads and only lock the dictionary when performing mutating operations.
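For instance, a minimal sketch of that arrangement (the class and field names here are illustrative):

using System.Collections.Generic;
using System.Threading;

class GuardedDictionary
{
    private readonly Dictionary<int, string> _map = new Dictionary<int, string>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public bool TryGet(int key, out string value)
    {
        _lock.EnterReadLock(); // many readers may hold this simultaneously
        try { return _map.TryGetValue(key, out value); }
        finally { _lock.ExitReadLock(); }
    }

    public void Set(int key, string value)
    {
        _lock.EnterWriteLock(); // exclusive: blocks both readers and writers
        try { _map[key] = value; }
        finally { _lock.ExitWriteLock(); }
    }
}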
Rather than assuming, I went and looked at it in Reflector:
public int Count
{
    get
    {
        // count and freeCount are private fields
        return (this.count - this.freeCount);
    }
}
So yes, it is "thread safe" in that accessing it will not cause corruption in the dictionary object. That said, I am not sure that I can think of any use case where getting the count with any accuracy is important if the dictionary is being accessed by another thread.

Safe to get Count value from generic collection without locking the collection?

I have two threads, a producer thread that places objects into a generic List collection and a consumer thread that pulls those objects out of the same generic List. I've got the reads and writes to the collection properly synchronized using the lock keyword, and everything is working fine.
What I want to know is if it is ok to access the Count property without first locking the collection.
JaredPar refers to the Count property in his blog as a decision procedure that can lead to race conditions, like this:
if (list.Count > 0)
{
return list[0];
}
If the list has one item and that item is removed after the Count property is accessed but before the indexer, an exception will occur. I get that.
But would it be ok to use the Count property to, say, determine the initial size a completely different collection? The MSDN documentation says that instance members are not guaranteed to be thread safe, so should I just lock the collection before accessing the Count property?
I suspect it's "safe" in terms of "it's not going to cause anything to go catastrophically wrong" - but that you may get stale data. That's because I suspect it's just held in a simple variable, and that that's likely to be the case in the future. That's not the same as a guarantee though.
Personally I'd keep it simple: if you're accessing shared mutable data, only do so in a lock (using the same lock for the same data). Lock-free programming is all very well if you've got appropriate isolation in place (so you know you've got appropriate memory barriers, and you know that you'll never be modifying it in one thread while you're reading from it in another) but it sounds like that isn't the case here.
The good news is that acquiring an uncontested lock is incredibly cheap - so I'd go for the safe route if I were you. Threading is hard enough without introducing race conditions which are likely to give no significant performance benefit but at the cost of rare and unreproducible bugs.
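As a small illustration of that advice, here is a sketch that reads Count under the same lock used for mutation and uses the snapshot to size a different collection, as the question describes (all names here are illustrative):

using System.Collections.Generic;

class Buffer
{
    private readonly List<string> _items = new List<string>();
    private readonly object _sync = new object();

    public void Add(string item)
    {
        lock (_sync) { _items.Add(item); } // writes take the lock
    }

    public List<string> NewListSizedToBuffer()
    {
        int count;
        lock (_sync)
        {
            count = _items.Count; // read under the same lock, released quickly
        }
        // An uncontested lock is very cheap, so this costs almost nothing.
        return new List<string>(count);
    }
}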

Should you lock resources when reading values?

When doing thread synchronization in C#, should I also lock an object when I read a value, or only when changing it?
For example, I have a Queue<T> object. Should I just lock it when doing the Enqueue and Dequeue, or should I also lock it when checking values like Count?
From MSDN:
A Queue<T> can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. To guarantee thread safety during enumeration, you can lock the collection during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
You should ensure no reader is active while an item is queued (a lock is probably a good idea).
Looking at Count in Reflector reveals a read from a private field. This can be okay, depending on what you do with the value. It means you shouldn't do stuff like this (without proper locking):
if (queue.Count > 0)
    queue.Dequeue();
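A sketch of the safe version, holding one lock across both the check and the Dequeue (the helper and its names are illustrative):

using System.Collections.Generic;

static class QueueHelper
{
    // Holding the lock across both the Count test and the Dequeue means no
    // other thread can empty the queue between the two calls.
    public static bool TryDequeue<T>(Queue<T> queue, object sync, out T item)
    {
        lock (sync)
        {
            if (queue.Count > 0)
            {
                item = queue.Dequeue();
                return true;
            }
        }
        item = default(T);
        return false;
    }
}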
It depends on what you want the lock to achieve. Usually this kind of locking calls for a reader/writer locking mechanism.
Reader/writer locking means that readers share a lock, so you can have multiple readers reading the collection simultaneously, but to write you must acquire an exclusive lock.
If you don't lock it, you may get a stale value. A race condition could occur such that a write operation changes Count but you read the value from before the change. For example, if the queue has only one item and a thread calls Dequeue, another thread may read the count, find it still 1, and call Dequeue again. The second call won't proceed until the lock is granted, but by that time the queue would actually be empty.
The CLR guarantees atomic reads for values up to the width of the processor. So if you're running 32-bit, reading ints is atomic; if you're running on a 64-bit machine, reading longs is atomic too. Ergo, if Count is an Int32, there's no need to lock.
This post is pertinent to your question.

Another locking question

I'm trying to get my multithreading understanding locked down. I'm doing my best to teach myself, but some of these issues need clarification.
I've gone through three iterations with a piece of code, experimenting with locking.
In this code, the only thing that needs locking is this.managerThreadPriority.
First, the simple, procedural approach, with minimalistic locking.
var managerThread = new Thread
(
    new ThreadStart(this.ManagerThreadEntryPoint)
);

lock (this.locker)
{
    managerThread.Priority = this.managerThreadPriority;
}

managerThread.Name = string.Format("Manager Thread ({0})", managerThread.GetHashCode());
managerThread.Start();
Next, a single statement to create and launch a new thread, but the lock appears to be scoped too large, including the creation and launching of the thread. The compiler doesn't somehow magically know that the lock can be released after this.managerThreadPriority is used.
This kind of naive locking should be avoided, I would assume.
lock (this.locker)
{
    new Thread
    (
        new ThreadStart(this.ManagerThreadEntryPoint)
    )
    {
        Priority = this.managerThreadPriority,
        Name = string.Format("Manager Thread ({0})", GetHashCode())
    }
    .Start();
}
Last, a single statement to create and launch a new thread, with an "embedded" lock only around the shared field.
new Thread
(
    new ThreadStart(this.ManagerThreadEntryPoint)
)
{
    Priority = new Func<ThreadPriority>(() =>
    {
        lock (this.locker)
        {
            return this.managerThreadPriority;
        }
    })(),
    Name = string.Format("Manager Thread ({0})", GetHashCode())
}
.Start();
Care to comment about the scoping of lock statements? For example, if I need to use a field in an if statement and that field needs to be locked, should I avoid locking the entire if statement? E.g.
bool isDumb;
lock (this.locker) isDumb = this.FieldAccessibleByMultipleThreads;
if (isDumb) ...
Vs.
lock (this.locker)
{
if (this.FieldAccessibleByMultipleThreads) ...
}
1) Before you even start the other thread, you don't have to worry about shared access to it at all.
2) Yes, you should lock all access to shared mutable data. (If it's immutable, no locking is required.)
3) Don't use GetHashCode() to indicate a thread ID. Use Thread.ManagedThreadId. I know, there are books which recommend Thread.GetHashCode() - but look at the docs.
Care to comment about the scoping of lock statements? For example, if I need to use a field in an if statement and that field needs to be locked, should I avoid locking the entire if statement?
In general, the lock should be scoped to the portion of code that needs the guarded resource, and no more than that, so that the resource becomes available to other threads as soon as possible.
But it depends on whether the resource you are locking is part of a bigger picture that has to maintain consistency, or whether it is a standalone resource not related directly to any other.
If you have interrelated parts that need to all change in a synchronized manner, that whole set of parts needs to be locked for the duration of the whole process.
If you have an independent, single item uncoupled to anything else, then only that one item needs to be locked long enough for a portion of the process to access it.
Another way to say it is, are you protecting synchronous or asynchronous access to the resource?
Synchronous access needs to hold on to it longer in general because it cares about a bigger picture that the resource is a part of. It must maintain consistency with related resources. You may very well wrap an entire for-loop in such a case if you want to prevent interruptions until all are processed.
Asynchronous access should hold onto it as briefly as possible. Thus, the more appropriate place for the lock would be inside portions of code, such as inside a for-loop or if-statement so you can free up the individual elements right away even before other ones are processed.
Aside from these two considerations, I would add one more. Avoid nesting of locks involving two different locking objects. I have learned by experience that it is a likely source of deadlocks, particularly if other parts of the code use them. If the two objects are part of a group that needs to be treated as a single whole all the time, such nesting should be refactored out.
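As a sketch of that refactoring, assuming two interrelated fields that must stay consistent: rather than giving each its own lock object and nesting them, guard the whole group with a single lock (the names here are illustrative):

class Accounts
{
    // One lock object for the whole interrelated group, so no code path
    // ever needs to nest two different locks in different orders.
    private readonly object _sync = new object();
    private decimal _checking;
    private decimal _savings;

    public void Transfer(decimal amount)
    {
        lock (_sync)
        {
            _checking -= amount; // both parts change under the same lock,
            _savings += amount;  // so readers never see a half-done transfer
        }
    }
}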
There is no need to lock anything before you have started any threads.
If you are only going to read a variable, there's no need for locks either. It's when you mix reads and writes that you need mutexes and similar locking, and you need to lock in both the reading and the writing threads.
