I am currently reading code delivered by our extern person and I don't understand this part of the code:
private ConcurrentDictionary<Int64, Person> users = new ConcurrentDictionary<Int64, Person>();
private Dictionary<String, Int64> connectionIndex = new Dictionary<String, Int64>();
public Boolean AddNewUser(Int64 userId, Person user) {
Boolean added = false;
lock (users) {
if (users.Select(X=>X.Key==userId).Count()>0)
{
Person usrtmp = new Person();
users.TryRemove(userId,out usrtmp)
}
added = users.TryAdd(userId, user);
if (added)
{
connectionIndex.Add(user.ConnectionId, userId);
}
}
return added;
}
Why "users concurrent dictionary" is locked before any operation on that dictionary is done? Is it necessary? From my point of view is lock statement unnecessary because ConcurrentDictionary contains thread-safe operations. Am I right?
Also, I know about "performance issue" when you use .Count() or Key, Value operations on ConcurrentDictionary. Is it LINQ statement OK in this point of view?
Thanks for the answers.
Why "users concurrent dictionary" is locked before any operation on that dictionary is done? Is it necessary?
The lock is certainly necessary given how they're using it. They're performing multiple operations on multiple different dictionaries, and it's important that no other thread interact with either dictionary in any way while that is happening. Doing that requires a lock.
The only way you could remove the lock is if you get to a point where the only usage of a shared data structure is one single method call on a single concurrent dictionary. Without knowing the exact requirements we can't know if that's possible, but if both dictionaries are needed, then it certainly isn't.
Now, given that you have put yourself in a position where you always need to lock over all access to the concurrent dictionary, there's no reason to use a concurrent dictionary and not a regular dictionary; you've already made the decision to use your own synchronization.
From my point of view is lock statement unnecessary because ConcurrentDictionary contains thread-safe operations. Am I right?
No. For starters, the non-concurrent dictionary cannot be accessed without a lock
Also, I know about "performance issue" when you use .Count() or Key, Value operations on ConcurrentDictionary. Is it LINQ statement OK in this point of view?
It's a horrible idea for lots of reasons. It's trying to do a linear search through a dictionary to see if a key exists. You should never be doing that. You should be using ContainsKey. Additionally, there's just no point if checking if the key is there before trying to remove it, you can just try to remove it and see what happens. It's also completely unsafe if you weren't already locking, as someone else could be changing the dictionary while you're iterating it, and it can change after you've searched it, making checking before doing anything pointless, as you can't assume that the thing that you just checked is true.
Related
I am in need of a data type that is able to insert entries and then be able to quickly determine if an entry has already been inserted. A Dictionary seems to suit this need (see example). However, I have no use for the dictionary's values. Should I still use a dictionary or is there another better suited data type?
public class Foo
{
private Dictionary<string, bool> Entities;
...
public void AddEntity(string bar)
{
if (!Entities.ContainsKey(bar))
{
// bool value true here has no use and is just a placeholder
Entities.Add(bar, true);
}
}
public string[] GetEntities()
{
return Entities.Keys.ToArray();
}
}
You can use HashSet<T>.
The HashSet<T> class provides high-performance set operations. A set
is a collection that contains no duplicate elements, and whose
elements are in no particular order.
Habib's answer is excellent, but for multi-threaded environments if you use a HashSet<T> then by consequence you have to use locks to protect access to it. I find myself more prone to creating deadlocks with lock statements. Also, locks yield a worse speedup per Amdahl's law because adding a lock statement reduces the percentage of your code that is actually parallel.
For those reasons, a ConcurrentDictionary<T,object> fits the bill in multi-threaded environments. If you end up using one, then wrap it like you did in your question. Just new up objects to toss in as values as needed, since the values won't be important. You can verify that there are no lock statements in its source code.
If you didn't need mutability of the collection then this would be moot. But your question implies that you do need it, since you have an AddEntity method.
Additional info 2017-05-19 - actually, ConcurrentDictionary does use locks internally, although not lock statements per se--it uses Monitor.Enter (check out the TryAddInternal method). However, it seems to lock on individual buckets within the dictionary, which means there will be less contention than putting the entire thing in a lock statement.
So all in all, ConcurrentDictionary is often better for multithreaded environments.
It's actually quite difficult (impossible?) to make a concurrent hash set using only the Interlocked methods. I tried on my own and kept running into the problem of needing to alter two things at the same time--something that only locking can do in general. One workaround I found was to use singly-linked lists for the hash buckets and intentionally create cycles in a list when one thread needed to operate on a node without interference from other threads; this would cause other threads to get caught spinning around in the same spot until that thread was done with its node and undid the cycle. Sure, it technically didn't use locks, but it did not scale well.
I have a dictionary with a fixed collection of keys, which I create at the beginning of the program. Later, I have some threads updating the dictionary with values.
No pairs are added or removed once the threads started.
Each thread has its own key. meaning, only one thread will access a certain key.
the thread might update the value.
The question is, should I lock the dictionary?
UPDATE:
Thanks all for the answers,
I tried to simplify the situation when i asked this question, just to understand the behaviour of the dictionary.
To make myself clear, here is the full version:
I have a dictionary with ~3000 entries (fixed keys), and I have more than one thread accessing the key (shared resourse), but I know for a fact that only one thread is accessing a key entry at a time.
so, should I lock the dictionary? and - when you have the full version now, is a dictionary the right choise at all?
Thanks!
FROM MSDN
A Dictionary can support multiple readers concurrently, as long as the collection is not modified.
To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
For a thread-safe alternative, see ConcurrentDictionary<TKey, TValue>.
Let's deal with your question one interpretation at a time.
First interpretation: Given how Dictionary<TKey, TValue> is implemented, with the context I've given, do I need to lock the dictionary?
No, you don't.
Second interpretation: Given how Dictionary<TKey, TValue is documented, with the context I've given, do I need to lock the dictionary?
Yes, you definitely should.
There is no guarantee that the access, which might be OK today, will be OK tomorrow, in a multithreaded world since the type is documented as not threadsafe. This allows the programmers to make certain assumptions about the state and integrity of the type that they would otherwise have to build in guarantees for.
A hotfix or update to .NET, or a whole new version, might change the implementation and make it break and this is your fault for relying on undocumented behavior.
Third interpretation: Given the context I've given, is a dictionary the right choice?
No it isn't. Either switch to a threadsafe type, or simply don't use a dictionary at all. Why not just use a variable per thread instead?
Conclusion: If you intend to use the dictionary, lock the dictionary. If it's OK to switch to something else, do it.
Use a ConcurrentDictionary, don't reinvent the wheel.
Better still, refactor your code to avoid this unecessary contention.
If there is no communication between the threads you could just do something like this:
assuming a function that changes a value.
private static KeyValuePair<TKey, TValue> ValueChanger<TKey, TValue>(
KeyValuePair<TKey, TValue> initial)
{
// I don't know what you do so, i'll just return the value.
return initial;
}
lets say you have some starting data,
var start = Enumerable.Range(1, 3000)
.Select(i => new KeyValuePair<int, object>(i, new object()));
you could process them all at once like this,
var results = start.AsParallel().Select(ValueChanger);
when, results is evaluated, all 3000 ValueChangers will run concurrently, yielding a IEnumerable<KeyValuePair<int, object>>.
There will be no interaction between the threads, thus no possible concurrency problems.
If you want to turn the results into a Dictionary you could,
var resultsDictionary = results.ToDictionary(p => p.Key, p => p.Value);
This may or may not be useful in your situation but, without more detail its hard to say.
If each thread access only one "value" and if you dont care about others I'll say you dont need a Dictionary at all. You can use ThreadLocal or ThreadStatic variables.
If at all you need a Dictionary you definitely need a lock.
If you're in .Net 4.0 or above I'll strongly suggest you to use ConcurrentDictionary, you don't need to synchronize access when using ConcurrentDictionary because it is already "ThreadSafe".
The Diectionary is not thread safe but in your code you do not have to do that; you said one thread update one value so you do not have multi threading problem!
I do not have the code so I'm not sure 100%.
Also check this :Making dictionary access thread-safe?
If you're not adding keys, but simply modifying values, why not completely remove the need for writing directly to the Dictionary by storing complex objects as the value and modifying a value within the complex type. That way, you respect the thread safety constraints of the dictionary.
So:
class ValueWrapper<T>
{
public T Value{get;set;}
}
//...
var myDic = new Dictionary<KeyType, ValueWrapper<ValueType>>();
//...
myDic[someKey].Value = newValue;
You're now not writing directly to the dictionary but you can modify values.
Don't try to do the same with keys. Necessarily, they should be immutable
Given the constraint "I know for a fact that only one thread is accessing a key entry at a time", I don't think you have any problem.
Possible modifications of a Dictionary are: add, update and remove.
If the Dictionary is modified or allowed to be modified, you must use a synchronization mechanism of choice to eliminate the potential race condition, in which one thread reads the old dirty value while a second thread is currently replacing the value/updating the key.
To safe you some work, use the ConcurentDictionary in this scenario.
If the Dictionary is never modified after creation, there won't be any race conditions. A synchronization is therefore not required.
This is a special scenario in which you can replace the table with a read-only table. To add the important robustness, like guarding against potential bugs by accidentally manipulating the table, you should make the Dictionary immutable (or read-only). To give the developer compiler support, such an immutable implementation must throw an exception on any manipulation attempts.
To safe you some work, you can use the ReadOnlyDictionary in this scenario. Note that the underlying Dictionary of the ReadOnlyDictionary is still mutable and that its changes are propagated to the ReadOnlyDictionary facade. The ReadOnlyDictionary only helps to ensure that the table is not accidentally modified by its consumers.
This means: Dictionary is never an option in a multithreaded context.
Rather use the ConcurrentDictionary or a synchronization mechanism in general (or use the ReadOnlyDictionary if you can guarantee that the original source collection never changes).
Since you allow and expect manipulations of the table ("[...] the thread might update the value"), you must use a synchronization mechanism of choice or the ConcurrentDictionary.
I use a ConcurrentDictioanry<string, HashSet<string>> to access some data across many threads.
I read in this article (scroll down) that the method AddOrUpdate is not executed in the lock, so it could endanger thread-safety.
My code is as follows:
//keys and bar are not the concern here
ConcurrentDictioanry<string, HashSet<string>> foo = new ...;
foreach(var key in keys) {
foo.AddOrUpdate(key, new HashSet<string> { bar }, (key, val) => {
val.Add(bar);
return val;
});
}
Should I enclose the AddOrUpdate call in a lock statement in order to be sure everything is thread-safe?
Locking during AddOrUpdate on its own wouldn't help - you'd still have to lock every time you read from the set.
If you're going to treat this collection as thread-safe, you really need the values to be thread-safe too. You need a ConcurrentSet, ideally. Now that doesn't exist within the framework (unless I've missed something) but you could probably create your own ConcurrentSet<T> which used a ConcurrentDictionary<T, int> (or whatever TValue you like) as its underlying data structure. Basically you'd ignore the value within the dictionary, and just treat the presence of the key as the important part.
You don't need to implement everything within ISet<T> - just the bits you actually need.
You'd then create a ConcurrentDictionary<string, ConcurrentSet<string>> in your application code, and you're away - no need for locking.
You'll need to fix this code, it creates a lot of garbage. You create a new HashSet even if none is required. Use the other overload, the one that accepts the valueFactory delegate. So the HashSet is only created when the key isn't yet present in the dictionary.
The valueFactory might be called multiple times if multiple threads concurrently try to add the same value of key and it is not present. Very low odds but not zero. Only one of these hashsets will be used. Not a problem, creating the HashSet has no side effects that could cause threading trouble, the extra copies just get garbage collected.
The article states that the add delegate is not executed in the dictionary's lock, and that the element you get might not be the element created in that thread by the add delegate. That's not a thread safety issue; the dictionary's state will be consistent and all callers will get the same instance, even if a different instance was created for each of them (and all but one get dropped).
Seems the better answer would be to use Lazy, per this article on the methods that pass in a delegate.
Also another good article Here on Lazy loading the add delegate.
Suppose that I have a Dictionary<string, string>. The dictionary is declared as public static in my console program.
If I'm working with threads and I want to do foreach on this Dictionary from one thread but at the same time another thread want to add item to the dictionary. This would cause a bug here because we can't modify our Dictionary while we are running on it with a foreach loop in another thread.
To bypass this problem I created a lock statement on the same static object on each operation on the dictionary.
Is this the best way to bypass this problem? My Dictionary can be very big and I can have many threads that want to foreach on it. As it is currently, things can be very slow.
Try using a ConcurrentDictionary<TKey, TValue>, which is designed for this kind of scenario.
There's a nice tutorial here on how to use it.
The big question is: Do you need the foreach to be a snapshot?
If the answer is "no", then use a ConcurrentDictionary and you will probably be fine. (The one remaining question is whether the nature of your inserts and reads hit the striped locks in a bad way, but if that was the case you'd be finding normal reads and writes to the dictionary even worse).
However, because it's GetEnumerator doesn't provide a snapshot, it will not be enumerating the same start at the beginning as it is at the end. It could miss items, or duplicate items. The question is whether that's a disaster to you or not.
If it would be a disaster if you had duplicates, but not otherwise, then you can filter out duplicates with Distinct() (whether keyed on the keys or both the key and value, as required).
If you really need it to be a hard snapshot, then take the following approach.
Have a ConcurrentDictionary (dict) and a ReaderWriterLockSlim (rwls). On both reads and writes obtain a reader lock (yes even though you're writing):
public static void AddToDict(string key, string value)
{
rwls.EnterReadLock();
try
{
dict[key] = value;
}
finally
{
rwls.ExitReadLock();
}
}
public static bool ReadFromDict(string key, out string value)
{
rwls.EnterReadLock();
try
{
return dict.TryGetValue(key, out value);
}
finally
{
rwls.ExitReadLock();
}
}
Now, when we want to enumerate the dictionary, we acquire the write lock (even though we're reading):
public IEnumerable<KeyValuePair<string, string>> EnumerateDict()
{
rwls.EnterWriteLock();
try
{
return dict.ToList();
}
finally
{
rwls.ExitWriteLock();
}
}
This way we obtain the shared lock for reading and writing, because ConcurrentDictionary deals with the conflicts involved in that for us. We obtain the exclusive lock for enumerating, but just for long enough to obtain a snapshot of the dictionary in a list, which is then used only in that thread and not shared with any other.
With .NET 4 you get a fancy new ConcurrentDictionary. I think there are some .NET 3.5-based implementations floating around.
Yes, you will have a problem updating the global dictionary while an enumeration is running in another thread.
Solutions:
Require all users of the dictionary to acquire a mutex lock before accessing the object, and release the lock afterwards.
Use .NET 4.0's ConcurrentDictionary class.
I am confused by a code listing in a book i am reading, C# 3 in a Nutshell, on threading.
In the topic on Thread Safety in Application Servers, below code is given as an example of a UserCache:
static class UserCache
{
static Dictionary< int,User> _users = new Dictionary< int, User>();
internal static User GetUser(int id)
{
User u = null;
lock (_users) // Why lock this???
if (_users.TryGetValue(id, out u))
return u;
u = RetrieveUser(id); //Method to retrieve from databse
lock (_users) _users[id] = u; //Why lock this???
return u;
}
}
The authors explain why the RetrieveUser method is not in a lock, this is to avoid locking the cache for a longer period.
I am confused as to why lock the TryGetValue and the update of the dictionary since even with the above the dictionary is being updated twice if 2 threads call simultaneously with the same unretrieved id.
What is being achieved by locking the dictionary read?
Many thanks in advance for all your comments and insights.
The Dictionary<TKey, TValue> class is not threadsafe.
If one thread writes one key to the dictionary while a different thread reads the dictionary, it may get messed up. (For example, if the write operation triggers an array resize, or if the two keys are a hash collision)
Therefore, the code uses a lock to prevent concurrent writes.
There is a benign race condition when writing to the dictionary; it is possible, as you stated, for two threads to determine there is not a matching entry in the cache. In this case, both of them will read from the DB and then attempt to insert. Only the object inserted by the last thread is kept; the other object will be garbage collected when the first thread is done with it.
The read to the dictionary needs to be locked because another thread may be writing at the same time, and the read needs to search over a consistent structure.
Note that the ConcurrentDictionary introduced in .NET 4.0 pretty much replaces this kind of idiom.
That's a common practice to access any non thread safe structures like lists, dictionaries, common shared values, etc.
And answering main question: locking a read we guarantee that dictionary will not be changed by another thread while we are reading its value. This is not implemented in dictionary and that is why it’s called non thread safe :)
If two threads call in simultaneously and the id exists, then they will both return the correct User information. The first lock is to prevent errors like SLaks said - if someone is writing to the dictionary while you are trying to read it, you'll have issues. In this scenario, the second lock will never be reached.
If two threads call in simultaneously and the id does not exist, one thread will lock and enter TryGetValue, this will return false and set u to a default value. This first lock is again, to prevent the errors described by SLaks. At this point, that first thread will release the lock and the second thread will enter and do the same. Both will then set 'u' to information from 'RetrieveUser(id)'; this should be the same information. One thread will then lock the dictionary and assign _users[id] to the value of u. This second lock is so that two threads are trying to write values to the same memory locations simultaneously and corrupting that memory. I don't know what the second thread will do when it enters the assignment. It will either return early ignoring the update, or overwrite the existing data from the first thread. Regardless, the Dictionary will contain the same information because both threads should have recieved the same data in 'u' from RetrieveUser.
For performance, the auther compared two scenarios - the above scenario, which will be extremely rare and block while two threads try and write the same data, and second one where it is far more likely that two threads call in requesting data for an object that needs written, and one that exists. For example, threadA and threadB call in simultaneously and ThreadA locks for an id that doesn't exist. There is no reason to make threadB wait for a lookup while threadA is working on RetriveUser. This situation is probably far more likely than the duplicate ids described above, so for performance the author chose not to lock on the whole block.