We're on a high traffic website where we want to do some very simple caching for a calculation using a ConcurrentDictionary, to prevent it from being done for every request. The number of possible inputs is limited enough and the calculation relatively heavy (splitting and rejoining a string).
I'm considering code like
string result;
if (!MyConcurrentDictionary.TryGetValue(someKey, out result))
{
    result = DoCalculation(someKey);
    // alternative 1: use Item property
    MyConcurrentDictionary[someKey] = result;
    // alternative 2: use TryAdd
    MyConcurrentDictionary.TryAdd(someKey, result);
}
My question is: which alternative is the best choice from a performance perspective?
Your code is completely broken; you're assuming that nothing will change between those two lines.
You need to use .GetOrAdd().
In general, when dealing with a mutable concurrent object, you must never make multiple calls to the object for the same thing, since its state can change at any time.
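As a minimal sketch of that advice, the check-then-add from the question collapses into a single GetOrAdd call. The class name CalculationCache and the body of DoCalculation are assumptions made up for illustration; only GetOrAdd itself comes from ConcurrentDictionary.

```csharp
using System;
using System.Collections.Concurrent;

public static class CalculationCache
{
    private static readonly ConcurrentDictionary<string, string> Cache =
        new ConcurrentDictionary<string, string>();

    // Hypothetical stand-in for the question's DoCalculation
    // (splitting and rejoining a string).
    private static string DoCalculation(string key)
    {
        return string.Join("-", key.Split(','));
    }

    // One call replaces TryGetValue + indexer/TryAdd.
    // Under contention the factory may run more than once, but only one
    // result is stored and every caller receives the stored value.
    public static string GetResult(string key)
    {
        return Cache.GetOrAdd(key, k => DoCalculation(k));
    }
}
```

If the calculation must never run twice for the same key, storing Lazy<string> values is a common variation, at the cost of a little extra indirection.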
As @SLaks indicated, your code has a race condition. ConcurrentDictionary was built to prevent such scenarios by providing methods that perform complex operations atomically, such as GetOrAdd and AddOrUpdate.
Here's a sequence that describes how your code could break:
Thread 1 executes TryGetValue which returns false
Thread 2 does the same
Both threads do the calculation
Thread 2 adds the value to the dictionary and returns the result
Thread 1 either
(using the indexer setter) adds another value, overwriting the previous one and returns it
(using TryAdd) doesn't add it and the method returns a result that's potentially different than the one in the dictionary
Thread 2 now has a result that's potentially different than the one in the dictionary
So you can see how not using GetOrAdd makes things a lot more complex and could have potentially disastrous results. It's possible that in your case if you're deterministically calculating a string from the key it won't matter - but there's really no reason not to use this method. It also simplifies the code, and may (marginally) improve performance since it calculates the hash-code just once.
Another conclusion is choosing between the indexer and TryAdd is much more a question of correctness than of performance.
That said, the performance of both the indexer [] and TryAdd methods is identical, since both share the same implementation (TryAddInternal), which works by taking the lock belonging to the key's bucket, then checking if the key exists in the dictionary, and either updating the dictionary or not.
Lastly, here's an example that shows how to build methods like GetOrAdd correctly.
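The example referred to above is not reproduced here. As a rough sketch of the general idea (not the linked code, and not the framework's actual implementation), a GetOrAdd-like method can be built from TryGetValue and TryAdd by retrying until one of them succeeds. The names ConcurrentDictionaryExtensions and GetOrAddSketch are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;

public static class ConcurrentDictionaryExtensions
{
    // Illustrative only: in real code, prefer the built-in GetOrAdd.
    public static TValue GetOrAddSketch<TKey, TValue>(
        this ConcurrentDictionary<TKey, TValue> dictionary,
        TKey key,
        Func<TKey, TValue> valueFactory)
    {
        while (true)
        {
            // Fast path: the key is already present.
            if (dictionary.TryGetValue(key, out TValue existing))
            {
                return existing;
            }

            // Slow path: compute a value and try to publish it.
            TValue created = valueFactory(key);
            if (dictionary.TryAdd(key, created))
            {
                return created;
            }

            // Another thread added the key first. Loop and read its value;
            // the loop also covers the key being removed again in between.
        }
    }
}
```

The important property is that the caller always gets back the value that is actually in the dictionary, never a privately computed one that lost the race.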
I am in need of a data type that is able to insert entries and then be able to quickly determine if an entry has already been inserted. A Dictionary seems to suit this need (see example). However, I have no use for the dictionary's values. Should I still use a dictionary or is there another better suited data type?
public class Foo
{
    private Dictionary<string, bool> Entities;
    ...
    public void AddEntity(string bar)
    {
        if (!Entities.ContainsKey(bar))
        {
            // bool value true here has no use and is just a placeholder
            Entities.Add(bar, true);
        }
    }
    public string[] GetEntities()
    {
        return Entities.Keys.ToArray();
    }
}
You can use HashSet<T>.
The HashSet<T> class provides high-performance set operations. A set
is a collection that contains no duplicate elements, and whose
elements are in no particular order.
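A minimal sketch of the question's class rewritten on top of HashSet<string>. The class name FooWithHashSet and the HasEntity method are additions for illustration and are not part of the original code.

```csharp
using System.Collections.Generic;
using System.Linq;

public class FooWithHashSet
{
    private readonly HashSet<string> entities = new HashSet<string>();

    // HashSet<T>.Add returns false when the item is already present,
    // so no separate Contains check is needed before adding.
    public bool AddEntity(string bar)
    {
        return entities.Add(bar);
    }

    // Membership test without any placeholder value.
    public bool HasEntity(string bar)
    {
        return entities.Contains(bar);
    }

    public string[] GetEntities()
    {
        return entities.ToArray();
    }
}
```

Note that this version is not thread-safe; it is only a drop-in for the single-threaded case.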
Habib's answer is excellent, but for multi-threaded environments if you use a HashSet<T> then by consequence you have to use locks to protect access to it. I find myself more prone to creating deadlocks with lock statements. Also, locks yield a worse speedup per Amdahl's law because adding a lock statement reduces the percentage of your code that is actually parallel.
For those reasons, a ConcurrentDictionary<T,object> fits the bill in multi-threaded environments. If you end up using one, then wrap it like you did in your question. Just new up objects to toss in as values as needed, since the values won't be important. You can verify that there are no lock statements in its source code.
If you didn't need mutability of the collection then this would be moot. But your question implies that you do need it, since you have an AddEntity method.
Additional info 2017-05-19 - actually, ConcurrentDictionary does use locks internally, although not lock statements per se--it uses Monitor.Enter (check out the TryAddInternal method). However, it seems to lock on individual buckets within the dictionary, which means there will be less contention than putting the entire thing in a lock statement.
So all in all, ConcurrentDictionary is often better for multithreaded environments.
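For illustration, here is a sketch of a ConcurrentDictionary used as a concurrent set. It deviates slightly from the suggestion above: instead of new object() values it uses a byte placeholder, which is an arbitrary choice of mine. The class name ConcurrentEntitySet is hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Linq;

public class ConcurrentEntitySet
{
    // The byte value is a placeholder and is never read.
    private readonly ConcurrentDictionary<string, byte> entities =
        new ConcurrentDictionary<string, byte>();

    // TryAdd returns false if another caller already added the key.
    public bool AddEntity(string bar)
    {
        return entities.TryAdd(bar, 0);
    }

    public bool HasEntity(string bar)
    {
        return entities.ContainsKey(bar);
    }

    // Keys returns a snapshot of the keys at the time of the call.
    public string[] GetEntities()
    {
        return entities.Keys.ToArray();
    }
}
```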
It's actually quite difficult (impossible?) to make a concurrent hash set using only the Interlocked methods. I tried on my own and kept running into the problem of needing to alter two things at the same time--something that only locking can do in general. One workaround I found was to use singly-linked lists for the hash buckets and intentionally create cycles in a list when one thread needed to operate on a node without interference from other threads; this would cause other threads to get caught spinning around in the same spot until that thread was done with its node and undid the cycle. Sure, it technically didn't use locks, but it did not scale well.
I have a dictionary with a fixed collection of keys, which I create at the beginning of the program. Later, I have some threads updating the dictionary with values.
No pairs are added or removed once the threads started.
Each thread has its own key, meaning only one thread will access a certain key.
The thread might update the value.
The question is, should I lock the dictionary?
UPDATE:
Thanks all for the answers,
I tried to simplify the situation when I asked this question, just to understand the behaviour of the dictionary.
To make myself clear, here is the full version:
I have a dictionary with ~3000 entries (fixed keys), and I have more than one thread accessing the key (shared resource), but I know for a fact that only one thread is accessing a key entry at a time.
So, should I lock the dictionary? And, now that you have the full version, is a dictionary the right choice at all?
Thanks!
FROM MSDN
A Dictionary can support multiple readers concurrently, as long as the collection is not modified.
To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
For a thread-safe alternative, see ConcurrentDictionary<TKey, TValue>.
Let's deal with your question one interpretation at a time.
First interpretation: Given how Dictionary<TKey, TValue> is implemented, with the context I've given, do I need to lock the dictionary?
No, you don't.
Second interpretation: Given how Dictionary<TKey, TValue> is documented, with the context I've given, do I need to lock the dictionary?
Yes, you definitely should.
There is no guarantee that the access, which might be OK today, will be OK tomorrow, in a multithreaded world since the type is documented as not threadsafe. This allows the programmers to make certain assumptions about the state and integrity of the type that they would otherwise have to build in guarantees for.
A hotfix or update to .NET, or a whole new version, might change the implementation and make it break and this is your fault for relying on undocumented behavior.
Third interpretation: Given the context I've given, is a dictionary the right choice?
No it isn't. Either switch to a threadsafe type, or simply don't use a dictionary at all. Why not just use a variable per thread instead?
Conclusion: If you intend to use the dictionary, lock the dictionary. If it's OK to switch to something else, do it.
Use a ConcurrentDictionary, don't reinvent the wheel.
Better still, refactor your code to avoid this unnecessary contention.
If there is no communication between the threads you could just do something like this:
assuming a function that changes a value.
private static KeyValuePair<TKey, TValue> ValueChanger<TKey, TValue>(
KeyValuePair<TKey, TValue> initial)
{
// I don't know what you do so, i'll just return the value.
return initial;
}
lets say you have some starting data,
var start = Enumerable.Range(1, 3000)
.Select(i => new KeyValuePair<int, object>(i, new object()));
you could process them all at once like this,
var results = start.AsParallel().Select(ValueChanger);
When results is evaluated, the ValueChanger calls run in parallel across the available cores (not necessarily all 3000 at once), yielding an IEnumerable<KeyValuePair<int, object>>.
There will be no interaction between the threads, thus no possible concurrency problems.
If you want to turn the results into a Dictionary you could,
var resultsDictionary = results.ToDictionary(p => p.Key, p => p.Value);
This may or may not be useful in your situation but, without more detail, it's hard to say.
If each thread accesses only one "value" and you don't care about the others, I'd say you don't need a Dictionary at all. You can use ThreadLocal or ThreadStatic variables.
If at all you need a Dictionary you definitely need a lock.
If you're on .NET 4.0 or above, I strongly suggest you use ConcurrentDictionary; you don't need to synchronize access when using ConcurrentDictionary because it is already thread-safe.
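A small sketch of that suggestion for the scenario in the question: a fixed set of keys created up front, then updated from several threads with no explicit locking. The class name FixedKeyUpdates and the doubling "work" are assumptions chosen for illustration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FixedKeyUpdates
{
    public static ConcurrentDictionary<int, int> Run(int keyCount)
    {
        // Create all keys before any worker starts.
        var table = new ConcurrentDictionary<int, int>(
            Enumerable.Range(0, keyCount)
                      .Select(i => new KeyValuePair<int, int>(i, 0)));

        // Each iteration touches exactly one key; the indexer setter
        // is safe to call concurrently on a ConcurrentDictionary.
        Parallel.For(0, keyCount, key =>
        {
            table[key] = key * 2;
        });

        return table;
    }
}
```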
The Dictionary is not thread-safe, but in your code you do not have to lock it; you said each thread updates only its own value, so you do not have a multithreading problem!
I do not have the code so I'm not sure 100%.
Also check this: Making dictionary access thread-safe?
If you're not adding keys, but simply modifying values, why not completely remove the need for writing directly to the Dictionary by storing complex objects as the value and modifying a value within the complex type. That way, you respect the thread safety constraints of the dictionary.
So:
class ValueWrapper<T>
{
public T Value{get;set;}
}
//...
var myDic = new Dictionary<KeyType, ValueWrapper<ValueType>>();
//...
myDic[someKey].Value = newValue;
You're now not writing directly to the dictionary but you can modify values.
Don't try to do the same with keys. They should necessarily be immutable.
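To make the wrapper idea concrete, here is a sketch under the question's stated constraint that only one thread touches a given key at a time. The dictionary is filled before the workers start and is only read afterwards; all writes go to the wrapper objects. WrapperDemo and its Run method are hypothetical names, and the wrapper itself is not synchronized, so this would not be safe if two threads shared a key.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class ValueWrapper<T>
{
    public T Value { get; set; }
}

public static class WrapperDemo
{
    public static Dictionary<int, ValueWrapper<int>> Run(int keyCount)
    {
        // Build the table once; no keys are added or removed afterwards.
        var table = Enumerable.Range(0, keyCount)
            .ToDictionary(i => i, i => new ValueWrapper<int>());

        Parallel.For(0, keyCount, key =>
        {
            // The dictionary is only read here (concurrent readers are
            // supported); the write lands on the wrapper object.
            table[key].Value = key + 1;
        });

        return table;
    }
}
```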
Given the constraint "I know for a fact that only one thread is accessing a key entry at a time", I don't think you have any problem.
Possible modifications of a Dictionary are: add, update and remove.
If the Dictionary is modified or allowed to be modified, you must use a synchronization mechanism of choice to eliminate the potential race condition, in which one thread reads the old dirty value while a second thread is currently replacing the value/updating the key.
To save you some work, use the ConcurrentDictionary in this scenario.
If the Dictionary is never modified after creation, there won't be any race conditions. A synchronization is therefore not required.
This is a special scenario in which you can replace the table with a read-only table. To add the important robustness, like guarding against potential bugs by accidentally manipulating the table, you should make the Dictionary immutable (or read-only). To give the developer compiler support, such an immutable implementation must throw an exception on any manipulation attempts.
To save you some work, you can use the ReadOnlyDictionary in this scenario. Note that the underlying Dictionary of the ReadOnlyDictionary is still mutable and that its changes are propagated to the ReadOnlyDictionary facade. The ReadOnlyDictionary only helps to ensure that the table is not accidentally modified by its consumers.
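The facade behaviour described above can be checked with a short sketch. The class and method names (ReadOnlyDemo, CountAfterSourceChange, WriteThroughFacadeThrows) are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlyDemo
{
    // Changes to the source dictionary show up through the facade.
    public static int CountAfterSourceChange()
    {
        var source = new Dictionary<string, int> { ["a"] = 1 };
        var facade = new ReadOnlyDictionary<string, int>(source);

        source["b"] = 2; // modifies the underlying table
        return facade.Count;
    }

    // Writing through the facade is rejected at run time.
    public static bool WriteThroughFacadeThrows()
    {
        IDictionary<string, int> facade =
            new ReadOnlyDictionary<string, int>(new Dictionary<string, int>());
        try
        {
            facade.Add("x", 1);
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }
}
```

So the wrapper protects consumers from modifying the table by accident, but it gives no thread-safety guarantee if some other code still writes to the source dictionary.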
This means: Dictionary is never an option in a multithreaded context.
Rather use the ConcurrentDictionary or a synchronization mechanism in general (or use the ReadOnlyDictionary if you can guarantee that the original source collection never changes).
Since you allow and expect manipulations of the table ("[...] the thread might update the value"), you must use a synchronization mechanism of choice or the ConcurrentDictionary.
If I have a ConcurrentDictionary and use the TryGetValue within an if statement, does this make the if statement's contents thread safe? Or must you lock still within the if statement?
Example:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
//Users is a list.
client.Users.Add(item);
}
or do I have to do:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
lock (client)
{
//Users is a list.
client.Users.Add(item);
}
}
Yes, you have to lock inside the if statement. The only guarantee you get from ConcurrentDictionary is that its own methods are thread-safe.
The accepted answer could be misleading, depending on your point of view and the scope of thread safety you are trying to achieve. This answer is aimed at people who stumble on this question while learning about threading and concurrency:
It's true that locking on the output of the dictionary retrieval (the Client object) makes some of the code thread safe, but only the code that is accessing that retrieved object within the lock. In the example, it's possible that another thread removes that object from the dictionary after the current thread retrieves it. (Even though there are no statements between the retrieval and the lock, other threads can still execute in between.) Then, this code would add the item to the Users list of a Client object that is no longer in the concurrent dictionary. That could cause an exception, a synchronization problem, or a race condition.
It depends on what the rest of the program is doing. But in the scenario I'm describing, it would be safer to put the lock around the entire dictionary retrieval. And then a regular dictionary might be faster and simpler than a concurrent dictionary, as long as you always lock on it while using it!
While both of the current answers are technically true I think that the potential exists for them to be a little misleading and they don't express ConcurrentDictionary's big strengths. Maybe the OP's original way of solving the problem with locks worked in that specific circumstance but this answer is aimed more generally towards people learning about ConcurrentDictionary for the first time.
ConcurrentDictionary is designed so that you don't have to use locks. It has several specialty methods designed around the idea that some other thread could modify the object in the dictionary while you're currently working on it.
For a simple example, the TryUpdate method lets you check to see if a key's value has changed between when you got it and the moment that you're trying to update it. If the value that you've got matches the value currently in the ConcurrentDictionary you can update it and TryUpdate returns true. If not, TryUpdate returns false. The documentation for the TryUpdate method can make this a little confusing because it doesn't make it explicitly clear why there is a comparison value but that's the idea behind the comparison value.
If you wanted to have a little more control around adding or updating, you could use one of the overloads of the AddOrUpdate method to either add a value for a key if it doesn't exist at the moment that you're trying to add it or update the value if some other thread has already added a value for the key that is specified. The context of whatever you're trying to do will dictate the appropriate method to use. The point is that, rather than locking, try taking a look at the specialty methods that ConcurrentDictionary provides and prefer those over trying to come up with your own locking solution.
In the case of OP's original question, I would suggest that instead of this:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
//Users is a list.
client.Users.Add(item);
}
One might try the following instead*:
ConcurrentDictionary<Guid, Client> m_Clients;
Client originalClient;
if (m_Clients.TryGetValue(clientGUID, out originalClient))
{
    //The Client object will need to implement IEquatable if more
    //than an object instance comparison needs to be done. This
    //sample code assumes that Client implements IEquatable.
    //If copying a Client is not trivial, you'll probably want to
    //also implement a simple type of copy in a method of the Client
    //object. This sample code assumes that the Client object has
    //a ShallowCopy method to do this copy for simplicity's sake.
    //Note: the copy must give modifiedClient its own Users list;
    //if the list is shared with originalClient, the Add below
    //also changes the original, outside of TryUpdate.
    Client modifiedClient = originalClient.ShallowCopy();
    //Make whatever modifications to modifiedClient that need to get
    //made...
    modifiedClient.Users.Add(item);
    //Now update the value in the ConcurrentDictionary
    if (!m_Clients.TryUpdate(clientGUID, modifiedClient, originalClient))
    {
        //Do something if the Client object was updated in between
        //when it was retrieved and when the code here tries to
        //modify it.
    }
}
*Note: in the example above, I'm using TryUpdate for ease of demonstrating the concept. In practice, if you need to make sure that an object gets added if it doesn't exist or updated if it does, the AddOrUpdate method would be the ideal option because the method handles all of the looping required to check for add vs update and take the appropriate action.
It might seem like it's a little harder at first because it may be necessary to implement IEquatable and, depending on how instances of Client need to be copied, some sort of copying functionality but it pays off in the long run if you're working with ConcurrentDictionary and objects within it in any serious way.
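As a self-contained sketch of the TryUpdate pattern, the version below sidesteps the copying and IEquatable questions by storing an immutable list of user names per client instead of a Client object. That substitution is my own simplification, not the code from the question; it assumes System.Collections.Immutable is available (it ships with modern .NET and is a NuGet package on .NET Framework). ClientUsers and TryAddUser are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Immutable;

public static class ClientUsers
{
    // Returns false if the client is not (or is no longer) in the dictionary.
    public static bool TryAddUser(
        ConcurrentDictionary<Guid, ImmutableList<string>> clients,
        Guid clientId,
        string user)
    {
        while (clients.TryGetValue(clientId, out ImmutableList<string> current))
        {
            // Add returns a new list; the list we read is left untouched.
            ImmutableList<string> updated = current.Add(user);

            // Succeeds only if the stored value is still the one we read
            // (ImmutableList compares by reference here).
            if (clients.TryUpdate(clientId, updated, current))
            {
                return true;
            }

            // Another thread replaced the value in between; read again and retry.
        }

        return false;
    }
}
```

AddOrUpdate performs the same retry loop internally, so it is usually the shorter choice when the key should also be created if missing.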
I like the lock-free operation of the ConcurrentDictionary and use it in two objects:
ConcurrentDictionary<DateTime, myObj> myIndexByDate;
ConcurrentDictionary<myObjSummary, ConcurrentDictionary<int, myObj>> myObjectSummaryIndex;
These two objects need to stay in sync. Is the only way to do this to use a lock, thus losing all the benefits of the ConcurrentDictionary?
I would create a custom class that holds both dictionaries and use a lock only in the methods that can change the dictionaries (Add, Delete).
You don't lose the benefits of the concurrent dictionary, as this approach requires much less code than you would need with a normal dictionary.
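A rough sketch of such a wrapper class. To keep it short, the second index is simplified to a flat ConcurrentDictionary<int, TValue> rather than the nested dictionary from the question; the class name DualIndex and its members are hypothetical. Writers are serialized by a lock, while readers go straight to the concurrent dictionaries, so a reader may briefly observe one index updated before the other.

```csharp
using System;
using System.Collections.Concurrent;

public class DualIndex<TValue>
{
    private readonly object writeLock = new object();
    private readonly ConcurrentDictionary<DateTime, TValue> byDate =
        new ConcurrentDictionary<DateTime, TValue>();
    private readonly ConcurrentDictionary<int, TValue> byId =
        new ConcurrentDictionary<int, TValue>();

    // Adds to both indexes, or to neither if either key is taken.
    public bool Add(DateTime date, int id, TValue value)
    {
        lock (writeLock)
        {
            if (byDate.ContainsKey(date) || byId.ContainsKey(id))
            {
                return false;
            }
            byDate[date] = value;
            byId[id] = value;
            return true;
        }
    }

    // Removes from both indexes under the same lock as Add.
    public bool Remove(DateTime date, int id)
    {
        lock (writeLock)
        {
            bool removedByDate = byDate.TryRemove(date, out _);
            bool removedById = byId.TryRemove(id, out _);
            return removedByDate && removedById;
        }
    }

    // Reads take no lock.
    public bool TryGetByDate(DateTime date, out TValue value)
    {
        return byDate.TryGetValue(date, out value);
    }

    public bool TryGetById(int id, out TValue value)
    {
        return byId.TryGetValue(id, out value);
    }
}
```

If readers must never see the two indexes disagree, the read methods would need to take the same lock, at which point plain dictionaries would do.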
ConcurrentDictionary is only "thread-safe" on the operations on a particular instance. e.g. while a ConcurrentDictionary.TryAdd() call is being invoked, no other threads can be modifying the collection...
This doesn't mean that while you get a value from one dictionary and add it to another dictionary, the value still exists in the original dictionary while you're adding it to the second.
You probably have an invariant stating that while you're moving one value from one dictionary to the other, no values in the original dictionary can be removed (or at least not that value, but that's a little more difficult to guarantee with ConcurrentDictionary).
I understand that C# dictionaries are not thread-safe when it comes to adding, reading and removing elements; however, can you access the Count Property of a C# dictionary in a thread safe way if another thread is writing, reading, and removing from the dictionary?
Since a property is a method call under the hood, the situation is not as simple as it seems at first glance.
T1: Accessing Count property
T1: get_Count() call (some kind of JMP/GOTO ASM instruction)
T1: read variable which represents a number of items == 1
T2: adding a new item, real count becomes 2
T1: returns 1 but really there are already two items
So if the application logic relies on the Count property value, you could theoretically end up with a race condition.
It should be threadsafe in that it won't blow up, but I'm fairly sure it is not threadsafe in that it might not give you the correct count due to other threads manipulating the dictionary.
[Edit: as usr points out, just because it's currently threadsafe at this level does not mean that it will continue to be so. You have no guarantees]
First: This is dangerous thinking. Be very careful if you put such a thing into production. Probably, you should just use a lock or ConcurrentDictionary.
Are you comfortable with an approximate answer?
But: Reflector shows that Count just reads some field and returns it. This is unlikely to ever change. So you could reasonably take the risk in this case.
You could create the dictionary as a static readonly object, and lock it when modifying it... that should deal with most problems, I would think.
Depending on what you want to achieve, you can protect the dictionary with a ReaderWriterLockSlim. This way you can have faster reads and only lock the dictionary when performing mutating operations.
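A minimal sketch of the ReaderWriterLockSlim approach, with Count read under the read lock so it never races with a writer. The wrapper class LockedDictionary and its member names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading;

public class LockedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> inner = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim gate = new ReaderWriterLockSlim();

    // Many readers may hold the read lock at the same time.
    public int Count
    {
        get
        {
            gate.EnterReadLock();
            try { return inner.Count; }
            finally { gate.ExitReadLock(); }
        }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        gate.EnterReadLock();
        try { return inner.TryGetValue(key, out value); }
        finally { gate.ExitReadLock(); }
    }

    // Mutations take the exclusive write lock.
    public void Set(TKey key, TValue value)
    {
        gate.EnterWriteLock();
        try { inner[key] = value; }
        finally { gate.ExitWriteLock(); }
    }

    public bool Remove(TKey key)
    {
        gate.EnterWriteLock();
        try { return inner.Remove(key); }
        finally { gate.ExitWriteLock(); }
    }
}
```

Even with the lock, the returned count is only accurate at the instant it was read; another thread may add or remove items right after the read lock is released.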
Rather than assuming, I went and looked at it in Reflector:
public int Count
{
    get
    {
        // count and freeCount are private instance fields
        return (this.count - this.freeCount);
    }
}
So yes, it is "thread safe" in that accessing it will not cause corruption in the dictionary object. That said, I am not sure that I can think of any use case where getting the count with any accuracy is important if the dictionary is being accessed by another thread.