My team is currently debating this issue.
The code in question is something along the lines of
if (!myDictionary.ContainsKey(key))
{
lock (_SyncObject)
{
if (!myDictionary.ContainsKey(key))
{
myDictionary.Add(key,value);
}
}
}
Some of the posts I've seen say that this may be a big NO NO (when using TryGetValue). Yet members of our team say it is ok since "ContainsKey" does not iterate on the key collection but checks if the key is contained via the hash code in O(1). Hence they claim there is no danger here.
I would like to get your honest opinions regarding this issue.
Don't do this. It's not safe.
You could be calling ContainsKey from one thread while another thread calls Add. That's simply not supported by Dictionary<TKey, TValue>. If Add needs to reallocate buckets etc, I can imagine you could get some very strange results, or an exception. It may have been written in such a way that you don't see any nasty effects, but I wouldn't like to rely on it.
It's one thing using double-checked locking for simple reads/writes to a field, although I'd still argue against it - it's another to make calls to an API which has been explicitly described as not being safe for multiple concurrent calls.
If you're on .NET 4, ConcurrentDictionary is probably the way forward. Otherwise, just lock on every access.
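To make the two options above concrete, here is a minimal sketch. The names LockedDictionary and AddIfMissingExamples are made up for illustration and are not from the question; the first class takes the same lock on every read and write, the second relies on ConcurrentDictionary.TryAdd to do the check and the add as one atomic step.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical wrapper: every read and every write takes the same lock.
public sealed class LockedDictionary<TKey, TValue>
{
    private readonly object _sync = new object();
    private readonly Dictionary<TKey, TValue> _inner = new Dictionary<TKey, TValue>();

    // Adds the pair only if the key is absent; returns true if it was added.
    public bool TryAdd(TKey key, TValue value)
    {
        lock (_sync)
        {
            if (_inner.ContainsKey(key))
                return false;
            _inner.Add(key, value);
            return true;
        }
    }

    // Reads are locked too, so they can never overlap with an Add.
    public bool TryGetValue(TKey key, out TValue value)
    {
        lock (_sync)
        {
            return _inner.TryGetValue(key, out value);
        }
    }
}

public static class AddIfMissingExamples
{
    // .NET 4 and later: TryAdd checks for the key and adds it atomically.
    public static bool AddIfMissing(ConcurrentDictionary<string, int> map, string key, int value)
    {
        return map.TryAdd(key, value);
    }
}
```

Either way there is no unlocked ContainsKey call left that could race with a concurrent Add.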
If you are in a multithreaded environment, you may prefer to look at using a ConcurrentDictionary. I blogged about it a couple of months ago, you might find the article useful: http://colinmackay.co.uk/blog/2011/03/24/parallelisation-in-net-4-0-the-concurrent-dictionary/
This code is incorrect. The Dictionary<TKey, TValue> type does not support simultaneous read and write operations. Even though your Add call is made inside the lock, the first ContainsKey call is not. Hence it easily allows a violation of the simultaneous read/write rule and can lead to corruption in your instance.
It doesn't look thread-safe, but it would probably be hard to make it fail.
The iteration vs hash lookup argument doesn't hold, there could be a hash-collision for instance.
If this dictionary is rarely written and often read, then I often employ safe double locking by replacing the entire dictionary on write. This is particularly effective if you can batch writes together to make them less frequent.
For example, this is a cut down version of a method we use that tries to get a schema object associated with a type, and if it can't, then it goes ahead and creates schema objects for all the types it finds in the same assembly as the specified type to minimize the number of times the entire dictionary has to be copied:
public static Schema GetSchema(Type type)
{
if (_schemaLookup.TryGetValue(type, out Schema schema))
return schema;
lock (_syncRoot) {
if (_schemaLookup.TryGetValue(type, out schema))
return schema;
var newLookup = new Dictionary<Type, Schema>(_schemaLookup);
foreach (var t in type.Assembly.GetTypes()) {
var newSchema = new Schema(t);
newLookup.Add(t, newSchema);
}
_schemaLookup = newLookup;
return _schemaLookup[type];
}
}
So the dictionary in this case will be rebuilt, at most, as many times as there are assemblies with types that need schemas. For the rest of the application lifetime the dictionary accesses will be lock-free. The dictionary copy becomes a one-time initialization cost per assembly. The dictionary swap is thread-safe because reference writes are atomic in .NET, so a reader sees either the old dictionary or the new one, never a half-updated one.
You can apply similar principles in other situations as well.
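As a rough sketch of the same idea in generic form (CopyOnWriteLookup and its member names are invented here, not taken from the code above): readers never lock, and a writer copies the dictionary, adds to the copy, and then publishes the copy with a single reference assignment. The field is marked volatile as an extra precaution so readers pick up the newest reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical copy-on-write lookup: lock-free reads, copy-and-swap writes.
public sealed class CopyOnWriteLookup<TKey, TValue>
{
    private readonly object _writeLock = new object();

    // The published dictionary is never mutated after it is assigned here.
    private volatile Dictionary<TKey, TValue> _current = new Dictionary<TKey, TValue>();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        TValue value;
        // Fast path: no lock, reads a dictionary that nobody will ever modify.
        if (_current.TryGetValue(key, out value))
            return value;

        lock (_writeLock)
        {
            // Second check: another writer may have added the key meanwhile.
            if (_current.TryGetValue(key, out value))
                return value;

            var copy = new Dictionary<TKey, TValue>(_current);
            value = factory(key);
            copy.Add(key, value);
            _current = copy; // single reference write publishes the new dictionary
            return value;
        }
    }
}
```

Every write costs a full copy, so this only pays off when writes are rare compared to reads.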
I have a collection as below
private static readonly Dictionary<string,object> _AppCache = new Dictionary<string,object>();
Then I was wondering which one is better to use to check if a key exists (none of my keys has null value)
_AppCache.ContainsKey("x")
_AppCache["x"] != null
This code might be accessed by any number of threads.
The whole code is:
public void SetGlobalObject(string key, object value)
{
globalCacheLock.EnterWriteLock();
try
{
if (!_AppCache.ContainsKey(key))
{
_AppCache.Add(key, value);
}
}
finally
{
globalCacheLock.ExitWriteLock();
}
}
Update
I changed my code to use a dictionary to keep the focus of the question on ContainsKey versus the indexer.
I don't disagree with others' advice to use Dictionary. However, to answer your question, I think you should use ContainsKey to check if a key exists, for several reasons:
That is specifically what ContainsKey was written to do
For _AppCache["x"] != null to work your app must operate under an unenforced assumption (that no values will be null). That assumption may hold true now, but future maintainers may not know or understand this critical assumption, resulting in unintuitive bugs
Slightly less processing for ContainsKey, although this is not really important
Neither of the two choices is thread-safe, so that is not a deciding factor. For that, you either need to use locking or use ConcurrentDictionary.
If you move to a Dictionary (per your question update), the answer is even more in favor of ContainsKey. If you used the index option, you would have to catch an exception to detect if the key is not in the Dictionary. ContainsKey would be much more straightforward in your code.
When the key is in the Dictionary, ContainsKey is slightly more efficient. Both options first call an internal method FindEntry. In the case of ContainsKey, it just returns the result of that. For the index option, it must also retrieve the value. In the case of the key not being in the Dictionary, the index option would be a fair amount less efficient, because it will be throwing an exception.
You are obviously checking for the existence of that key. In that case, _AppCache["x"] != null will give you a KeyNotFoundException if the key does not exist, which is probably not as desirable. If you really want to check if the key exists, without generating an exception by just checking, you have to use _AppCache.ContainsKey("x"). For checking if the key exists in the dictionary or hashtable, I would stick with ContainsKey. Any difference in performance, if != null is faster, would be offset by the additional code to deal with the exception if the key really does not exist.
In reality, _AppCache["x"] != null is not checking if the key exists, it is checking, given that key "x" exists, whether the associated value is null.
Neither way (although accomplishing different tasks) gives you any advantage on thread safety.
All of this holds true if you use ConcurrentDictionary - no difference in thread safety, the two ways accomplish different things, any possible gain in checking with !=null is offset by additional code to handle exception. So, use ContainsKey.
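A small sketch of the difference described above. KeyCheckDemo and IndexerThrows are illustrative names only; the helper simply reports whether the indexer threw for a given key.

```csharp
using System.Collections.Generic;

public static class KeyCheckDemo
{
    // Returns true when reading cache[key] throws KeyNotFoundException.
    public static bool IndexerThrows(Dictionary<string, object> cache, string key)
    {
        try
        {
            object value = cache[key]; // throws if the key is absent
            return false;
        }
        catch (KeyNotFoundException)
        {
            return true;
        }
    }
}
```

With a dictionary that maps "x" to null, ContainsKey("x") is true while cache["x"] != null is false, so the two expressions disagree about a key that does exist. For a key that is not in the dictionary, ContainsKey returns false and the indexer throws.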
If you're concerned about thread-safety, you should have a look at the ConcurrentDictionary class.
If you do not want to use ConcurrentDictionary, then you'll have to make sure that you synchronize access to your regular Dictionary<K,V> instance. That means making sure that no two threads can access your dictionary at the same time, by locking on each write and read operation.
For instance, if you want to add something to a regular Dictionary in a thread-safe way, you'll have to do it like this:
private readonly object _sync = new object();
// ...
lock( _sync )
{
if( _dictionary.ContainsKey(someKey) == false )
{
_dictionary.Add(someKey, somevalue);
}
}
You shouldn't be using Hashtable anymore, since the generic and therefore type-safe alternative, Dictionary<K,V>, was introduced in .NET 2.0.
One caveat though when using a Dictionary<K,V>: when you want to retrieve the value associated with a given key, the Dictionary will throw an exception when there is no entry for that specified key, whereas a Hashtable will return null in that case.
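To illustrate that caveat, here is a short sketch (MissingKeyDemo is a made-up name). The Hashtable indexer returns null for a missing key, while with Dictionary<K,V> you would typically use TryGetValue to get the same "null when absent" behaviour without an exception.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class MissingKeyDemo
{
    // Hashtable: the indexer returns null when the key is absent.
    public static object FromHashtable(Hashtable table, string key)
    {
        return table[key];
    }

    // Dictionary: dict[key] would throw KeyNotFoundException for a missing key,
    // so TryGetValue is used to fall back to null instead.
    public static object FromDictionary(Dictionary<string, object> dict, string key)
    {
        object value;
        return dict.TryGetValue(key, out value) ? value : null;
    }
}
```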
You should use a ConcurrentDictionary, which is itself thread-safe, rather than a Dictionary. You then do not need the lock, which (generally *) improves performance, since locking mechanisms are rather expensive.
Now, only to check whether an entry exists I recommend ContainsKey, irrespective of which (Concurrent)Dictionary you use:
_AppCache.ContainsKey(key)
But what you do in two steps can be done in one step with a ConcurrentDictionary by using GetOrAdd:
_AppCache.GetOrAdd(key, value);
You need a lock for neither action:
public void SetGlobalObject(string key, object value)
{
_AppCache.GetOrAdd(key, value);
}
Not only does this (probably *) perform better, but I think it expresses your intentions much more clearly and with less clutter.
(*) Using "probably" and "generally" here to emphasise that these data structures do have loads of baked-in optimisations for performance, however performance in your specific case must always be measured.
Are the following assumptions valid for this code? I put some background info under the code, but I don't think it's relevant.
Assumption 1: Since this is a single application, I'm making the assumption it will be handled by a single process. Thus, static variables are shared between threads, and declaring my collection of lock objects statically is valid.
Assumption 2: If I know the value is already in the dictionary, I don't need to lock on read. I could use a ConcurrentDictionary, but I believe this one will be safe since I'm not enumerating (or deleting), and the value will exist and not change when I call UnlockOnValue().
Assumption 3: I can lock on the Keys collection, since that reference won't change, even if the underlying data structure does.
private static Dictionary<String,Object> LockList =
new Dictionary<string,object>();
private void LockOnValue(String queryStringValue)
{
lock(LockList.Keys)
{
if(!LockList.Keys.Contains(queryStringValue))
{
LockList.Add(queryStringValue,new Object());
}
System.Threading.Monitor.Enter(LockList[queryStringValue]);
}
}
private void UnlockOnValue(String queryStringValue)
{
System.Threading.Monitor.Exit(LockList[queryStringValue]);
}
Then I would use this code like:
LockOnValue(Request.QueryString["foo"]);
//Check cache expiry
//if expired
//Load new values and cache them.
//else
//Load cached values
UnlockOnValue(Request.QueryString["foo"]);
Background: I'm creating an app in ASP.NET that downloads data based on a single user-defined variable in the query string. The number of values will be quite limited. I need to cache the results for each value for a specified period of time.
Approach: I decided to use local files to cache the data, which is not the best option, but I wanted to try it since this is non-critical and performance is not a big issue. I used 2 files per option, one with the cache expiry date, and one with the data.
Issue: I'm not sure what the best way to do locking is, and I'm not overly familiar with threading issues in .NET (one of the reasons I chose this approach). Based on what's available, and what I read, I thought the above should work, but I'm not sure and wanted a second opinion.
Your current solution looks pretty good. The two things I would change:
1: UnlockOnValue needs to go in a finally block. If an exception is thrown, it will never release its lock.
2: LockOnValue is somewhat inefficient, since it does a dictionary lookup twice. This isn't a big deal for a small dictionary, but for a larger one you will want to switch to TryGetValue.
Also, your assumption 3 holds - at least for now. But the Dictionary contract makes no guarantee that the Keys property always returns the same object. And since it's so easy to not rely on this, I'd recommend against it. Whenever I need an object to lock on, I just create an object for that sole purpose. Something like:
private static Object _lock = new Object();
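Putting those suggestions together, a sketch could look like the following. ValueLocks, GetLockFor and RunLocked are names I made up; the sketch uses a dedicated lock object, a single TryGetValue lookup, and a finally block for the release. Note that, unlike the code in the question, it releases the dictionary lock before waiting on the per-value lock, so a slow operation on one value does not block callers using other values.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ValueLocks
{
    // Dedicated object used only to guard the dictionary below.
    private static readonly object _lock = new object();
    private static readonly Dictionary<string, object> LockList = new Dictionary<string, object>();

    // Finds or creates the lock object for a value with one dictionary lookup.
    private static object GetLockFor(string value)
    {
        lock (_lock)
        {
            object valueLock;
            if (!LockList.TryGetValue(value, out valueLock))
            {
                valueLock = new object();
                LockList.Add(value, valueLock);
            }
            return valueLock;
        }
    }

    // Runs the action while holding the per-value lock; the finally block
    // releases the lock even if the action throws.
    public static void RunLocked(string value, Action action)
    {
        object valueLock = GetLockFor(value);
        Monitor.Enter(valueLock);
        try
        {
            action();
        }
        finally
        {
            Monitor.Exit(valueLock);
        }
    }
}
```

A call would then look like ValueLocks.RunLocked(Request.QueryString["foo"], () => { /* check expiry, load or refresh cache */ });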
lock only has a scope of a single process. If you want to span processes you'll have to use primitives like Mutex (named).
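If you ever do need to span processes, a named Mutex sketch might look like this. CrossProcessLock and the mutex name passed in are hypothetical; pick a name that is unique to your application.

```csharp
using System;
using System.Threading;

public static class CrossProcessLock
{
    // Runs the action while holding a system-wide named mutex.
    public static void Run(string mutexName, Action action)
    {
        using (var mutex = new Mutex(false, mutexName))
        {
            try
            {
                mutex.WaitOne();
            }
            catch (AbandonedMutexException)
            {
                // A previous owner exited without releasing; this thread now owns the mutex.
            }

            try
            {
                action();
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}
```

A named mutex is considerably heavier than lock, so it is only worth it when more than one process really touches the same resource.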
lock is shorthand for Monitor.Enter and Monitor.Exit. If you also call Monitor.Enter and Monitor.Exit yourself, that is redundant.
You don't need to lock on read, but you do have to lock the "transaction" of checking if the value doesn't exist and adding it. If you don't lock on that series of instructions, something else could come in between when you check for the key and when you add it and add it--thus resulting in an exception. The lock you're doing is sufficient to do that (you don't need the additional calls to Enter and Exit--lock will do that for you).
I'm trying to optimise the performance of a string comparison operation on each string key of a dictionary used as a database query cache. The current code looks like:
public void Clear(string tableName)
{
foreach (string key in cache.Keys.Where(key => key.IndexOf(tableName, StringComparison.Ordinal) >= 0).ToList())
{
cache.Remove(key);
}
}
I'm new to using C# parallel features and am wondering what the best way would be to convert this into a parallel operation so that multiple string comparisons can happen 'simultaneously'. The cache can often get quite large so maintenance on it with Clear() can get quite costly.
Make your cache object a ConcurrentDictionary and use TryRemove instead of Remove.
This will make your cache thread-safe; then you can invoke your current foreach loop like this:
Parallel.ForEach(cache.Keys, key =>
{
if(key.IndexOf(tableName, StringComparison.Ordinal) >= 0)
{
dynamic value; // just because I don't know your dictionary.
cache.TryRemove(key, out value);
}
});
Hope that gives you a starting point.
Your approach can't work well on a Dictionary<string, Whatever> because that class isn't thread-safe for multiple writers, so the simultaneous deletes could cause all sorts of problems.
You will therefore have to use a lock to synchronise the removals, which will therefore make the access of the dictionary essentially single-threaded. About the only thing that can be safely done across the threads simultaneously is the comparison in the Where.
You could use ConcurrentDictionary because its use of striped locks will reduce this impact. It still doesn't seem the best approach though.
If you are building keys from strings so that you can test whether a key starts with a sub-key, and if removing everything under a sub-key is a frequent need, then you could try using a Dictionary<string, Dictionary<string, Whatever>>. Adding or updating becomes a bit more expensive, but clearing becomes an O(1) removal of just the one value from the higher-level dictionary.
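A rough sketch of that two-level layout, assuming the table name is available separately from the rest of the key (TableQueryCache and its members are invented names). It is not thread-safe on its own; access would still need a lock.

```csharp
using System.Collections.Generic;

// Hypothetical two-level cache: outer key is the table name, inner key is the query.
public sealed class TableQueryCache<TValue>
{
    private readonly Dictionary<string, Dictionary<string, TValue>> _byTable =
        new Dictionary<string, Dictionary<string, TValue>>();

    public void Set(string tableName, string queryKey, TValue value)
    {
        Dictionary<string, TValue> perTable;
        if (!_byTable.TryGetValue(tableName, out perTable))
        {
            perTable = new Dictionary<string, TValue>();
            _byTable.Add(tableName, perTable);
        }
        perTable[queryKey] = value;
    }

    public bool TryGet(string tableName, string queryKey, out TValue value)
    {
        Dictionary<string, TValue> perTable;
        if (_byTable.TryGetValue(tableName, out perTable))
            return perTable.TryGetValue(queryKey, out value);
        value = default(TValue);
        return false;
    }

    // Drops every cached query for the table with a single removal.
    public bool Clear(string tableName)
    {
        return _byTable.Remove(tableName);
    }
}
```

Clear no longer has to compare tableName against every key in the cache.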
I've used dictionaries as caches before, and what I used to do was clean up the cache "on the fly": with each entry I also stored its time of inclusion, and any time an entry was requested I removed the old entries. The performance hit was minimal for me, but if needed you could implement a Queue (of Tuple<DateTime, TKey>, where TKey is the type of the keys in your dictionary) as an index to hold these timestamps, so you don't need to iterate over the entire dictionary every time. Anyway, if you're having to think about these issues, it's time to consider using a dedicated caching server. For me, Shared Cache (http://sharedcache.codeplex.com) has been good enough.
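The timestamp queue idea could be sketched roughly as follows. ExpiringCache is a made-up name, the current time is passed in explicitly to keep the sketch easy to test, and it is not thread-safe. Add uses Dictionary.Add, which throws for a key that is still cached, so every live key has exactly one entry in the queue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache that evicts old entries whenever a value is requested.
public sealed class ExpiringCache<TKey, TValue>
{
    private readonly TimeSpan _maxAge;
    private readonly Dictionary<TKey, TValue> _values = new Dictionary<TKey, TValue>();

    // Insertion times in arrival order, so the oldest entry is always at the front.
    private readonly Queue<Tuple<DateTime, TKey>> _insertions = new Queue<Tuple<DateTime, TKey>>();

    public ExpiringCache(TimeSpan maxAge)
    {
        _maxAge = maxAge;
    }

    public void Add(TKey key, TValue value, DateTime now)
    {
        _values.Add(key, value); // throws if the key is still cached
        _insertions.Enqueue(Tuple.Create(now, key));
    }

    public bool TryGet(TKey key, DateTime now, out TValue value)
    {
        // Only the front of the queue is inspected, not the whole dictionary.
        while (_insertions.Count > 0 && now - _insertions.Peek().Item1 > _maxAge)
        {
            _values.Remove(_insertions.Dequeue().Item2);
        }
        return _values.TryGetValue(key, out value);
    }
}
```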
I've written a wrapper class around a 3rd party library that requires properties to be set by calling a Config method and passing a string formatted as "Property=Value"
I'd like to pass all the properties in a single call and process them iteratively.
I've considered the following:
creating a property/value class and then creating a List of these objects
building a string of multiple "Property=Value" separating them with a token (maybe "|")
Using a hash table
All of these would work (and I'm thinking of using option 1) but is there a better way?
A bit more detail about my query:
The finished class will be included in a library for re-use in other applications. Whilst I don't currently see threading as a problem at the moment (our apps tend to just have a UI thread and a worker thread) it could become an issue in the future.
Garbage collection will not be an issue.
Access to arbitrary indices of the data source is not currently an issue.
Optimization is not currently an issue, but clearly defining the key/value pairs is important.
As you've already pointed out, any of the proposed solutions will accomplish the task as you've described it. What this means is that the only rational way to choose a particular method is to define your requirements:
Does your code need to support multiple threads accessing the data source simultaneously? If so, using a ConcurrentDictionary, as Yahia suggested, makes sense. Otherwise, there's no reason to incur the additional overhead and complexity of using a concurrent data structure.
Are you working in an environment where garbage collection is a problem (for example, an XNA game)? If so, any suggestion involving the concatenation of strings is going to be problematic.
Do you need O(1) access to arbitrary indices of the data source? If so, your third approach makes sense. On the other hand, if all you're doing is iterating over the collection, there's no reason to incur the additional overhead of inserting into a hashtable; use a List<KeyValuePair<String, String>> instead.
On the other hand, you may not be working in an environment where the optimization described above is necessary; the ability to clearly define the key/value pairs programmatically may be more important to you. In which case using a Dictionary is a better choice.
You can't make an informed decision as to how to implement a feature without completely defining what the feature needs to do, and since you haven't done that, any answer given here will necessarily be incomplete.
Given your clarifications, I would personally suggest the following:
Avoid making your Config() method thread-safe by default, as per the MSDN guidelines:
By default, class libraries should not be thread safe. Adding locks to create thread-safe code decreases performance, increases lock contention, and creates the possibility for deadlock bugs to occur.
If thread safety becomes important later, make it the caller's responsibility.
Given that you don't have special performance requirements, stick with a dictionary to allow key/value pairs to be easily defined and read.
For simplicity's sake, and to avoid generating lots of unnecessary strings doing concatenations, just pass the dictionary in directly and iterate over it.
Consider the following example:
var configData = new Dictionary<String, String>();
configData["key1"] = "value1";
configData["key2"] = "value2";
myLibraryObject.Config(configData);
And the implementation of Config:
public void Config(Dictionary<String, String> values)
{
foreach(var kvp in values)
{
var configString = String.Format("{0}={1}", kvp.Key, kvp.Value);
// do whatever
}
}
You could use Dictionary<string,string>; the items are then of type KeyValuePair<string,string> (this corresponds to your first idea).
You can then use myDict.Select(kvp=>string.Format("{0}={1}",kvp.Key,kvp.Value)) to get a list of strings with the needed formatting
Use for example a ConcurrentDictionary<string,string> - it is thread-safe and really fast since most operations are implemented lock-free...
You could make a helper class that uses reflection to turn any class into a Property=Value collection
public static class PropertyValueHelper
{
public static IEnumerable<string> GetPropertyValues(object source)
{
Type t = source.GetType();
foreach (var property in t.GetProperties())
{
object value = property.GetValue(source, null);
if (value != null)
{
yield return property.Name + "=" + value.ToString();
}
else
{
yield return property.Name + "=";
}
}
}
}
You would need to add extra logic to handle enumerations, indexed properties, etc.
I need to enumerate through a generic IList<> of objects. The contents of the list may change, as in items being added or removed by other threads, and this will kill my enumeration with a "Collection was modified; enumeration operation may not execute."
What is a good way of doing a thread-safe foreach on an IList<>? Preferably without cloning the entire list. It is not possible to clone the actual objects referenced by the list.
Cloning the list is the easiest and best way, because it ensures your list won't change out from under you. If the list is simply too large to clone, consider putting a lock around it that must be taken before reading/writing to it.
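A minimal sketch of the cloning approach, assuming all access to the list goes through one wrapper (SynchronizedList is an invented name). The snapshot copies only the references held by the list, not the objects themselves, so it still works when the objects cannot be cloned.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: mutations and snapshots share one lock.
public sealed class SynchronizedList<T>
{
    private readonly object _sync = new object();
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        lock (_sync) { _items.Add(item); }
    }

    public bool Remove(T item)
    {
        lock (_sync) { return _items.Remove(item); }
    }

    // Shallow copy taken under the lock; safe to enumerate afterwards
    // while other threads keep adding and removing.
    public T[] Snapshot()
    {
        lock (_sync) { return _items.ToArray(); }
    }
}
```

A foreach over Snapshot() cannot hit "Collection was modified", because the array being enumerated is never changed. The trade-off is that the loop may see items that have since been removed from the list.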
There is no such operation. The best you can do is
lock(collection){
foreach (object o in collection){
...
}
}
Your problem is that an enumeration does not allow the IList to change. This means you have to avoid this while going through the list.
A few possibilities come to mind:
Clone the list. Now each enumerator has its own copy to work on.
Serialize the access to the list. Use a lock to make sure no other thread can modify it while it is being enumerated.
Alternatively, you could write your own implementation of IList and IEnumerator that allows the kind of parallel access you need. However, I'm afraid this won't be simple.
ICollection MyCollection;
// Instantiate and populate the collection
lock(MyCollection.SyncRoot) {
// Some operation on the collection, which is now thread safe.
}
From MSDN
You'll find that's a very interesting topic.
The best approach relies on the ReadWriteResourceLock, which used to have big performance issues due to the so-called Convoy Problem.
The best article I've found treating the subject is this one by Jeffrey Richter which exposes its own method for a high performance solution.
So the requirements are: you need to enumerate through an IList<> without making a copy while simultaneously adding and removing elements.
Could you clarify a few things? Are insertions and deletions happening only at the beginning or end of the list?
If modifications can occur at any point in the list, how should the enumeration behave when elements are removed or added near or on the location of the enumeration's current element?
This is certainly doable by creating a custom IEnumerable object with perhaps an integer index, but only if you can control all access to your IList<> object (for locking and maintaining the state of your enumeration). But multithreaded programming is a tricky business under the best of circumstances, and this is a complex problem.
foreach depends on the fact that the collection will not change. If you want to iterate over a collection that can change, use the normal for construct and be prepared for nondeterministic behavior. Locking might be a better idea, depending on what you're doing.
Default behavior for a simple indexed data structure like a linked list, b-tree, or hash table is to enumerate in order from the first to the last. It would not cause a problem to insert an element in the data structure after the iterator had already passed that point, or to insert one that the iterator would enumerate once it arrived, and such an event could be detected by the application and handled if the application required it. Detecting a change in the collection and throwing an error during enumeration was, I can only imagine, someone's (bad) idea of what they thought the programmer would want. Indeed, Microsoft has fixed their collections to work correctly. They have called their shiny new unbroken collections ConcurrentCollections (System.Collections.Concurrent) in .NET 4.0.
I recently spent some time multi-threading a large application and had a lot of issues with foreach operating on lists of objects shared across threads.
In many cases you can use the good old for-loop and immediately assign the object to a copy to use inside the loop. Just keep in mind that all threads writing to the objects of your list should write to different data of the objects. Otherwise, use a lock or a copy as the other contributors suggest.
Example:
foreach(var p in Points)
{
// work with p...
}
Can be replaced by:
for(int i = 0; i < Points.Count; i ++)
{
Point p = Points[i];
// work with p...
}
Wrap the list in a locking object for reading and writing. You can even iterate with multiple readers at once if you have a suitable lock, that allows multiple concurrent readers but also a single writer (when there are no readers).
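One way to sketch such a wrapper is with ReaderWriterLockSlim, which lets several readers iterate at once while a writer gets exclusive access. ReadWriteLockedList is a made-up name, and disposal of the lock is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper: many concurrent readers, one writer at a time.
public sealed class ReadWriteLockedList<T>
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        _lock.EnterWriteLock();
        try { _items.Add(item); }
        finally { _lock.ExitWriteLock(); }
    }

    // Iterates under a read lock; writers wait until all readers have finished.
    public void ForEach(Action<T> action)
    {
        _lock.EnterReadLock();
        try
        {
            foreach (T item in _items)
                action(item);
        }
        finally { _lock.ExitReadLock(); }
    }
}
```

The action passed to ForEach must not call Add on the same list: with the default (no recursion) policy, requesting the write lock while holding the read lock throws a LockRecursionException.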
This is something that I've recently had to deal with and to me it really depends on what you're doing with the list.
If you need to use the list at a point in time (given the number of elements currently in it) AND another thread can only ADD to the end of the list, then maybe you just switch out to a FOR loop with a counter. At the point you grab the counter, you're only seeing X numbers of elements in the list. You can walk through the list (while others are adding to the end of it) . . . should not cause a problem.
Now, if the list needs to have items taken OUT of it by other threads, or CLEARED by other threads, then you'll need to implement one of the locking mechanisms mentioned above. Also, you may want to look at some of the newer "concurrent" collection classes (though I don't believe they implement IList, so you may need to refactor for a dictionary).