Threading and IEnumerable: the "Collection was modified" exception - C#

Imagine that in the class below, one thread gets the IEnumerable object and starts iterating over the elements. While it is in the middle of the iteration, another thread comes along and adds a new entry to library_entries via the Add method. Would a "Collection was modified" exception be thrown in the iteration? Or would the lock prevent the element from being added until the iteration is complete? Or neither?
Thanks!
public static class Library
{
    private static List<string> library_entries = new List<string>(1000000);

    public static void Add(string entry)
    {
        lock (library_entries)
            library_entries.Add(entry);
    }

    public static IEnumerable<string> GetEntries()
    {
        return library_entries.Where(entry => !string.IsNullOrEmpty(entry));
    }
}

No, you won't get the exception, because you are using a LINQ query and it is lazily evaluated. It is much worse: it will fail unpredictably. The most typical outcome is that the same item gets enumerated twice, although anything is possible when the List re-allocates its internal storage during the Add() call, including an IndexOutOfRangeException. Once a week, give or take.
The code that calls GetEntries() and consumes the enumerator must acquire the lock as well. Locking inside the Where() expression is not good enough, unless you create a copy of the list.

Locking wouldn't help at all here, since the iteration doesn't use locking. I recommend rewriting the GetEntries() function to return a copy:
public static IEnumerable<string> GetEntries()
{
    lock (lockObj)
    {
        return library_entries.Where(entry => !string.IsNullOrEmpty(entry)).ToList();
    }
}
Note that this returns a consistent snapshot, i.e. it won't return newly added objects while you're iterating.
And I prefer to lock on a private object whose only purpose is locking, but since the list is private it's no real problem, just a stylistic issue.
You could also write your own iterator like this:
private int i = 0;
public string Current { get; private set; }

public bool MoveNext()
{
    lock (lockObj)
    {
        if (i < list.Count)
        {
            Current = list[i];
            i++;
            return true;
        }
        return false;
    }
}
Whether that's a good idea depends on your access patterns, lock contention, the size of the list, and so on. You might also want to use a reader-writer lock to avoid contention from many read accesses.
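That reader-writer variant can be sketched with ReaderWriterLockSlim. This is a minimal sketch, not the answerer's code: the class and field names mirror the question, and taking a ToList() snapshot under the read lock is one design choice among several.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class Library
{
    private static readonly List<string> library_entries = new List<string>(1000000);
    // Many readers may hold the read lock at once; writers are exclusive.
    private static readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

    public static void Add(string entry)
    {
        rwLock.EnterWriteLock();
        try { library_entries.Add(entry); }
        finally { rwLock.ExitWriteLock(); }
    }

    public static IEnumerable<string> GetEntries()
    {
        rwLock.EnterReadLock();
        try
        {
            // ToList() materializes the snapshot while the read lock is held,
            // so callers can enumerate it without any further locking.
            return library_entries.Where(e => !string.IsNullOrEmpty(e)).ToList();
        }
        finally { rwLock.ExitReadLock(); }
    }
}
```

This trades copy cost on every read for contention-free enumeration afterwards, which pays off when reads greatly outnumber writes.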

The static GetEntries method doesn't acquire any lock on the static library_entries collection, so it is not thread-safe, and concurrent calls to it from multiple threads might break. Locking the Add method is fine, but enumerating is not a thread-safe operation, so you must lock there as well if you intend to call GetEntries concurrently. Also, because this method returns an IEnumerable<T>, it doesn't touch the actual list until you start enumerating, which could happen outside of GetEntries. So you could add a .ToList() call at the end of the LINQ chain and then lock the entire operation.

Yes, an exception would be thrown - you aren't locking on a common object. Also, a lock in the GetEntries method alone would be useless, as that call returns instantly; the locking would have to occur while iterating.

Lock and refresh implementation

I have a key-to-task mapping, and I need to run the task only if the task for the given key is not already running. Pseudo code follows. I believe there is a lot of scope for improvement: I'm locking on the map and hence almost serializing access to CacheFreshener. Is there a better way of doing this? We know that while I'm trying to lock key k1, there is no point in the cache-freshener call for key k2 waiting for that lock.
class CacheFreshener
{
    private ConcurrentDictionary<string, bool> lockMap;

    public void RefreshData(string key, Func<string, bool> cacheMissAction)
    {
        lock (lockMap)
        {
            if (lockMap.ContainsKey(key))
            {
                // no-op
                return;
            }
            else
            {
                lockMap.Add(key, true);
            }
        }
        // if you are here it means the task is not already present
        cacheMissAction(key);
        lock (lockMap) // Do we need to lock here??
        {
            lockMap.Remove(key);
        }
    }
}
As requested, here is an elaborated explanation of what I was getting at relative to my comments…
The basic issue here seems to be the question of concurrency, i.e. two or more threads accessing the same object at a time. This is the scenario ConcurrentDictionary is designed for. If you use the IDictionary methods of ContainsKey() and Add() separately, then you would need explicit synchronization (but only for that operation…in this particular scenario it wouldn't strictly be needed when calling Remove()) to ensure these are performed as a single atomic operation. But the ConcurrentDictionary class anticipates this need, and includes the TryAdd() method to accomplish the same, without the explicit synchronization.
<aside>
It is not entirely clear to me the intent behind the code example as given. The code appears to be meant to only store an object in the "cache" for the duration of the invocation of the cacheMissAction delegate. The key is removed immediately after. So it does seem like it's not really caching anything per se. It just prevents more than one thread from being in the process of invoking cacheMissAction at a time (subsequent threads will fail to invoke it, but also cannot count on it having completed by the time their call to the RefreshData() method has completed).
</aside>
But taking the code example as given, it's clear that no explicit locking is actually required. The ConcurrentDictionary class already provides thread-safe access (i.e. non-corruption of the data structure when used concurrently from multiple threads), and it provides the TryAdd() method as a mechanism for adding a key (and its value, though here that's just always a bool literal of true) to the dictionary that will ensure that only one thread ever has a key in the dictionary at a time.
So we can rewrite the code to look like this instead and accomplish the same goal:
private ConcurrentDictionary<string, bool> lockMap;

public void RefreshData(string key, Func<string, bool> cacheMissAction)
{
    if (!lockMap.TryAdd(key, true))
    {
        return;
    }
    // if you are here it means the task was not already present
    cacheMissAction(key);
    bool removed;
    lockMap.TryRemove(key, out removed);
}
No lock statement is needed for either the add or remove, as the TryAdd() handles the entire "check for key and add if not present" operation atomically.
I will note that using a dictionary to do the job of a set could be considered inefficient. If the collection is likely not to be large, it's no big deal, but I do find it odd that Microsoft chose to make the same mistake they made originally when in the pre-generics days you had to use the non-generic dictionary object Hashtable to store a set, before HashSet<T> came along. Now we have all these easy-to-use classes in System.Collections.Concurrent, but no thread-safe implementation of ISet<T> in there. Sigh…
That said, if you do prefer a somewhat more efficient approach in terms of storage (this is not necessarily a faster implementation, depending on the concurrent access patterns of the object), something like this would work as an alternative:
private HashSet<string> lockSet;
private readonly object _lock = new object();

public void RefreshData(string key, Func<string, bool> cacheMissAction)
{
    lock (_lock)
    {
        if (!lockSet.Add(key))
        {
            return;
        }
    }
    // if you are here it means the task was not already present
    cacheMissAction(key);
    lock (_lock)
    {
        lockSet.Remove(key);
    }
}
In this case, you do need the lock statement, because the HashSet<T> class is not inherently thread-safe. This is of course very similar to your original implementation, just using the more set-like semantics of HashSet<T> instead.
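One refinement worth noting, which is an addition to the answer above rather than part of it: if cacheMissAction throws, the key is never removed, and every later refresh for that key is silently skipped. Wrapping the action in try/finally guarantees the cleanup. The surrounding class is a hypothetical completion of the answer's fragment.

```csharp
using System;
using System.Collections.Generic;

class CacheFreshener
{
    private readonly HashSet<string> lockSet = new HashSet<string>();
    private readonly object _lock = new object();

    public void RefreshData(string key, Func<string, bool> cacheMissAction)
    {
        lock (_lock)
        {
            if (!lockSet.Add(key))
                return; // another thread is already refreshing this key
        }
        try
        {
            cacheMissAction(key);
        }
        finally
        {
            // Runs even if cacheMissAction throws, so the key is never leaked.
            lock (_lock)
            {
                lockSet.Remove(key);
            }
        }
    }
}
```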

C# Is this method thread safe?

Consider the following code:
Dictionary<string, string> list = new Dictionary<string, string>();
object lockObj = new object();

public void MyMethod(string a) {
    if (list.Contains(a))
        return;
    lock (lockObj) {
        list.Add(a, "someothervalue");
    }
}
Assuming I'm calling MyMethod("mystring") from different threads concurrently.
Would it be possible for more than one thread (say two) to enter the if (list.Contains(a)) check at the same time (within a few CPU cycles of each other), have it evaluate to false for both, so that one thread enters the critical region while the other is blocked outside; then the second thread enters and adds "mystring" to the list again after the first thread exits, resulting in the dictionary trying to add a duplicate key?
No, it's not thread-safe. You need the lock around list.Contains too, as it is possible for a thread to be switched out and back in again between the if test and adding the data. Another thread may have added the data in the meantime.
You need to lock the entire operation (check and add) or multiple threads may attempt to add the same value.
I would recommend using ConcurrentDictionary<TKey, TValue> since it is designed to be thread-safe.
private readonly ConcurrentDictionary<string, string> _items
    = new ConcurrentDictionary<string, string>();

public void MyMethod(string item, string value)
{
    _items.AddOrUpdate(item, value, (i, v) => value);
}
You need to lock around the whole statement. It's possible for you to run into issues on the .Contains portion (the way your code is now).
You should check the list again after locking, e.g.:

if (list.Contains(a))
    return;
lock (lockObj) {
    if (list.Contains(a))
        return;
    list.Add(a, "someothervalue");
}
private Dictionary<string, string> list = new Dictionary<string, string>();

public void MyMethod(string a) {
    lock (list) {
        if (list.Contains(a))
            return;
        list.Add(a, "someothervalue");
    }
}
Check out this guide to locking, it's good
A few guidelines to bear in mind
Generally lock around a private static object when locking on multiple writeable values
Do not lock on things with scope outside the class or local method such as lock(this), which could lead to deadlocks!
You may lock on the object being changed if it is the only concurrently accessed object
Ensure the object you lock is not null!
You can only lock on reference types
I am going to assume that you meant write ContainsKey instead of Contains. Contains on a Dictionary is explicitly implemented so it is not accessible via the type you declared.1
Your code is not safe. The reason is because there is nothing preventing ContainsKey and Add from executing at the same time. There are actually some quite remarkable failure scenarios that this would introduce. Because I looked at how the Dictionary is implemented I can see that your code could cause a situation where data structure contains duplicates. And I mean it literally contains duplicates. The exception will not necessarily be thrown. The other failure scenarios just keep getting stranger and stranger, but I will not go into those here.
One trivial modification to your code might involve a variation of the double-checked locking pattern.
public void MyMethod(string a)
{
    if (!dictionary.ContainsKey(a))
    {
        lock (dictionary)
        {
            if (!dictionary.ContainsKey(a))
            {
                dictionary.Add(a, "someothervalue");
            }
        }
    }
}
This, of course, is not any safer for the reason I already stated. Actually, the double-checked locking pattern is notoriously difficult to get right in all but the simplest cases (like the canonical implementation of a singleton). There are many variations on this theme. You can try it with TryGetValue or the default indexer, but ultimately all of these variations are just dead wrong.
So how could this be done correctly without taking a lock? You could try ConcurrentDictionary. It has the method GetOrAdd which is really useful in these scenarios. Your code would look like this.
public void MyMethod(string a)
{
    // The variable 'dictionary' is a ConcurrentDictionary.
    dictionary.GetOrAdd(a, "someothervalue");
}
That is all there is to it. The GetOrAdd function will check to see if the item exists. If it does not then it will be added. Otherwise, it will leave the data structure alone. This is all done in a thread-safe manner. In most cases the ConcurrentDictionary does this without waiting on a lock.2
1By the way, your variable name is obnoxious too. If it were not for Servy's comment I may have missed the fact that we were talking about a Dictionary as opposed to a List. In fact, based on the Contains call I first thought we were talking about a List.
2On the ConcurrentDictionary readers are completely lock free. However, writers always take a lock (adds and updates that is; the remove operation is still lock free). This includes the GetOrAdd function. The difference is that the data structure maintains several possible lock options so in most cases there is little or no lock contention. That is why this data structure is said to be "low lock" or "concurrent" as opposed to "lock free".
You can first do a non-locking check, but if you want to be thread-safe you need to repeat the check again within the lock. This way you don't take the lock unless you have to, while still ensuring thread safety.
Dictionary<string, string> list = new Dictionary<string, string>();
object lockObj = new object();

public void MyMethod(string a) {
    if (list.Contains(a))
        return;
    lock (lockObj) {
        if (!list.Contains(a)) {
            list.Add(a, "someothervalue");
        }
    }
}

Multiple locking task (threading)

I need to implement a class that performs a locking mechanism in our framework.
We have several threads, numbered 0, 1, 2, 3, .... We have a static class called ResourceHandler that should lock these threads on given objects. The requirement is that n Lock() invocations should be released by m Release() invocations, where n = [0..] and m = [0..]. So no matter how many locks were taken on a single object, a single Release() call is enough to unlock them all. Furthermore, if an object is not locked, a Release() call should do nothing. We also need to know which objects are locked on which threads.
I have this implementation:
public class ResourceHandler
{
    private readonly Dictionary<int, List<object>> _locks = new Dictionary<int, List<object>>();

    public static ResourceHandler Instance { /* Singleton */ }

    public virtual void Lock(int threadNumber, object obj)
    {
        Monitor.Enter(obj);
        if (!_locks.ContainsKey(threadNumber)) { _locks.Add(threadNumber, new List<object>()); }
        _locks[threadNumber].Add(obj);
    }

    public virtual void Release(int threadNumber, object obj)
    {
        // Check whether we have threadNumber in _locks and skip if not
        var count = _locks[threadNumber].Count(x => x == obj);
        _locks[threadNumber].RemoveAll(x => x == obj);
        for (int i = 0; i < count; i++)
        {
            Monitor.Exit(obj);
        }
    }
    // .....
}
What I am actually worried about here is thread safety. I'm not sure whether this is thread-safe or not, and it's a real pain to fix. Am I approaching the task correctly, and how can I ensure that this is thread-safe?
Your Lock method locks on the target objects but the _locks dictionary can be accessed by any thread at any time. You may want to add a private lock object for accessing the dictionary (in both the Lock and Release methods).
Also keep in mind that by using such a ResourceHandler it is the responsibility of the rest of the code (the consuming threads) to release all used objects (a regular lock () block for instance covers that problem since whenever you leave the lock's scope, the object is released).
You also may want to use ReferenceEquals when counting the number of times an object is locked instead of ==.
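A sketch of those suggestions combined: a private gate object guards the dictionary, and ReferenceEquals replaces ==. The _gate field, the TryGetValue guard, and dropping the singleton for testability are illustrative additions, not from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public class ResourceHandler
{
    private readonly Dictionary<int, List<object>> _locks = new Dictionary<int, List<object>>();
    // Private gate: serializes access to _locks without interfering with
    // the Monitor locks taken on the target objects themselves.
    private readonly object _gate = new object();

    public virtual void Lock(int threadNumber, object obj)
    {
        Monitor.Enter(obj);
        lock (_gate)
        {
            if (!_locks.ContainsKey(threadNumber))
                _locks.Add(threadNumber, new List<object>());
            _locks[threadNumber].Add(obj);
        }
    }

    public virtual void Release(int threadNumber, object obj)
    {
        int count;
        lock (_gate)
        {
            List<object> held;
            if (!_locks.TryGetValue(threadNumber, out held))
                return; // nothing was ever locked for this thread: no-op
            count = held.Count(x => ReferenceEquals(x, obj));
            held.RemoveAll(x => ReferenceEquals(x, obj));
        }
        // Exit outside the gate so dictionary access is never held across
        // monitor operations.
        for (int i = 0; i < count; i++)
            Monitor.Exit(obj);
    }
}
```

Note this still inherits the design's fundamental constraint: Monitor.Exit must run on the same thread that called Monitor.Enter.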
You can ensure this class is thread-safe by using a ConcurrentDictionary, but it won't help you with all the problems you will get from trying to develop your own locking mechanism.
There are a number of locking mechanisms that are already part of the .NET Framework; you should use those.
It sounds like you are going to need to use a combination of these, including Wait Handles to achieve what you want.
EDIT
After reading more carefully, I think you might need an EventWaitHandle
What you have got looks conceptually dangerous. This is because calls to Monitor.Enter and Monitor.Exit, for them to work like a lock statement, are recommended to be encapsulated in a try/finally block; that is to ensure they are executed in sequence. Calling Monitor.Exit before Monitor.Enter will throw an exception.
To avoid these problems (if an exception is thrown, the lock for a given thread may or may not be taken, and if a lock is taken it will not be released, resulting in a leaked lock), I would recommend using one of the options provided in the other answers above. However, if you do want to proceed with this mechanism, CLR 4.0 added the following overload to the Monitor.Enter method:
public static void Enter(object obj, ref bool lockTaken);
lockTaken is false if and only if the Enter method throws an exception and the lock was not taken. So, by sharing a bool lockTaken between your two methods, you can create something like the following (the example here is for a single locker; you will need a Dictionary of List<bool> corresponding to your threads, or better yet a Tuple). In your Lock method you would have something like:
bool lockTaken = false;
Monitor.Enter(locker, ref lockTaken);
in the other method Release
if (lockTaken)
    Monitor.Exit(locker);
I hope this helps.
Edit: I don't think I fully appreciated your problem, but from what I can gather I would be using a concurrent collection. These are fully thread-safe. Check out IProducerConsumerCollection<T> and ConcurrentBag<T>. These should facilitate what you want, with all thread safety taken care of by the framework (note: a thread-safe collection doesn't mean the code it executes is thread-safe!). However, using a collection like this is likely to be far slower than using locks.
IMO you need to use an atomic set of functions to make it safe.
http://msdn.microsoft.com/en-us/library/system.threading.mutex.aspx
Mutexes, I guess, will help you.

multithreading: lock on property - is this correct?

I have written the following code:
static readonly object failedTestLock = new object();
static Dictionary<string, Exception> _failedTests = new Dictionary<string, Exception>();

public static Dictionary<string, Exception> FailedTests
{
    get
    {
        lock (failedTestLock)
        {
            return _failedTests;
        }
    }
    set
    {
        lock (failedTestLock)
        {
            _failedTests = value;
        }
    }
}

public void RunTest(string testName)
{
    try
    {
        //Run a test
    }
    catch (Exception exception)
    {
        // ?? Is this correct / threadsafe?
        FailedTests.Add(testName, exception);
    }
}
QUESTION:
Is this a correct manner to safely add the failed test to the Dictionary?
Is this threadsafe?
Is FailedTests.Add called INSIDE the lock or OUTSIDE the lock?
Can you explain why this is correct/threadsafe or why not?
Thanks in advance
The fundamental problem with the code above is that it only locks access to _failedTests when a thread is getting the dictionary or setting it. Only one thread can get a reference to the dictionary at a time, but once a thread has a reference to the dictionary, it can read and manipulate it without being constrained by locks.
Is this a correct manner to safely add the failed test to the Dictionary?
No, not if two threads are trying to add to the dictionary at the same time. Nor if you expect reads and writes to happen in a particular order.
Is this threadsafe?
It depends what you mean by threadsafe, but no, not by any reasonable definition.
Is FailedTests.Add called INSIDE the lock or OUTSIDE the lock?
The dictionary retrieval (the get accessor) happens inside a lock. This code calls Add after releasing the lock.
Can you explain why this is correct/threadsafe or why not?
If multiple threads operate on your dictionary at the same time, you can't predict the order in which those threads will change its contents and you can't control when reads will occur.
This is not thread-safe access to a dictionary, because only the property access that returns the dictionary object is thread-safe, but you are not synchronizing the call to the Add method. Consider using ConcurrentDictionary<string,Exception> in this case, or synchronize calls to Add manually.
I don't think this is threadsafe, because the lock is held only for the very brief moment in which the reference to the collection is returned. When you Add to the collection there is no lock, so if two threads try to add at the same time you'll get a nasty error.
So you should lock around the FailedTests.Add code.
You may also want to look into concurrent collections, they might provide what you need.
Regards GJ
The call to Add() is outside the locks.
You can solve it by writing your own Add() method to replace the property.
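A minimal sketch of that fix, keeping the question's failedTestLock and _failedTests fields but moving the lock around the Add itself. The TestResults class name and the AddFailedTest/FailedCount members are illustrative, not from the question.

```csharp
using System;
using System.Collections.Generic;

static class TestResults
{
    static readonly object failedTestLock = new object();
    static readonly Dictionary<string, Exception> _failedTests =
        new Dictionary<string, Exception>();

    // Synchronized write: the lock now covers the mutation itself,
    // not just the handing-out of the dictionary reference.
    public static void AddFailedTest(string testName, Exception exception)
    {
        lock (failedTestLock)
        {
            _failedTests[testName] = exception; // indexer avoids a duplicate-key throw
        }
    }

    // Reads of the shared state go through the same lock.
    public static int FailedCount
    {
        get { lock (failedTestLock) { return _failedTests.Count; } }
    }
}
```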

Thread safe queue (list) in .net

I need to create a thread safe list of items to be added to a lucene index.
Is the following thread safe?
public sealed class IndexQueue
{
    static readonly IndexQueue instance = new IndexQueue();

    private List<string> items = new List<string>();

    private IndexQueue() { }

    public static IndexQueue Instance {
        get { return instance; }
    }

    private object padlock = new object();

    public void AddItem(string item) {
        lock (padlock) {
            items.Add(item);
        }
    }
}
Is it necessary to lock even when getting items from the internal list?
The idea is that we will then have a separate task running to grab the items from indexqueue and add them to the lucene index.
Thanks
Ben
Your implementation seems thread-safe, although you will need to lock when reading from items as well; you cannot safely read while there is a concurrent Add operation. If you ever enumerate, you will need locking around that too, and that lock will need to be held for as long as the enumerator lives.
If you can use .net 4, I'd strongly suggest looking at the System.Collections.Concurrent namespace. It has some well tested and pretty performant collections that are thread-safe and in fact optimized around multiple-thread access.
Is it necessary to lock even when getting items from the internal list?
The List class is not thread-safe when you make modifications. It's necessary to lock if:
You use a single instance of the class from multiple threads.
The contents of the list can change while you are modifying or reading from the list.
Presumably the first is true otherwise you wouldn't be asking the question. The second is clearly true because the Add method modifies the list. So, yes, you need it.
When you add a method to your class that allows you to read back the items it is also necessary to lock, and importantly you must use the same lock object as you did in AddItem.
Yes; while retrieval is not an intrinsically unsafe operation, if you're also writing to the list, then you run the risk of retrieving in the middle of a write.
This is especially true if this will operate like a traditional queue, where a retrieval will actually remove the retrieved value from the list.
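For example, a retrieval method on the question's IndexQueue could reuse the same padlock so reads and writes never interleave. The TryTake name is illustrative; on .NET 4 and later, ConcurrentQueue<string> gives you this behavior out of the box.

```csharp
using System.Collections.Generic;

public sealed class IndexQueue
{
    private static readonly IndexQueue instance = new IndexQueue();
    public static IndexQueue Instance { get { return instance; } }

    private readonly List<string> items = new List<string>();
    private readonly object padlock = new object();

    private IndexQueue() { }

    public void AddItem(string item)
    {
        lock (padlock)
        {
            items.Add(item);
        }
    }

    // Queue-style retrieval: removes and returns the oldest item, if any.
    public bool TryTake(out string item)
    {
        lock (padlock)   // same lock object as AddItem -- essential
        {
            if (items.Count == 0)
            {
                item = null;
                return false;
            }
            item = items[0];
            items.RemoveAt(0);
            return true;
        }
    }
}
```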
