I need to create a thread-safe list of items to be added to a Lucene index.
Is the following thread safe?
public sealed class IndexQueue
{
static readonly IndexQueue instance = new IndexQueue();
private List<string> items = new List<string>();
private IndexQueue() { }
public static IndexQueue Instance {
get { return instance; }
}
private object padlock = new object();
public void AddItem(string item) {
lock (padlock) {
items.Add(item);
}
}
}
Is it necessary to lock even when getting items from the internal list?
The idea is that we will then have a separate task running to grab the items from IndexQueue and add them to the Lucene index.
Thanks
Ben
Your implementation looks thread-safe, although you will need to lock when reading from items as well - you cannot safely read while a concurrent Add operation is in progress. If you ever enumerate the list, you will need locking around that too, and the lock must be held for as long as the enumerator lives.
If you can use .NET 4, I'd strongly suggest looking at the System.Collections.Concurrent namespace. It has some well-tested and pretty performant collections that are thread-safe and in fact optimized for multiple-thread access.
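For example, a minimal sketch of the same singleton built on ConcurrentQueue<T> (TryTakeItem is a hypothetical name for the consumer side, not part of the original class):
using System.Collections.Concurrent;

public sealed class IndexQueue
{
    private static readonly IndexQueue instance = new IndexQueue();
    private readonly ConcurrentQueue<string> items = new ConcurrentQueue<string>();

    private IndexQueue() { }

    public static IndexQueue Instance
    {
        get { return instance; }
    }

    public void AddItem(string item)
    {
        items.Enqueue(item);  // thread-safe, no explicit lock needed
    }

    // The indexing task can drain the queue without any explicit locking.
    public bool TryTakeItem(out string item)
    {
        return items.TryDequeue(out item);
    }
}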
Is it necessary to lock even when getting items from the internal list?
The List class is not thread-safe when you make modifications. It's necessary to lock if:
You use a single instance of the class from multiple threads.
The contents of the list can change while you are modifying or reading from the list.
Presumably the first is true otherwise you wouldn't be asking the question. The second is clearly true because the Add method modifies the list. So, yes, you need it.
When you add a method to your class that allows you to read back the items it is also necessary to lock, and importantly you must use the same lock object as you did in AddItem.
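For example, a sketch of such a read method using the same padlock (TakeAll is a hypothetical name; it drains the list so the indexing task gets a private snapshot it can enumerate freely):
public List<string> TakeAll()
{
    lock (padlock)
    {
        // Same lock as AddItem, so no Add can interleave with the copy.
        List<string> snapshot = new List<string>(items);
        items.Clear();
        return snapshot;
    }
}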
Yes; while retrieval is not an intrinsically unsafe operation, if you're also writing to the list, then you run the risk of retrieving in the middle of a write.
This is especially true if this will operate like a traditional queue, where a retrieval will actually remove the retrieved value from the list.
Related
I have a singleton below. I have multiple threads using the lookup to check if values are valid. It's been a while since I've done anything with shared memory, so I want to make sure which locks are necessary. I'm unsure if I need a concurrent set instead of HashSet since I'm only inserting values once.
I have [MethodImpl(MethodImplOptions.Synchronized)] on the Instance property because I read that properties aren't synchronized (makes sense). This should prevent multiple instances being created, although I'm not sure if I should really worry about that (just the extra cost of reloading the set?).
Should I make the FipsIsValid function Synchronized, or use some sort of concurrent set? Or are neither necessary?
public class FipsLookup
{
private static FipsLookup instance;
private HashSet<string> fips;
private FipsLookup()
{
using (HarMoneyDB db = new HarMoneyDB())
{
fips = new HashSet<string>(db.Counties.Select(c => c.FIPS).ToArray());
}
}
[MethodImpl(MethodImplOptions.Synchronized)]
public static FipsLookup Instance
{
get
{
if (instance == null)
{
instance = new FipsLookup();
}
return instance;
}
}
public static bool FipsIsValid(string fips)
{
var instance = FipsLookup.Instance;
return instance.fips.Contains(fips);
}
}
Should I make the FipsIsValid function Synchronized, or use some sort of concurrent set? Or are neither necessary?
I think the key to this answer is the fact that you are only performing lookups on the HashSet, and not mutating it. Since it is initialized once, and only once, there is no need to synchronize the lookup.
If you do decide along the way that you need to mutate it, then using a proper lock or a concurrent collection would be needed.
On a side note, you can simplify your singleton by initializing the instance field once inside a static constructor:
private static FipsLookup instance;
static FipsLookup()
{
instance = new FipsLookup();
}
And now you can make Instance return the field, with no need to use [MethodImpl(MethodImplOptions.Synchronized)]:
public static FipsLookup Instance
{
get
{
return instance;
}
}
This is safe because Instance is synchronized, which is equivalent to a lock. All writes happen under that lock, and releasing the lock flushes all writes (a release barrier).
Also, all reads first go through the lock, so it is not possible to observe a partially written HashSet. A previous version of this answer made the following incorrect claims:
This is not strictly safe (under ECMA) because readers might see a half-written HashSet. In practice it is safe (on the Microsoft CLR because all stores are releases) but I wouldn't use it because there is no reason to.
When writing this I did not notice the MethodImplOptions.Synchronized. So for your entertainment this is what happens when you forget a lock.
Probably, you should be using Lazy<T> which handles this for you and it gives you lock-free reads.
MethodImplOptions.Synchronized on static members is a little evil because it locks on the type object of the class. Let's hope nobody else is locking on this (shared) object. I would fail this in a code review, mostly because there is no reason to introduce this code smell.
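For reference, a minimal sketch of the Lazy<T> approach mentioned above (the rest of the class is assumed unchanged):
public class FipsLookup
{
    // Lazy<T> defaults to ExecutionAndPublication mode: the constructor
    // runs exactly once, and reads of Value after that are lock-free.
    private static readonly Lazy<FipsLookup> instance =
        new Lazy<FipsLookup>(() => new FipsLookup());

    public static FipsLookup Instance
    {
        get { return instance.Value; }
    }
}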
The HashSet class is not thread-safe, and there is no guarantee that you can access it from multiple threads and all will be OK. I'd prefer to use ConcurrentDictionary instead.
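For example, a sketch of the lookup built on ConcurrentDictionary, using it purely as a set (the byte values are meaningless placeholders; assumes a using System.Collections.Concurrent directive):
private static readonly ConcurrentDictionary<string, byte> fips =
    new ConcurrentDictionary<string, byte>();

public static bool FipsIsValid(string code)
{
    // Reads are thread-safe; TryAdd(code, 0) would be the matching
    // thread-safe insert if the set ever needed to grow.
    return fips.ContainsKey(code);
}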
There are several places in my code (C#) which modify a queue or a list. I want to make sure that at any time only one thread is modifying the queue or list. So how do I keep it consistent in C#? Thanks
Edit
Please see sample code COMException on Main Thread of WPF application
Hold a mutex while accessing or modifying the queue or list.
Use the lock statement.
lock(myObject) {
//Code
}
If you are on .NET 3.0 or later, the simplest solution is to use SynchronizedCollection<T>.
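A minimal usage sketch; note that SynchronizedCollection<T> lives in the System.ServiceModel assembly and takes a lock around each individual operation:
// Requires a reference to System.ServiceModel.dll.
var entries = new SynchronizedCollection<string>();
entries.Add("hello");                    // locks internally
bool found = entries.Contains("hello");  // locks internally
As with any collection that only synchronizes individual calls, compound operations such as check-then-add still need an outer lock.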
There are several threadsafe types available in the System.Collections.Concurrent namespace and elsewhere in the framework. Using these may be the simplest option.
You could also perform your own explicit locking using the lock statement. E.g.
public class Foo
{
private readonly List<int> list = new List<int>();
private readonly object locker = new object();
public void SomeCommand(Bar bar)
{
lock (this.locker)
{
// do something here?
this.list.Add(..);
// do something here?
}
}
public Baz SomeQuery()
{
lock (this.locker)
{
// do something here?
this.list.Select(...);
// do something here?
}
}
public void SomethingElseEntirely()
{
lock (this.locker)
{
// do something here
}
}
}
The best option varies from case to case. E.g. do you want to also synchronize operations other than just the collection manipulations (as indicated in the code sample), and do you want to tune performance for more reads than writes or more writes than reads (some of the built-in types are already tuned in these ways), etc.
You could also experiment with other explicit thread synchronization types, such as ReaderWriterLockSlim which is optimised for more reads than writes, and see what fits your scenario best.
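For instance, a sketch of guarding a list with ReaderWriterLockSlim (the field names are placeholders):
private readonly List<int> list = new List<int>();
private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

public bool Contains(int value)
{
    rwLock.EnterReadLock();   // many readers can hold this concurrently
    try { return list.Contains(value); }
    finally { rwLock.ExitReadLock(); }
}

public void Add(int value)
{
    rwLock.EnterWriteLock();  // writers get exclusive access
    try { list.Add(value); }
    finally { rwLock.ExitWriteLock(); }
}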
It's best to use built-in thread-safe collections such as ConcurrentQueue.
Suppose that I have a Dictionary<string, string>. The dictionary is declared as public static in my console program.
If I'm working with threads and I want to do a foreach on this Dictionary from one thread while at the same time another thread wants to add an item to it, this would cause a bug, because we can't modify the Dictionary while a foreach loop is running over it in another thread.
To bypass this problem I created a lock statement on the same static object on each operation on the dictionary.
Is this the best way to bypass this problem? My Dictionary can be very big and I can have many threads that want to foreach on it. As it is currently, things can be very slow.
Try using a ConcurrentDictionary<TKey, TValue>, which is designed for this kind of scenario.
There's a nice tutorial here on how to use it.
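A minimal sketch of the pattern (the names are placeholders); ConcurrentDictionary's enumerator is safe to use while other threads add entries, though it does not represent a point-in-time snapshot:
using System;
using System.Collections.Concurrent;

public static class Lookup
{
    public static readonly ConcurrentDictionary<string, string> Map =
        new ConcurrentDictionary<string, string>();
}

// Writer thread:
Lookup.Map.TryAdd("key", "value");

// Reader thread - no lock, and no "Collection was modified" exception:
foreach (var pair in Lookup.Map)
{
    Console.WriteLine(pair.Key + " = " + pair.Value);
}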
The big question is: Do you need the foreach to be a snapshot?
If the answer is "no", then use a ConcurrentDictionary and you will probably be fine. (The one remaining question is whether the nature of your inserts and reads hit the striped locks in a bad way, but if that was the case you'd be finding normal reads and writes to the dictionary even worse).
However, because its GetEnumerator doesn't provide a snapshot, the dictionary it enumerates at the beginning will not necessarily be the same as at the end. It could miss items, or return duplicate items. The question is whether that's a disaster for you or not.
If duplicates would be a disaster, but missed items would not, then you can filter out the duplicates with Distinct() (keyed on the keys or on both the key and value, as required).
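For example (assuming a dict field holding the ConcurrentDictionary and a using System.Linq directive):
// De-duplicate on the full pair; KeyValuePair compares by value:
var pairs = dict.ToArray().Distinct();

// Or, if only the keys matter:
var keys = dict.Select(p => p.Key).Distinct();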
If you really need it to be a hard snapshot, then take the following approach.
Have a ConcurrentDictionary (dict) and a ReaderWriterLockSlim (rwls). On both reads and writes obtain a reader lock (yes even though you're writing):
public static void AddToDict(string key, string value)
{
rwls.EnterReadLock();
try
{
dict[key] = value;
}
finally
{
rwls.ExitReadLock();
}
}
public static bool ReadFromDict(string key, out string value)
{
rwls.EnterReadLock();
try
{
return dict.TryGetValue(key, out value);
}
finally
{
rwls.ExitReadLock();
}
}
Now, when we want to enumerate the dictionary, we acquire the write lock (even though we're reading):
public IEnumerable<KeyValuePair<string, string>> EnumerateDict()
{
rwls.EnterWriteLock();
try
{
return dict.ToList();
}
finally
{
rwls.ExitWriteLock();
}
}
This way we obtain the shared lock for reading and writing, because ConcurrentDictionary deals with the conflicts involved in that for us. We obtain the exclusive lock for enumerating, but just for long enough to obtain a snapshot of the dictionary in a list, which is then used only in that thread and not shared with any other.
With .NET 4 you get a fancy new ConcurrentDictionary. I think there are some .NET 3.5-based implementations floating around.
Yes, you will have a problem updating the global dictionary while an enumeration is running in another thread.
Solutions:
Require all users of the dictionary to acquire a mutex lock before accessing the object, and release the lock afterwards.
Use .NET 4.0's ConcurrentDictionary class.
Imagine that in the class below, one thread gets the IEnumerable object and starts iterating over the elements. While it is in the middle of the iteration, another thread comes along and adds a new entry to library_entries via the Add method. Would a "Collection was modified" exception be thrown in the iteration? Or would the lock prevent the adding of the element until the iteration is complete? Or neither?
Thanks!
public static class Library
{
private static List<string> library_entries = new List<string>(1000000);
public static void Add(string entry)
{
lock (library_entries)
library_entries.Add(entry);
}
public static IEnumerable<string> GetEntries()
{
return library_entries.Where(entry => !string.IsNullOrEmpty(entry));
}
}
No, you won't get the exception, because you are using a LINQ query. It is much worse: it will fail unpredictably. The most typical outcome is that the same item gets enumerated twice, although anything is possible when the List re-allocates its internal storage during the Add() call, including an IndexOutOfRangeException. Once a week, give or take.
The code that calls GetEntries() and consumes the enumerator must acquire the lock as well. Locking the Where() expression is not good enough. Unless you create a copy of the list.
Locking wouldn't help at all, since the iteration doesn't use locking. I recommend rewriting the GetEntries() function to return a copy:
public static IEnumerable<string> GetEntries()
{
lock(lockObj)
{
return library_entries.Where(entry => !string.IsNullOrEmpty(entry)).ToList();
}
}
Note that this returns a consistent snapshot, i.e. it won't return newly added objects while you're iterating.
And I prefer to lock on a private object whose only purpose is locking, but since the list is private it's no real problem, just a stylistic issue.
You could also write your own iterator like this:
private int i = 0;

public bool MoveNext(out string current)
{
    lock (lockObj)
    {
        if (i < list.Count)
        {
            current = list[i];  // read the element under the lock
            i++;
            return true;
        }
        current = null;
        return false;
    }
}
Whether that's a good idea depends on your access patterns, lock contention, the size of the list, and so on. You might also want to use a reader-writer lock to avoid contention from many read accesses.
The static GetEntries method doesn't take any lock on the static library_entries collection, so it is not thread-safe, and concurrent calls to it from multiple threads might break. The fact that you have locked the Add method is fine, but enumerating is not a thread-safe operation, so you must lock around it as well if you intend to call GetEntries concurrently. Also, because this method returns an IEnumerable<T>, it won't do anything to the actual list until you start enumerating, which could happen outside of GetEntries. So you could add a .ToList() call at the end of the LINQ chain and then lock the entire operation.
Yes, an exception would be thrown - you aren't locking on a common object. Also, a lock in the GetEntries method would be useless, as that call returns instantly; the locking would have to occur while iterating.
I have a situation in C# where I have a list of simple types. This list can be accessed by multiple threads: entries can be added or removed, and the existence of an entry can be checked. I have encapsulated the list in an object exposing just those three operations so far.
I have a few cases to handle (not exactly the same as the methods I just mentioned).
A thread can just check for the existence of an entry. (simple)
A thread can check for the existence of an entry, and if it doesn't exist, add it.
A thread needs to check whether an entry exists, and if it does, wait until it is removed.
A combination of 2 and 3, where a thread checks for the existence of an entry, if it does exist, it must wait until it is removed before it can then add it itself.
The whole idea is that the existence of an entry signifies a lock. If an entry exists, the object it identifies cannot be changed and code cannot proceed because it is being modified elsewhere.
These may seem like simple novice situations but I'm refreshing myself on concurrency issues and it's making me a bit paranoid, and I'm also not as familiar with C#'s concurrency mechanisms.
What would be the best way to handle this? Am I totally off? Should check and add (test and set?) be combined into a fourth atomic operation? Would I simply be adding lock blocks to my methods where the list is accessed?
Also, is it possible to unit test this kind of thing (not the simple operations, the concurrency situations)?
Unit testing will certainly be hard.
This can all be done reasonably simply with the "native" concurrency mechanisms in .NET: lock statements and Monitor.Wait/Monitor.PulseAll. Unless you have a separate monitor per item though, you're going to need to wake all the threads up whenever anything is removed - otherwise you won't be able to tell the "right" thread to wake up.
If it really is just a set of items, you might want to use HashSet<T> instead of List<T> to represent the collection, by the way - nothing you've mentioned is to do with ordering.
Sample code, assuming that a set is okay for you:
using System;
using System.Collections.Generic;
using System.Threading;
public class LockCollection<T>
{
private readonly HashSet<T> items = new HashSet<T>();
private readonly object padlock = new object();
public bool Contains(T item)
{
lock (padlock)
{
return items.Contains(item);
}
}
public bool Add(T item)
{
lock (padlock)
{
// HashSet<T>.Add does what you want already :)
// Note that it will return true if the item
// *was* added (i.e. !Contains(item))
return items.Add(item);
}
}
public void WaitForNonExistence(T item)
{
lock (padlock)
{
while (items.Contains(item))
{
Monitor.Wait(padlock);
}
}
}
public void WaitForAndAdd(T item)
{
lock (padlock)
{
WaitForNonExistence(item);
items.Add(item);
}
}
public void Remove(T item)
{
lock (padlock)
{
if (items.Remove(item))
{
Monitor.PulseAll(padlock);
}
}
}
}
(Completely untested, admittedly. You might also want to specify timeouts for the waiting code...)
While #1 may be the simplest to write, it's essentially a useless method. Unless you are holding onto the same lock after finishing a query for "existence of an entry", you are actually returning "existence of an entry at some point in the past". It doesn't give you any information about the current existence of the entry.
In between discovering a value in the list and doing any operation to retrieve or remove it, another thread could come along and remove it for you.
Contains operations on a concurrent list should be combined with whatever operation you plan to perform based on the true/false result of that check. For instance, a TestAdd() or TestRemove() is much safer than Contains + Add or Contains + Remove.
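A sketch of what those combined operations might look like on the LockCollection<T> above (TestAdd and TestRemove are the hypothetical names from this answer):
// Atomic check-and-add: true only if the item was absent and is now present.
public bool TestAdd(T item)
{
    lock (padlock)
    {
        return items.Add(item);
    }
}

// Atomic check-and-remove: true only if the item existed and was removed.
public bool TestRemove(T item)
{
    lock (padlock)
    {
        return items.Remove(item);
    }
}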
Here is a proper, thread-safe, parallelisable concurrent list implementation:
http://www.deanchalk.me.uk/post/Task-Parallel-Concurrent-List-Implementation.aspx
There is a product for finding race conditions and suchlike in unit tests. It's called TypeMock Racer. I can't say anything for or against its effectiveness, though. :)