Say you have an in-memory list of strings, and a multi-threaded system, with many readers but just one writer thread.
In general, is it possible to implement this kind of system in C#, without using a lock? Would the implementation make any assumptions about how the threads interact (or place restrictions on what they can do, when)?
Yes. The trick is to make sure the list remains immutable. The writer will snapshot the main collection, modify the snapshot, and then publish the snapshot to the variable holding the reference to the main collection. The following example demonstrates this.
public class Example
{
    // The master collection; it is treated as immutable once published.
    volatile List<string> collection = new List<string>();

    void Writer()
    {
        var copy = new List<string>(collection); // Snapshot the collection.
        copy.Add("hello world");                 // Modify the snapshot.
        collection = copy;                       // Publish the snapshot.
    }

    void Reader()
    {
        List<string> local = collection; // Acquire a local reference for safe reading.
        if (local.Count > 0)
        {
            DoSomething(local[0]);
        }
    }
}
There are a few caveats with this approach.
It only works because there is a single writer.
Writes are O(n) operations, since each one copies the whole list.
Different readers may be using different versions of the list simultaneously.
This is a fairly dangerous trick. There are very specific reasons why volatile was used, why a local reference is acquired on the reader side, etc. If you do not understand these reasons then do not use the pattern. There is too much that can go wrong.
The notion that this is thread-safe is semantic. No, it will not throw exceptions, blow up, or tear a hole in spacetime. But there are other ways in which this pattern can cause problems. Know what the limitations are. This is not a miracle cure for every situation.
Because of the above constraints, the scenarios where this would benefit you are quite limited. The biggest problem is that writes require a full copy first, so they may be slow. But if the writes are infrequent then this might be tolerable.
I describe more patterns in my answer here as well, including one that is safe for multiple writers.
That is a fairly common request for a threading library to fulfill - that sort of lock is generally just called a "reader-writer lock", or some variation on that theme. I haven't ever needed to use the C# implementation specifically, but there is one: http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlockslim.aspx
Of course, you run into the issue that if readers will always be reading, you'll never be able to get the writer in to write. You'll have to handle that yourself, I believe.
(Ok, so it's still technically a "lock", but it's not the C# "lock" construct, it's a more sophisticated object specifically designed for the purpose stated in the question. So I guess whether it's a correct answer depends somewhat on semantics and on why he was asking the question.)
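As a sketch of how that might be used, assuming a wrapper class of my own invention around a list like the one in the question:

using System.Collections.Generic;
using System.Threading;

public class SynchronizedList
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private readonly List<string> items = new List<string>();

    public void Add(string item)
    {
        rwLock.EnterWriteLock(); // blocks until all current readers have exited
        try { items.Add(item); }
        finally { rwLock.ExitWriteLock(); }
    }

    public string FirstOrDefault()
    {
        rwLock.EnterReadLock(); // any number of readers can hold this at once
        try { return items.Count > 0 ? items[0] : null; }
        finally { rwLock.ExitReadLock(); }
    }
}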
To avoid locks, you might want to consider Microsoft's concurrent collections. These collections provide thread safe access to collections of objects in both ordered and unordered forms. They use some neat tricks to avoid locking internally in as many instances as possible.
You can also use Microsoft's new Immutable Collections library: http://blogs.msdn.com/b/bclteam/archive/2012/12/18/preview-of-immutable-collections-released-on-nuget.aspx
Note: this is completely separate from the Concurrent Collections.
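As a hedged sketch, the single-writer example at the top of the page might look like this with the immutable collections (ImmutableInterlocked ships in the same library; DoSomething is the same placeholder as before):

using System.Collections.Immutable;

public class ImmutableExample
{
    private ImmutableList<string> collection = ImmutableList<string>.Empty;

    public void Writer(string item)
    {
        // Update retries a compare-and-swap until it wins, so unlike the
        // snapshot pattern above this is safe even with multiple writers.
        ImmutableInterlocked.Update(ref collection, (list, x) => list.Add(x), item);
    }

    public void Reader()
    {
        var local = collection; // a stable snapshot; later writes never touch it
        if (local.Count > 0)
        {
            DoSomething(local[0]);
        }
    }

    private void DoSomething(string s) { /* placeholder */ }
}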
A singly-linked-list approach can be used without locks provided the writer only inserts/deletes at either the head or the tail. In either case, if you construct the new node beforehand, you only need a single atomic operation (head = newHead; or tail.next = newTail) to make the operation visible to the readers.
In terms of performance, insertions and deletions are O(1), while length calculation is O(n).
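A minimal sketch of the head-insertion case, with a hypothetical Node type (every node is immutable; only the head reference ever changes):

public class Node
{
    public readonly string Value;
    public readonly Node Next;
    public Node(string value, Node next) { Value = value; Next = next; }
}

public class LockFreeList
{
    private volatile Node head; // volatile so readers promptly see the published head

    // Single writer: build the new node first, then publish it with one
    // atomic reference assignment.
    public void InsertAtHead(string value)
    {
        head = new Node(value, head);
    }

    // Readers: once a reader grabs 'head', the chain behind it never changes.
    public int Count()
    {
        int n = 0;
        for (var node = head; node != null; node = node.Next) n++; // O(n)
        return n;
    }
}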
Related
I need a data type that can insert entries and then quickly determine whether an entry has already been inserted. A Dictionary seems to suit this need (see example). However, I have no use for the dictionary's values. Should I still use a dictionary, or is there another, better-suited data type?
public class Foo
{
    private Dictionary<string, bool> Entities;
    ...

    public void AddEntity(string bar)
    {
        if (!Entities.ContainsKey(bar))
        {
            // The bool value true here has no use and is just a placeholder.
            Entities.Add(bar, true);
        }
    }

    public string[] GetEntities()
    {
        return Entities.Keys.ToArray();
    }
}
You can use HashSet<T>.
The HashSet<T> class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
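For example, the Foo class from the question could be rewritten like this (a sketch; note that HashSet<T>, like Dictionary, is not thread-safe on its own):

using System.Collections.Generic;
using System.Linq;

public class Foo
{
    private readonly HashSet<string> entities = new HashSet<string>();

    public void AddEntity(string bar)
    {
        // Add already ignores duplicates (it returns false instead of
        // throwing), so the ContainsKey-style pre-check goes away.
        entities.Add(bar);
    }

    public string[] GetEntities()
    {
        return entities.ToArray();
    }
}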
Habib's answer is excellent, but in multi-threaded environments, if you use a HashSet<T> then by consequence you have to use locks to protect access to it. I find myself more prone to creating deadlocks with lock statements. Also, locks yield a worse speedup per Amdahl's law, because adding a lock statement reduces the percentage of your code that is actually parallel.
For those reasons, a ConcurrentDictionary<T,object> fits the bill in multi-threaded environments. If you end up using one, then wrap it like you did in your question. Just new up objects to toss in as values as needed, since the values won't be important. You can verify that there are no lock statements in its source code.
If you didn't need mutability of the collection then this would be moot. But your question implies that you do need it, since you have an AddEntity method.
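A sketch of that wrapper:

using System.Collections.Concurrent;
using System.Linq;

public class Foo
{
    private readonly ConcurrentDictionary<string, object> entities =
        new ConcurrentDictionary<string, object>();

    public void AddEntity(string bar)
    {
        // TryAdd is an atomic check-and-add, so no lock statement is
        // needed and no exception is thrown for duplicates.
        entities.TryAdd(bar, new object()); // the value is a throwaway placeholder
    }

    public string[] GetEntities()
    {
        return entities.Keys.ToArray();
    }
}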
Additional info 2017-05-19: actually, ConcurrentDictionary does use locks internally, although not lock statements per se (it uses Monitor.Enter; check out the TryAddInternal method). However, it seems to lock on individual buckets within the dictionary, which means there will be less contention than putting the entire thing in a lock statement.
So all in all, ConcurrentDictionary is often better for multithreaded environments.
It's actually quite difficult (impossible?) to make a concurrent hash set using only the Interlocked methods. I tried on my own and kept running into the problem of needing to alter two things at the same time--something that only locking can do in general. One workaround I found was to use singly-linked lists for the hash buckets and intentionally create cycles in a list when one thread needed to operate on a node without interference from other threads; this would cause other threads to get caught spinning around in the same spot until that thread was done with its node and undid the cycle. Sure, it technically didn't use locks, but it did not scale well.
I have an array that represents an inventory, with nearly 50 elements (items: some custom objects), and I need a reader-writer lock for it (okay, I think a simple lock would be enough too). It should support both reference changes and value changes.
As reading and writing to different positions of the array is thread-safe (Proof), I want to ensure that multiple read/write operations on the same array position are also thread-safe.
I surely could create 50 reader-writer locks, but I don't want that ;)
Is there a way to achieve this? (I know about ConcurrentList/Dictionary/etc., but I want an array...)
Thanks
If you are replacing the references in the array, then this is already safe, since reference swaps are inherently atomic. So you can use:
var val = arr[index];
// now work with val
and
var newVal = ...
arr[index] = newVal;
perfectly safely, at least in terms of avoiding torn references. So one pragmatic option is to make the object immutable, and just employ the above. If you need to change the value, take a local copy, make a new version based from that, and then swap them. If lost updates are a problem, then Interlocked.CompareExchange and a re-apply loop can be used very successfully (i.e. you keep reapplying your change until you "win"). This avoids the need for any locking.
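A sketch of that re-apply loop, with a hypothetical immutable MyItem type standing in for the inventory objects:

using System.Threading;

public sealed class MyItem
{
    public readonly decimal Price;
    public MyItem(decimal price) { Price = price; }
    public MyItem WithPrice(decimal price) { return new MyItem(price); }
}

public static class InventoryUpdates
{
    // Keep re-applying the change until our swap wins.
    public static void UpdatePrice(MyItem[] arr, int index, decimal newPrice)
    {
        while (true)
        {
            MyItem snapshot = arr[index];                  // current reference
            MyItem updated = snapshot.WithPrice(newPrice); // new version based on it
            // Publish only if nobody swapped the element in the meantime.
            if (Interlocked.CompareExchange(ref arr[index], updated, snapshot) == snapshot)
                return; // we won
            // Another thread won; loop and re-apply against the new value.
        }
    }
}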
If, however, you are mutating the individual objects, then the game changes. You could make the object internally thread-safe, but this is usually not pretty. You could have a single lock for all the objects. But if you want granular locking then you will need multiple locks.
My advice: make the object immutable and just use the atomic reference-swap trick.
First off, you may not need any locks. Reading and writing with an array of a type where the CPU handles each read and write atomically is in and of itself thread-safe (but you might want to put in a memory barrier to avoid stale reads).
That said, just like x = 34 for an integer is thread-safe but x++ is not, if you have writes that depend upon the current value (and which are hence a read and a write), then that is not thread-safe.
If you do need locks, but don't want as many as 50, you could stripe. First set up your striped locks (I'll use simple locks rather than ReaderWriterLockSlim for smaller example code; the same principle applies):
var lockArray = new object[8];
for (var i = 0; i != lockArray.Length; ++i)
    lockArray[i] = new object();
Then when you go to use it:
lock (lockArray[idx % 8])
{
    // operate on item idx of your array here
}
It's a balance between the simplicity and size of one lock for everything, vs the memory use of one lock for each element.
The big difficulty comes in if an operation on one element depends on that of another, if you need to resize the array, or any other case where you need to have more than one lock. A lot of deadlock situations can be avoided by always acquiring the locks in the same order (so no other thread needing more than one lock will try to get one you already have while holding one you need), but you need to be very careful in these cases.
You also want to make sure that if you are dealing with say, index 3 and index 11, you avoid locking on object 3 twice (I can't think of a way this particular recursive locking would go wrong, but why not just avoid it rather than have to prove it's one of the cases where recursive locking is safe?)
I'm using a ConcurrentBag to store objects at run time. At some point I need to empty the bag and store the bag's contents in a list. This is what I do:
IList<T> list = new List<T>();

lock (bag)
{
    T pixel;
    while (bag.TryTake(out pixel))
    {
        list.Add(pixel);
    }
}
My question is about synchronization. As far as I have read, lock is faster than other synchronization methods (source: http://www.albahari.com/threading/part2.aspx).
Performance is my second concern; I'd like to know if I can use ReaderWriterLockSlim at this point. What would be the benefit of using ReaderWriterLockSlim? The reason is that I don't want this operation to block incoming requests.
If yes, should I use an upgradeable lock?
Any ideas or comments?
I'm not sure why you're using the lock. The whole idea behind ConcurrentBag is that it's concurrent.
Unless you're just trying to prevent some other thread from taking things or adding things to the bag while you're emptying it.
Re-reading your question, I'm pretty sure you don't want to synchronize access here at all. ConcurrentBag allows multiple threads to Take and Add, without you having to do any explicit synchronization.
If you lock the bag, then no other thread can add or remove things while your code is running. Assuming, of course, that you protect every other access to the bag with a lock. And once you do that, you've completely defeated the purpose of having a lock-free concurrent data structure. Your data structure has become a poorly-performing list that's controlled by a lock.
Same thing if you use a reader-writer lock. You'd have to synchronize every access.
You don't need to add any explicit synchronization in this case. Ditch the lock.
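For example, the snippet from the question works unchanged once the lock goes away (a sketch, wrapped here in a hypothetical extension method):

using System.Collections.Concurrent;
using System.Collections.Generic;

public static class BagExtensions
{
    // TryTake is already thread-safe, so concurrent Adds from other
    // threads are never blocked; they simply may or may not make it
    // into this particular drain.
    public static IList<T> Drain<T>(this ConcurrentBag<T> bag)
    {
        var list = new List<T>();
        T pixel;
        while (bag.TryTake(out pixel))
        {
            list.Add(pixel);
        }
        return list;
    }
}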
Lock is great when threads will do a lot of operations in a row (bursty, low contention).
RWSlim is great when you have a lot more read locks than write locks (read-heavy, high read contention).
Lockless is great when you need multiple readers and/or writers all working at the same time (a mix of reads and writes, lots of contention).
I would like to have thread-safe read and write access to an auto-implemented property. I am missing this functionality from the C#/.NET framework, even in its latest version.
At best, I would expect something like
[Threadsafe]
public int? MyProperty { get; set; }
I am aware that there are various code examples to achieve this, but I just wanted to be sure that this is still not possible using .NET framework methods only, before implementing something myself. Am I wrong?
EDIT: As some answers elaborate on atomicity, I want to state that atomicity is exactly what I want, as far as I understand it: as long as (and not longer than) one thread is reading the value of the property, no other thread is allowed to change the value. So multi-threading would not introduce invalid values. I chose the int? type because that is the one I am currently concerned about.
EDIT2: I have found the specific answer to the example with Nullable here, by Eric Lippert
Correct; there is no such device. Presumably you are trying to protect against reading the field while another thread has changed half of it (atomicity)? Note that many (small) primitives are inherently safe from this type of threading issue:
5.5 Atomicity of variable references
Reads and writes of the following data types are atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list are also atomic.
But in all honesty this is just the tip of the threading iceberg; by itself it usually isn't enough to just have a thread-safe property; most times the scope of a synchronized block must be more than just one read/write.
There are also so many different ways of making something thread-safe, depending on the access profile;
lock ?
ReaderWriterLockSlim ?
reference-swapping to some class (essentially a Box<T>, so a Box<int?> in this case)
Interlocked (in all the guises)
volatile (in some scenarios; it isn't a magic wand...)
etc
(not to mention making it immutable, either through code or by just choosing not to mutate it, which is often the simplest way of making it thread-safe)
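As a hedged sketch of the reference-swapping option from that list (all names here are my own): the box itself never mutates, so readers and writers only ever exchange a reference, and reference reads/writes are atomic per the quote above.

public sealed class Box<T>
{
    public readonly T Value;
    public Box(T value) { Value = value; }
}

public class Holder
{
    // volatile so a reader always sees the most recently published box
    private volatile Box<int?> box = new Box<int?>(null);

    public int? MyProperty
    {
        get { return box.Value; }           // atomic reference read
        set { box = new Box<int?>(value); } // atomic reference publish
    }
}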
I'm answering here to add to Marc's answer, where he says "there are also so many different ways of making something thread-safe, depending on the access profile".
I just want to add, that part of the reason for this, is that there are so many ways of not being thread-safe, that when we say something is thread-safe, we have to be clear on just what safety is provided.
With almost any mutable object, there will be ways to deal with it that are not thread-safe (note almost any, an exception is coming up). Consider a thread-safe queue that has the following (thread-safe) members; an enqueue operation, a dequeue operation and a count property. It's relatively easy to construct one of these either through locking internally on each member, or even with lock-free techniques.
However, say we used the object like so:
if (queue.Count != 0)
    return queue.Dequeue();
The above code is not thread-safe, because there is no guarantee that, after the (thread-safe) Count returns 1, another thread won't dequeue and hence cause the second operation to fail.
It is still a thread-safe object in many ways, particularly as even in this case of failure, the failing dequeue operation will not put the object into an invalid state.
To make an object as a whole thread-safe in the face of any given combination of operations, we have to either make it logically immutable, or severely reduce the number of external operations possible. (It's possible to have internal mutability, with thread-safe operations updating internal state as an optimisation, e.g. through memoisation or loading from a datasource as needed, but to the outside it must appear immutable.) For instance, we could create a thread-safe queue that only had Enqueue and TryDequeue, which is always thread-safe; but that both reduces the operations possible and forces a failed dequeue to be redefined as not being a failure, and it forces a change in logic on calling code from the version we had earlier.
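The framework's own ConcurrentQueue<T> is shaped this way: it exposes no bare Dequeue at all, only TryDequeue, so the emptiness check from the earlier snippet cannot be separated from the removal:

using System.Collections.Concurrent;

var queue = new ConcurrentQueue<string>();
queue.Enqueue("item");

string result;
if (queue.TryDequeue(out result))
{
    // Use result; there is no racy gap between "check" and "take",
    // because the emptiness test and the removal are one atomic operation.
}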
Anything else is a partial guarantee. We get some partial guarantees for free (as Marc notes, acting on some automatic properties are already thread-safe in regards to being individually atomic - which in some cases is all the thread safety we need, but in other cases doesn't go anywhere near far enough).
Let's consider an attribute that adds this partial guarantee to those cases where we don't already get it. Just how much value is it to us? Well, in some cases it will be perfect, but in others it won't. Going back to our case of testing before dequeue, having such a guarantee on Count isn't much use - we had that guarantee and the code still failed in multi-threaded conditions in a way it wouldn't in single-threaded conditions.
What's more, adding this guarantee to the cases that don't already have it requires at least a degree of overhead. It may be premature optimisation to worry about overhead all the time, but adding overhead for no gain is premature pessimisation, so let's not do that! Furthermore, if we do provide the wider concurrency control to make a set of operations truly thread-safe, then we will have rendered the narrower concurrency controls irrelevant, and they become pure overhead - so we don't even get value out of our overhead in some cases; it's almost always purely waste.
It's also not clear how wide or narrow the concurrency concerns are. Do we need to lock (or similar) only on that property, or do we need to lock on all properties? Do we need to lock also on non-automatic operations, and is that even possible?
There is no good single answer here (they can be tricky questions to answer in rolling your own solution, never mind in trying to answer it in the code that would produce such code when someone else has used this [Threadsafe] attribute).
Also, any given approach will have a different set of conditions in which deadlock, livelock, and similar problems can occur, so we can actually reduce thread-safety by treating thread-safety as something we can just blindly apply to a property.
Without being able to find a single universal answer to those questions, there is no good way of providing a single universal implementation, and any such [Threadsafe] attribute would be of very limited value at best. Finally, at the psychological level of the programmer using it, it is very likely to lead to a false sense of security that they have created a thread-safe class when in fact they have not; which would make it actually worse than useless.
No, not possible. No free lunch here. The moment your auto-properties need even a bit more (thread safety, INotifyPropertyChanged), it is down to doing it yourself manually; no automatic-property magic.
According to the C# 4.0 spec this behavior is unchanged:
Section 10.7.3 Automatically implemented properties
When a property is specified as an automatically implemented property, a hidden backing field is automatically available for the property, and the accessors are implemented to read from and write to that backing field.
The following example:

public class Point {
    public int X { get; set; } // automatically implemented
    public int Y { get; set; } // automatically implemented
}

is equivalent to the following declaration:

public class Point {
    private int x;
    private int y;
    public int X { get { return x; } set { x = value; } }
    public int Y { get { return y; } set { y = value; } }
}
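So doing it yourself might look like the following sketch: the same hidden-backing-field shape, plus the lock that no attribute will generate for you (names are illustrative):

public class Holder
{
    private readonly object sync = new object();
    private int? myProperty; // int? is not covered by the atomicity list above

    public int? MyProperty
    {
        get { lock (sync) { return myProperty; } }
        set { lock (sync) { myProperty = value; } }
    }
}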
I have two threads, a producer thread that places objects into a generic List collection and a consumer thread that pulls those objects out of the same generic List. I've got the reads and writes to the collection properly synchronized using the lock keyword, and everything is working fine.
What I want to know is if it is ok to access the Count property without first locking the collection.
JaredPar refers to the Count property in his blog as a decision procedure that can lead to race conditions, like this:
if (list.Count > 0)
{
    return list[0];
}
If the list has one item and that item is removed after the Count property is accessed but before the indexer, an exception will occur. I get that.
But would it be ok to use the Count property to, say, determine the initial size a completely different collection? The MSDN documentation says that instance members are not guaranteed to be thread safe, so should I just lock the collection before accessing the Count property?
I suspect it's "safe" in terms of "it's not going to cause anything to go catastrophically wrong" - but that you may get stale data. That's because I suspect it's just held in a simple variable, and that that's likely to be the case in the future. That's not the same as a guarantee though.
Personally I'd keep it simple: if you're accessing shared mutable data, only do so in a lock (using the same lock for the same data). Lock-free programming is all very well if you've got appropriate isolation in place (so you know you've got appropriate memory barriers, and you know that you'll never be modifying it in one thread while you're reading from it in another) but it sounds like that isn't the case here.
The good news is that acquiring an uncontested lock is incredibly cheap - so I'd go for the safe route if I were you. Threading is hard enough without introducing race conditions which are likely to give no significant performance benefit but at the cost of rare and unreproducible bugs.
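Concretely, the safe route for the question's scenario might look like this sketch:

using System.Collections.Generic;

// 'list' and 'syncRoot' stand in for the existing shared list and the
// lock object that already guards its reads and writes.
int SnapshotCount(List<string> list, object syncRoot)
{
    lock (syncRoot)
    {
        return list.Count; // guaranteed fresh, not a stale read
    }
}

// Using the snapshot to size a different collection:
// var other = new List<string>(SnapshotCount(sharedList, sharedSync));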