Think of a network of nodes (update: 'network of nodes' meaning objects in the same application domain, not a network of independent applications) passing objects to each other (and doing some processing on them). Is there a pattern in C# for restricting the access to an object to only the node that is actually processing it?
Main motivation: Ensuring thread-safety (no concurrent access) and object consistency (regarding the data stored in it).
V1: I thought of something like this:
class TransferredObject
{
public class AuthLock
{
public bool AllowOwnerChange { get; private set; }
public void Unlock() { AllowOwnerChange = true; }
}
private AuthLock currentOwner;
public AuthLock Own()
{
if (currentOwner != null && !currentOwner.AllowOwnerChange)
throw new Exception("Cannot change owner, current lock is not released.");
return currentOwner = new AuthLock();
}
public void DoSomething(AuthLock authentification)
{
if (currentOwner != authentification)
throw new Exception("Don't you dare!");
// be sure, that this is only executed by the one holding the lock
// Do something...
}
}
class ProcessingNode
{
public void UseTheObject(TransferredObject x)
{
// take ownership
var auth = x.Own();
// do processing
x.DoSomething(auth);
// release ownership
auth.Unlock();
}
}
V2: Pretty much overhead - a less 'strict' implementation would perhaps be to ignore the checking and rely on the "lock/unlock" logic:
class TransferredObject
{
private bool isLocked;
public void Lock()
{
if (isLocked)
throw new Exception("Cannot lock, object is already locked.");
isLocked = true;
}
public void Unlock() { isLocked = false; }
public void DoSomething()
{
// only allowed while the object is locked by the current user
if (!isLocked)
throw new Exception("Don't you dare!");
// Do something...
}
}
class ProcessingNode
{
public void UseTheObject(TransferredObject x)
{
// take ownership
x.Lock();
// do processing
x.DoSomething();
// release ownership
x.Unlock();
}
}
However: This looks a bit unintuitive (and having to pass the auth instance with every call is ugly). Is there a better approach? Or is this a problem 'made by design'?
To clarify your question: you seek to implement the rental threading model in C#. A brief explanation of different ways to handle concurrent access to an object would likely be helpful.
Single-threaded: all accesses to the object must happen on the main thread.
Free-threaded: any access to the object may happen on any thread; the developer of the object is responsible for ensuring the internal consistency of the object. The developer of the code consuming the object is responsible for ensuring that "external consistency" is maintained. (For example, a free-threaded dictionary must maintain its internal state consistently when adds and removes happen on multiple threads. An external caller must recognize that the answer to the question "do you contain this key?" might change due to an edit from another thread.)
Apartment threaded: all accesses to a given instance of an object must happen on the thread that created the object, but different instances can be affinitized to different threads. The developer of the object must ensure that internal state which is shared between objects is safe for multithreaded access but state which is associated with a given instance will only ever be read or written from a single thread. Typically UI controls are apartment threaded and must be in the apartment of the UI thread.
Rental threaded: access to a given instance of an object must happen from only a single thread at any one time, but which thread that is may change over time.
So now let's consider some questions that you should be asking:
Is the rental model a reasonable way to simplify my life, as the author of an object?
Possibly.
The point of the rental model is to achieve some of the benefits of multithreading without taking on the cost of implementing and testing a free-threaded model. Whether those increased benefits and lowered costs are a good fit, I don't know. I personally am skeptical of the value of shared memory in multithreaded situations; I think the whole thing is a bad idea. But if you're bought into the crazy idea that multiple threads of control in one program modifying shared memory is goodness, then maybe the rental model is for you.
The code you are writing is essentially an aid to the caller of your object to make it easier for the caller to obey the rules of the rental model and easier to debug the problem when they stray. By providing that aid to them you lower their costs, at some moderate increase to your own costs.
The idea of implementing such an aid is a good one. The original implementations of VBScript and JScript at Microsoft back in the 1990s used a variation on the apartment model, whereby a script engine would transit from a free-threaded mode into an apartment-threaded mode. We wrote a lot of code to detect callers that were violating the rules of our model and produce errors immediately, rather than allowing the violation to produce undefined behaviour at some unspecified point in the future.
Is my code correct?
No. It's not threadsafe! The code that enforces the rental model and detects violations of it cannot itself assume that the caller is correctly using the rental model! You need to introduce memory barriers to ensure that the various threads reading and writing your lock bools are not moving those reads and writes around in time. Your Own method is chock full of race conditions. This code needs to be very, very carefully designed and reviewed by an expert.
My recommendation - assuming again that you wish to pursue a shared memory multithreaded solution at all - is to eliminate the redundant bool; if the object is unowned then the owner should be null. I don't usually advocate a low-lock solution, but in this case you might consider looking at Interlocked.CompareExchange to do an atomic compare-and-swap on the field with a new owner. If the compare to null fails then the user of your API has a race condition which violates the rental model. This introduces a memory barrier.
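The recommendation above could be sketched roughly as follows. This is only an illustration, not a reviewed implementation: the names RentedObject, Acquire and Release are hypothetical, and a plain object stands in for the AuthLock token from the question. The owner field is null when the object is unowned, and every ownership change goes through Interlocked.CompareExchange.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a rental-model guard; all names are illustrative.
public sealed class RentedObject
{
    // null means "unowned", so no separate bool flag is needed.
    private object _owner;

    // Atomically claims the object; throws if it is already rented.
    public object Acquire()
    {
        var token = new object();
        // Store the token only if the field currently holds null.
        if (Interlocked.CompareExchange(ref _owner, token, null) != null)
            throw new InvalidOperationException("Object is already rented.");
        return token;
    }

    // Atomically releases the object; throws if the caller is not the owner.
    public void Release(object token)
    {
        if (token == null) throw new ArgumentNullException(nameof(token));
        // Reset the field to null only if it currently holds this token.
        if (Interlocked.CompareExchange(ref _owner, null, token) != token)
            throw new InvalidOperationException("Caller does not own this object.");
    }

    public void DoSomething(object token)
    {
        if (token == null || Volatile.Read(ref _owner) != token)
            throw new InvalidOperationException("Caller does not own this object.");
        // Do something...
    }
}
```

A second Acquire while the object is still rented throws immediately, which is the "detect the violation early" behaviour described above. The check in DoSomething only detects misuse; it does not make the body itself safe if a caller keeps using the object after releasing it.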
Maybe your example is too simplified and you really need this complex ownership thing, but the following code should do the job:
class TransferredObject
{
private object _lockObject = new object();
public void DoSomething()
{
lock(_lockObject)
{
// TODO: your code here...
}
}
}
Your TransferredObject has an atomic method DoSomething that changes some state(s) and should not run multiple times at the same time. So, just put a lock into it to synchronize the critical section.
See http://msdn.microsoft.com/en-us/library/c5kehkcz%28v=vs.90%29.aspx
Experts on threading/concurrency/memory model in .NET, could you verify that the following code is correct under all circumstances (that is, regardless of OS, .NET runtime, CPU architecture, etc.)?
class SomeClassWhoseInstancesAreAccessedConcurrently
{
private Strategy _strategy;
public SomeClassWhoseInstancesAreAccessedConcurrently()
{
_strategy = new SomeStrategy();
}
public void DoSomething()
{
Volatile.Read(ref _strategy).DoSomething();
}
public void ChangeStrategy()
{
Interlocked.Exchange(ref _strategy, new AnotherStrategy());
}
}
This pattern comes up pretty frequently. We have an object which is used concurrently by multiple threads and at some point the value of one of its fields needs to be changed. We want to guarantee that from that point on every access to that field coming from any thread observe the new value.
Considering the example above, we want to make sure that after the point in time when ChangeStrategy is executed, it can't happen that SomeStrategy.DoSomething is called instead of AnotherStrategy.DoSomething because some of the threads don't observe the change and use the old value cached in a register/CPU cache/whatever.
To my knowledge, we need at least a volatile read to prevent such caching. The main question is whether that is enough, or whether we need Interlocked.CompareExchange(ref _strategy, null, null) instead to achieve the correct behavior.
If a volatile read is enough, a further question arises: do we need Interlocked.Exchange at all, or would a volatile write be enough in this case?
As I understand it, volatile reads/writes use half fences, which allow a write followed by a read to be reordered; to be honest, I still can't fully understand the implications of that. However, as per the ECMA-335 specification, section I.12.6.5, "The class library provides a variety of atomic operations in the System.Threading.Interlocked class. These operations (e.g., Increment, Decrement, Exchange, and CompareExchange) perform implicit acquire/release operations." So, if I understand this correctly, Interlocked.Exchange should create a full fence, which looks like enough.
But, to complicate things further, it seems that not all Interlocked operations were implemented according to the specification on every platform.
I'd be very grateful if someone could clear this up.
Yes, your code is safe. It is functionally equivalent to using a lock like this:
public void DoSomething()
{
Strategy strategy;
lock (_locker) strategy = _strategy;
strategy.DoSomething();
}
public void ChangeStrategy()
{
Strategy strategy = new AnotherStrategy();
lock (_locker) _strategy = strategy;
}
Your code is more performant though, because the lock imposes a full fence, while the Volatile.Read imposes a potentially cheaper half fence.
You could improve the performance even more by replacing the Interlocked.Exchange (full fence) with a Volatile.Write (half fence). The only reason to prefer the Interlocked.Exchange over the Volatile.Write is when you want to retrieve the previous strategy as an atomic operation. Apparently this is not needed in your case.
For simplicity you could even get rid of the Volatile.Write/Volatile.Read calls, and just declare the _strategy field as volatile.
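The volatile-field variant mentioned last might look like the sketch below. The names IStrategy and StrategyHolder are hypothetical stand-ins for the types in the question, and the strategies are passed in rather than constructed internally so that the example is self-contained.

```csharp
// Hypothetical stand-in for the Strategy type in the question.
public interface IStrategy
{
    void DoSomething();
}

public class StrategyHolder
{
    // Every read of this field is a volatile read, every write a volatile write.
    private volatile IStrategy _strategy;

    public StrategyHolder(IStrategy initial)
    {
        _strategy = initial;
    }

    public void DoSomething()
    {
        // Read the field once into a local, then call through the local.
        IStrategy strategy = _strategy;
        strategy.DoSomething();
    }

    public void ChangeStrategy(IStrategy next)
    {
        _strategy = next;
    }
}
```

As with the versions above, a call that has already read the old strategy may still finish on it; only reads that happen after the write are expected to see the new one.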
Here is the scenario. I've got a class that will be accessed by multiple threads (ASP.NET) that can benefit from storing a result in a write-once, read-many cache. This cached object is the result of an operation that cannot be performed as part of a static initializer, but must wait for the first execution. So I implement a simple null check as seen below. I'm aware that if two threads hit this check at the same moment I will have ExpensiveCalculation calculated twice, but that isn't the end of the world. My question is, do I need to worry about the static _cachedResult still being seen as null by other threads due to optimizations or other thread caching. Once written, the object is only ever read so I don't think full-scale locking is needed.
public class Bippi
{
private static ExpensiveCalculation _cachedResult;
public int DoSomething(Something arg)
{
// calculate only once. recalculating is not harmful, just wastes time.
if (_cachedResult == null)
_cachedResult = new ExpensiveCalculation(arg);
// additional work with both arg and the results of the precalculated
// values of _cachedResult.A, _cachedResult.B, and _cachedResult.C
int someResult = _cachedResult.A + _cachedResult.B + _cachedResult.C + arg.ChangableProp;
return someResult;
}
}
public class ExpensiveCalculation
{
public int A { get; private set; }
public int B { get; private set; }
public int C { get; private set; }
public ExpensiveCalculation(Something arg)
{
// arg is used to calculate A, B, and C
}
}
Additional note: this is in a .NET 4.0 application.
My question is, do I need to worry about the static _cachedResult still being seen as null by other threads due to optimizations or other thread caching.
Yes, you do. That's one of the primary reasons volatile exists.
And it's worth mentioning that uncontested locks add an entirely negligible performance cost, so there's really no reason not to just lock the null check and resource generation, as it's almost certainly not going to cause any performance problems, and it makes the program much easier to reason about.
And the best solution is to avoid the question entirely and use a higher level of abstraction that is specifically designed to solve the exact problem that you have. In this case, that means Lazy<T>. You can create a Lazy<T> object that defines how to create your expensive resource, access it wherever you need the object, and the Lazy<T> implementation becomes responsible for ensuring that the resource is created no more than once, that it is properly exposed to the code asking for said resource, and that it is handled efficiently.
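A minimal sketch of the Lazy<T> approach is shown here. It assumes that everything the expensive calculation needs is available when the Lazy<T> is declared; in the question the constructor takes an argument from the first call, so this would need adapting. The names ExpensiveResult, CachedCalculation and FactoryCalls are hypothetical, and the counter exists only to show how often the factory ran.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for ExpensiveCalculation from the question.
public sealed class ExpensiveResult
{
    public int A { get; private set; }
    public int B { get; private set; }
    public int C { get; private set; }

    public ExpensiveResult()
    {
        // Placeholder values standing in for the real expensive work.
        A = 1; B = 2; C = 3;
    }
}

public static class CachedCalculation
{
    // Demonstration-only counter: how many times the factory was invoked.
    public static int FactoryCalls;

    // ExecutionAndPublication: the factory runs at most once and
    // every thread observes the same fully constructed instance.
    private static readonly Lazy<ExpensiveResult> Cached =
        new Lazy<ExpensiveResult>(CreateResult, LazyThreadSafetyMode.ExecutionAndPublication);

    private static ExpensiveResult CreateResult()
    {
        Interlocked.Increment(ref FactoryCalls);
        return new ExpensiveResult();
    }

    public static int DoSomething(int changeableProp)
    {
        ExpensiveResult r = Cached.Value;
        return r.A + r.B + r.C + changeableProp;
    }
}
```

Callers simply use CachedCalculation.DoSomething(...); the first call pays for the calculation and later calls reuse the cached instance.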
You don't need volatile as such; what you especially need is a memory barrier, so that the processor caches synchronize.
I think you can altogether optimistically avoid locking, and yet avoid volatile performance penalties. You can test for nullability in a two-step fashion.
private static readonly object _cachedResultLock = new object();
...
if (_cachedResult == null)
{
lock(_cachedResultLock)
{
if (_cachedResult == null)
{
_cachedResult = new ExpensiveCalculation(arg);
}
}
}
Here, most of the time you will not reach the lock and will not serialize access. Access is serialized only around the first initialization, which guarantees that work is not wasted (though another thread may have to wait a bit while the first one finishes ExpensiveCalculation).
I have written a static class which is a repository of some functions which I am calling from different class.
public static class CommonStructures
{
public struct SendMailParameters
{
public string To { get; set; }
public string From { get; set; }
public string Subject { get; set; }
public string Body { get; set; }
public string Attachment { get; set; }
}
}
public static class CommonFunctions
{
private static readonly object LockObj = new object();
public static bool SendMail(CommonStructures.SendMailParameters sendMailParam)
{
lock (LockObj)
{
try
{
//send mail
return true;
}
catch (Exception ex)
{
//some exception handling
return false;
}
}
}
private static readonly object LockObjCommonFunction2 = new object();
public static int CommonFunction2(int i)
{
lock (LockObjCommonFunction2)
{
int returnValue = 0;
try
{
//send operation
return returnValue;
}
catch (Exception ex)
{
//some exception handling
return returnValue;
}
}
}
}
Question 1: For my second method CommonFunction2, do I use a new static lock, i.e. LockObjCommonFunction2 in this example, or can I reuse the same lock object LockObj defined at the beginning of the class?
Question 2: Is there anything which might lead to threading-related issues, or can I improve the code to be thread-safe?
Question 3: Can there be any issues in passing a common class instead of a struct, in this example SendMailParameters (which I use to wrap up all parameters, instead of having multiple parameters to the SendMail function)?
Regards,
MH
Question 1: For my second method CommonFunction2, do I use a new static lock, i.e. LockObjCommonFunction2 in this example, or can I reuse the same lock object LockObj defined at the beginning of the class?
If you want to synchronize these two methods with each other, you need to use the same lock for both. For example, if thread1 is in Method1 and thread2 is in Method2 and you don't want them running concurrently, use the same lock. But if you only want to prevent concurrent calls to each method individually, use different locks.
Question 2: Is there anything which might lead to threading-related issues, or can I improve the code to be thread-safe?
Always remember that shared resources (eg. static variables, files) are not thread-safe since they are easily accessed by all threads, thus you need to apply any kind of synchronization (via locks, signals, mutex, etc).
Question 3: Can there be any issues in passing a common class instead of a struct, in this example SendMailParameters (which I use to wrap up all parameters, instead of having multiple parameters to the SendMail function)?
As long as you apply proper synchronizations, it would be thread-safe. For structs, look at this as a reference.
The bottom line is that you need to apply correct synchronization to anything that is in shared memory. You should also always take note of the scope of the thread you are spawning and the state of the variables each method is using. Do they change the state or just depend on the internal state of the variable? Does the thread always create its own object, even though the method is static/shared? If yes, then it should be thread-safe. Otherwise, if it just reuses a certain shared resource, then you should apply proper synchronization. And most of all, even without a shared resource, deadlocks could still happen, so remember the basic rules in C# to avoid deadlocks. P.S. thanks to Euphoric for sharing Eric Lippert's article.
But be careful with your synchronization. As much as possible, limit its scope to only where the shared resource is being modified, because otherwise it can create bottlenecks in your application that greatly affect performance.
static readonly object _lock = new object();
static SomeClass sc = new SomeClass();
static void workerMethod()
{
//assuming this method is called by multiple threads
longProcessingMethod();
modifySharedResource(sc);
}
static void modifySharedResource(SomeClass sc)
{
//do something
lock (_lock)
{
//where sc is modified
}
}
static void longProcessingMethod()
{
//a long process
}
You can reuse the same lock object as many times as you like, but that means that none of the areas of code surrounded by that same lock can be accessed at the same time by various threads. So you need to plan accordingly, and carefully.
Sometimes it's better to use one lock object for multiple location, if there are multiple functions which edit the same array, for instance. Other times, more than one lock object is better, because even if one section of code is locked, the other can still run.
Multi-threaded coding is all about careful planning...
To be super duper safe, at the expense of potentially writing much slower code... you can add an accessor to your static class surrounded by a lock. That way you can make sure that none of the methods of that class will ever be called by two threads at the same time. It's pretty brute force, and definitely a 'no-no' for professionals. But if you're just getting familiar with how these things work, it's not a bad place to start learning.
1) As to the first question, it depends on what you want to have:
As is (two separate lock objects) - no two threads will execute the same method at the same time, but they can execute different methods at the same time.
If you change to a single lock object, then no two threads will execute any of the sections guarded by that shared lock object at the same time.
2) In your snippet there is nothing that strikes me as wrong - but there is not much code. If your repository calls methods from itself then you can have a problem, and there is a world of issues that you can run into :)
3) As to structs, I would not use them. Use classes; it is better/easier that way. There is another bag of issues related to structs, and you just don't need those problems.
The number of lock objects to use depends on what kind of data you're trying to protect. If you have several variables that are read/updated on multiple threads, you should use a separate lock object for each independent variable. So if you have 10 variables that form 6 independent variable groups (as far as how you intend to read / write them), you should use 6 lock objects for best performance. (An independent variable is one that's read / written on multiple threads without affecting the value of other variables. If 2 variables must be read together for a given action, they're dependent on each other so they'd have to be locked together. I hope this is not too confusing.)
Locked regions should be as short as possible for maximum performance - every time you lock a region of code, no other thread can enter that region until the lock is released. If you have a number of independent variables but use too few lock objects, your performance will suffer because your locked regions will grow longer.
Having more lock objects allows for higher parallelism since each thread can read / write a different independent variable - threads will only have to wait on each other if they try to read / write variables that are dependent on each other (and thus are locked through the same lock object).
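To make the idea of one lock per independent variable group concrete, here is a small sketch. The class and member names (RequestStats, RecordRequest and so on) are hypothetical: the request count and byte total must be read and written together, so they share one lock, while the error count is independent and gets its own.

```csharp
// Hypothetical example: two independent variable groups, two lock objects.
public sealed class RequestStats
{
    private readonly object _requestLock = new object();
    private readonly object _errorLock = new object();

    // Group 1: these two values are always read and updated together.
    private long _requestCount;
    private long _totalBytes;

    // Group 2: independent of group 1.
    private long _errorCount;

    public void RecordRequest(long bytes)
    {
        lock (_requestLock)
        {
            _requestCount++;
            _totalBytes += bytes;
        }
    }

    public double AverageBytes()
    {
        lock (_requestLock)
        {
            return _requestCount == 0 ? 0.0 : (double)_totalBytes / _requestCount;
        }
    }

    public void RecordError()
    {
        // Does not block threads that are recording requests.
        lock (_errorLock)
        {
            _errorCount++;
        }
    }

    public long ErrorCount()
    {
        lock (_errorLock)
        {
            return _errorCount;
        }
    }
}
```

A thread calling RecordError never waits on a thread inside RecordRequest, whereas AverageBytes always sees a count and total that belong together.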
In your code you must be careful with your SendMailParameters input parameter - if this is a reference type (class, not struct) you must make sure that its properties are locked or that it isn't accessed on multiple threads. If it's a reference type, it's just a pointer and without locking inside its property getters / setters, multiple threads may attempt to read / write some properties of the same instance. If this happens, your SendMail() function may end up using a corrupted instance. It's not enough to simply have a lock inside SendMail() - properties and methods of SendMailParameters must be protected as well.
Suppose I have a static helper class that I'm using a lot in a web app. Suppose that the app receives about 20 requests per second for a sustained period of time and that, by magic, two requests ask the static class to do some work at the exact same nanosecond.
What happens when this happens?
To provide some context, the class is used to perform a linq-to-sql query: it receives a few parameters, including the UserID, and returns a list of custom objects.
thanks.
It entirely depends on what your "some work" means. If it doesn't involve any shared state, it's absolutely fine. If it requires access to shared state, you'll need work out how to handle that in a thread-safe way.
A general rule of thumb is that a class's public API should be thread-safe for static methods, but doesn't have to be thread-safe for instance methods - typically any one instance is only used within a single thread. Of course it depends on what your class is doing, and what you mean by thread-safe.
What happens when this happens?
If your methods are reentrant, then they are thread-safe and chances are they will simply work. If those static methods rely on some shared state and you haven't synchronized access to this state, chances are this shared state will get corrupted. But you don't need 20 requests hitting the method at the same nanosecond to corrupt your shared state. Two are more than enough if you don't synchronize it.
So static methods by themselves are not evil (well actually they are as they are not unit test friendly but that's another topic), it's the way they are implemented that matters in a multithreaded environment. So you should make them thread safe.
UPDATE:
Since you mentioned LINQ-TO-SQL in the comments section: as long as all variables used in the static method are local, the method is thread-safe. For example:
public static SomeEntity GetEntity(int id)
{
using (var db = new SomeDbContext())
{
return db.SomeEntities.FirstOrDefault(x => x.Id == id);
}
}
You must ensure your methods are thread-safe, so don't use static fields to store any kind of state. If you are creating new objects inside the static method, there is no problem, because each thread has its own object.
It depends if the static class has any state or not (i.e. static variables shared across all calls). If it does not, then it's fine. If it does, it's not good. Examples:
// Fine
static class Whatever
{
public static string DoSomething() {
return "something";
}
}
// Death from above
static class WhateverUnsafe
{
static int count = 0;
public static int Count() {
return ++count;
}
}
You can make the second work fine using locks, but then you introduce deadlocks and concurrency issues.
I have built massive web applications with static classes but they never have any shared state.
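For the specific counter shown above, one option that avoids an explicit lock is Interlocked.Increment, assuming the increment is the only operation that has to be atomic. The class name SafeCounter is hypothetical, and this is a sketch rather than a general recipe for shared state.

```csharp
using System.Threading;

// Hypothetical thread-safe variant of the counter example.
public static class SafeCounter
{
    private static int _count;

    // Atomically increments the shared counter and returns the new value.
    public static int Next()
    {
        return Interlocked.Increment(ref _count);
    }
}
```

This only works because the shared state is a single integer; as soon as several fields have to change together, a lock (or no shared state at all) is the simpler choice.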
It crashes out in a nasty way (if you are doing this to share state), so avoid doing this in a webapp... Or alternatively protect the reads/writes with a lock:
http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlockslim.aspx
But honestly you really should avoid using statics unless you REALLY have to, and if you really have to, you have to be very careful with your locking strategy and test it to destruction to make sure you have managed to isolate reads and writes from each other.
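As a rough illustration of protecting static state with the ReaderWriterLockSlim class linked above, the sketch below guards a shared dictionary. The names SharedSettings, TryGet and Set are hypothetical; the pattern is simply "enter the lock, work in a try block, exit in finally".

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical static cache guarded by a reader/writer lock.
public static class SharedSettings
{
    private static readonly ReaderWriterLockSlim Lock = new ReaderWriterLockSlim();
    private static readonly Dictionary<string, string> Values = new Dictionary<string, string>();

    // Many readers may hold the read lock at the same time.
    public static bool TryGet(string key, out string value)
    {
        Lock.EnterReadLock();
        try
        {
            return Values.TryGetValue(key, out value);
        }
        finally
        {
            Lock.ExitReadLock();
        }
    }

    // A writer gets exclusive access; readers wait until it exits.
    public static void Set(string key, string value)
    {
        Lock.EnterWriteLock();
        try
        {
            Values[key] = value;
        }
        finally
        {
            Lock.ExitWriteLock();
        }
    }
}
```

Whether this beats a plain lock depends on the read/write ratio; for short critical sections a simple lock is often just as fast and easier to get right.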
UPDATED: now using a read-only collection based on comments below
I believe that the following code should be thread safe "lock free" code, but want to make sure I'm not missing something...
public class ViewModel : INotifyPropertyChanged
{
//INotifyPropertyChanged and other boring stuff goes here...
private volatile ReadOnlyCollection<string> _data; // needs System.Collections.ObjectModel
public IEnumerable<string> Data
{
get { return _data; }
}
//this function is called on a timer and runs on a background thread
private void RefreshData()
{
List<string> newData = ACallToAService();
_data = newData.AsReadOnly();
OnPropertyChanged("Data"); // yes, this dispatches to the UI thread
}
}
Specifically, I know that I could use a lock(_lock) or even an Interlocked.Exchange() but I don't believe that there is a need for it in this case. The volatile keyword should be sufficient (to make sure the value isn't cached), no? Can someone please confirm this, or else let me know what I don't understand about threading :)
I have no idea whether that is "safe" or not; it depends on precisely what you mean by "safe". For example, if you define "safe" as "a consistent ordering of all volatile writes is guaranteed to be observed from all threads", then your program is not guaranteed to be "safe" on all hardware.
The best practice here is to use a lock unless you have an extremely good reason not to. What is your extremely good reason to write this risky code?
UPDATE: My point is that low-lock or no-lock code is extremely risky and that only a small number of people in the world actually understand it. Let me give you an example, from Joe Duffy:
// deeply broken, do not use!
class Singleton {
private static object slock = new object();
private static Singleton instance;
private static bool initialized;
private Singleton() {}
public static Singleton Instance {
get {
if (!initialized) {
lock (slock) {
if (!initialized) {
instance = new Singleton();
initialized = true;
}
}
}
return instance;
}
}
}
This code is broken; it is perfectly legal for a correct implementation of the C# compiler to write you a program that returns null for the instance. Can you see how? If not, then you have no business doing low-lock or no-lock programming; you will get it wrong.
I can't figure out this stuff myself; it breaks my brain. That's why I try to never do low-lock programming that departs in any way from standard practices that have been analyzed by experts.
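For contrast, the "just use a lock" advice applied to a singleton could look like the sketch below. The class name LockedSingleton is hypothetical. Every access takes the lock, so there is no unlocked read of shared fields to reason about.

```csharp
// Hypothetical sketch: a singleton whose accessor always takes the lock.
public sealed class LockedSingleton
{
    private static readonly object Sync = new object();
    private static LockedSingleton _instance;

    private LockedSingleton() { }

    public static LockedSingleton Instance
    {
        get
        {
            // Both the null check and the read of the instance happen under the lock.
            lock (Sync)
            {
                if (_instance == null)
                {
                    _instance = new LockedSingleton();
                }
                return _instance;
            }
        }
    }
}
```

This gives up the small saving of skipping the lock on the fast path in exchange for code that can be checked by reading it.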
It depends on what the intent is. The get/set of the list is atomic (even without volatile) and non-cached (volatile), but callers can mutate the list, which is not guaranteed thread-safe.
There is also a race condition that could lose data:
obj.Data.Add(value);
Here value could easily be discarded.
I would use an immutable (read-only) collection.
I think that if you have only two threads like you described, your code is correct and safe. And also you don't need that volatile, it is useless here.
But please don't call it "thread safe", as it is safe only for your two threads using it your special way.
I believe that this is safe in itself (even without volatile), however there may be issues depending on how other threads use the Data property.
Provided that you can guarantee that all other threads read and cache the value of Data once before doing enumeration on it (and don't try to cast it to some broader interface to perform other operations), and make no consistency assumptions for a second access to the property, then you should be ok. If you can't make that guarantee (and it'd be hard to make that guarantee if eg. one of the users is the framework itself via data-binding, and hence code that you do not control), then you can't say that it's safe.
For example, this would be safe:
foreach (var item in x.Data)
{
// do something with item
}
And this would be safe (provided that the JIT isn't allowed to optimise away the local, which I think is the case):
var data = x.Data;
var item1 = FindItem(data, a);
var item2 = FindItem(data, b);
DoSomething(item1, item2);
The above two might act on stale data, but it will always be consistent data. But this would not necessarily be safe:
var item1 = FindItem(x.Data, a);
var item2 = FindItem(x.Data, b);
DoSomething(item1, item2);
This one could possibly be searching two different states of the collection (before and after some thread replaces it), so it may not be safe to operate on items found in each separate enumeration, as they may not be consistent with each other.
The issue would be worse with a broader interface; eg. if Data exposed IList<T> you'd have to watch for consistency of Count and indexer operations as well.