I'm going to implement a non-blocking write to a variable via Volatile.Write. Should I use Volatile.Read in all consumers of this variable, or is that not necessary? What could go wrong if I read the variable as usual (without any kind of barrier)? And the same question about Interlocked.Exchange.
From the documentation of the Volatile class:
Calling one of these methods affects only a single memory access. To provide effective synchronization for a field, all access to the field must use Volatile.Read and Volatile.Write.
One of the things that may go wrong is that the compiler may emit code that reads the value of the variable into a register just once, and then keeps accessing this cached copy forever after, without ever checking to see whether the original value has changed.
Same thing with Interlocked.Exchange.
Generally, the best way to handle these kinds of situations is to fully encapsulate your variable inside a class exposing a property which accesses the variable via Volatile or Interlocked, thus guaranteeing that the variable will never be accessed by any other means.
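A minimal sketch of that encapsulation, assuming an int field; the class name SharedFlag and its members are hypothetical and chosen only for illustration:

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: the field is private, so every access goes through
// Volatile or Interlocked and nothing can read it "as usual" by accident.
public sealed class SharedFlag
{
    private int _value;

    public int Value
    {
        get { return Volatile.Read(ref _value); }
        set { Volatile.Write(ref _value, value); }
    }

    // Atomically stores a new value and returns the previous one.
    public int Exchange(int newValue)
    {
        return Interlocked.Exchange(ref _value, newValue);
    }
}

public static class Program
{
    public static void Main()
    {
        var flag = new SharedFlag();
        var worker = new Thread(() =>
        {
            // Spin until the writer publishes a non-zero value.
            while (flag.Value == 0) { Thread.Yield(); }
            Console.WriteLine("observed " + flag.Value);
        });
        worker.Start();
        flag.Value = 1;
        worker.Join();
    }
}
```

Because the loop reads through the property, each iteration performs a fresh Volatile.Read, so the value cannot be hoisted into a register and reused forever.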
Some newbie questions about multi-threading in .NET which I think will help reinforce some concepts I'm trying to absorb. I've read a fair amount of multi-threading material (including the Albahari ebook), but I feel I need confirmation on a few questions to help drive these concepts home.
A lock scope protects a shared region of code - suppose there is a thread executing a method that increments a simple integer variable x in a loop - however this won't protect code elsewhere that might also alter variable x eg in another method on another thread ...
Since this is two different regions of code potentially affecting the same variable, do we solve this by locking both regions of code using the same lock variable for both lock scopes around variable x? If you locked both regions of code with different lock variables, this would not protect the variable correct?
To further this example, using the same lock variable, what would happen if for some reason, code in one method went into some infinite loop and never relinquished the lock variable - how could the second region of code in the other method detect this?
How does the choice of lock variable influence the behavior of the lock? I've read numerous posts on this subject already but can never seem to find a definitive answer - in some instances people explicitly use an object variable specifically for this purpose, other times people use lock(this) and finally there've been times I've seen people use a type object.
How do the different choices of lock variables influence the behavior / scope of the lock and what scenarios would it make sense to use one over the other?
Suppose you have a hashtable wrapped in a class exposing add, remove, get and some sort of Calculate method (say each object represents a quantity and this method sums each value), and all these methods are locked. However, once a reference to an object in that collection is made available to other code and passed around an application, this object (not the hashtable) would now be outside the lock scope surrounding the methods of that class. How could you then protect access / updates to those actual objects taken from the hashtable, which could interfere with the Calculate method?
Appreciate any heuristics provided that would help reinforce these concepts for me - thanks!
1) Yes
2) That's a deadlock
3) The parts of your code you want to block are an implementation detail of your class. Exposing the lock object by using lock(this) or lock(this.GetType()) is asking for trouble since now external code can lock the same object and block your code unintentionally or maliciously. The lock object should be private.
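As a rough sketch of points 1) and 3) together, assuming a simple integer x as in the question: two separate regions of code take the same private lock object. The class name Counter and its members are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical class illustrating a private lock object shared by two regions.
public sealed class Counter
{
    // Private, so no outside code can lock on it.
    private readonly object _sync = new object();
    private int _x;

    public void Increment()
    {
        lock (_sync) { _x++; }
    }

    public void Reset()
    {
        // Same lock object as Increment, so the two regions exclude each other.
        lock (_sync) { _x = 0; }
    }

    public int Current
    {
        get { lock (_sync) { return _x; } }
    }
}

public static class Program
{
    public static void Main()
    {
        var counter = new Counter();
        // Four loop bodies may run on different threads; the lock keeps every update.
        Parallel.For(0, 4, _ =>
        {
            for (int i = 0; i < 100000; i++) counter.Increment();
        });
        Console.WriteLine(counter.Current); // 400000
    }
}
```

Had Increment and Reset locked on two different objects, they could run at the same time and the variable would not be protected.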
4) It isn't very clear what you mean, you certainly wouldn't want to expose the Hashtable directly. Just keep it as a private field of the class, encapsulating it.
However, the odds that you can safely expose your class to client code using threads go down very rapidly with the number of public methods and properties you expose. You'll quickly get to a point where only the client code can properly take a lock. Fine-grained locking creates lots of opportunities for threading races when the client code is holding on to property values. Say a Count property value you return. By the time it uses the value, like in a for loop, the Count property might have changed. Only the most careful design can avoid these traps, a serious headache.
Furthermore, fine-grained locking is very inefficient since it inevitably is done in the innermost parts of your code. Locks are not that expensive, roughly 100 CPU cycles, but it quickly adds up. That effort is entirely wasted if the class object isn't actually used from multiple threads.
You then have no option but to declare your class thread-unsafe, and the client code needs to use it in a thread-safe manner. This is also the core reason that so many .NET classes are not thread-safe. It is the biggest reason that threading is so hard to get right: the programmer least likely to do it correctly is responsible for doing the most difficult thing.
1)
You are correct. You must use the same lock object to protect two distinct areas of code that, for example, increment the variable x.
2)
This is known as a deadlock and is one of the difficulties with multithreaded programming. There are algorithms which can be used to avoid deadlocks, such as the Banker's algorithm.
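One way the second region of code could at least notice that the lock is never released is to acquire it with a timeout instead of a plain lock statement. The sketch below uses Monitor.TryEnter; the helper name TryDoWork and the 200 ms timeout are assumptions made for the example, not something from the question.

```csharp
using System;
using System.Threading;

public static class Program
{
    private static readonly object Sync = new object();

    // Returns true if the work ran, false if the lock could not be taken in time.
    public static bool TryDoWork(TimeSpan timeout, Action work)
    {
        bool taken = false;
        try
        {
            Monitor.TryEnter(Sync, timeout, ref taken);
            if (!taken) return false; // another thread still holds the lock
            work();
            return true;
        }
        finally
        {
            if (taken) Monitor.Exit(Sync);
        }
    }

    public static void Main()
    {
        using (var held = new ManualResetEventSlim(false))
        using (var release = new ManualResetEventSlim(false))
        {
            // This thread stands in for the method that never gives up the lock.
            var hog = new Thread(() =>
            {
                lock (Sync) { held.Set(); release.Wait(); }
            });
            hog.Start();
            held.Wait();

            bool ran = TryDoWork(TimeSpan.FromMilliseconds(200), () => { });
            Console.WriteLine(ran ? "got the lock" : "timed out");

            release.Set();
            hog.Join();
        }
    }
}
```

A timeout only detects the problem; the caller still has to decide what to do when the lock is unavailable.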
3)
Some languages make locking easy; for example, in .NET you can just create an object and use it as the shared lock. This is good for synchronising code within a given process. lock(this) just applies the lock to the object in question. However, try to avoid this; instead create a private object and use that. lock(this) can lead to deadlocking situations. The lock object underneath is probably just a wrapper around a Critical Section. If you wanted to protect a resource across different processes you would need a much heavier named Mutex; this requires a lock on a kernel object and is expensive, so do not use it unless you must.
4) You need to make sure locking is applied there as well. But surely when people call methods on this reference, they call the methods which employ synchronisation.
I am confused about volatile for reference types.
I understand that for primitive type, volatile can reflect value changes from another thread immediately. For reference type, it can reflect the address changes immediately. However, what about the content of the object. Are they still cached?
(Assuming List.Add() is an atomic operation)
For example, I have:
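    using System.Collections.Generic;

    class A
    {
        // volatile applies to the reference stored in this field, not to the list's contents
        volatile List<String> list = new List<String>();

        void AddValue()
        {
            list.Add("a value");
        }
    }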
If one thread calls the function AddValue, the address of list does not change, will another thread get updated about the "content" change of the list, or the content may be cached for each thread and it doesn't update for other threads?
I understand that for primitive type, volatile can reflect value changes from another thread immediately
You understand incorrectly in at least three ways. You should not attempt to use volatile until you deeply understand everything about weak memory models, acquire and release semantics, and how they affect your program.
First off, be clear that volatile affects variables, not values.
Second, volatile does not affect variables that contain values of value types any differently than it affects variables that contain references.
Third, volatile does not mean that value changes from other threads are visible immediately. Volatile means that variables have acquire and release semantics. Volatile affects the order in which side effects of memory mutations can be observed to happen from a particular thread. The idea that there exists a consistent universal order of mutations and that those mutations in that order can be observed instantaneously from all threads is not a guarantee made by the memory model.
However, what about the content of the object?
What about it? The storage location referred to by a volatile variable of reference type need not have any particular threading characteristics.
If one thread calls the function AddValue, the address of list does not change, will another thread get updated about the "content" change of the list.
Nope. Why would it? That other thread might be on a different processor, and that processor cache might have pre-loaded the page that contains the address of the array that is backing the list. Mutating the list might have changed the storage location that contains the address of the array to refer to some completely different location.
Of course, the list class is not threadsafe in the first place. If you're not locking access to the list then the list can simply crash and die when you try to do this.
You don't need volatile; what you need is to put thread locks around accesses to the list. Since thread locks induce full fences you should not need half fences introduced by volatile.
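A minimal sketch of the locking approach, reworking the class from the question; the name LockedA, the Count property and the _sync field are assumptions added for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical rework of class A: no volatile, every list access is under one lock.
public sealed class LockedA
{
    private readonly object _sync = new object();
    private readonly List<string> _list = new List<string>();

    public void AddValue()
    {
        lock (_sync) { _list.Add("a value"); }
    }

    public int Count
    {
        // Readers take the same lock, so they never see the list mid-update.
        get { lock (_sync) { return _list.Count; } }
    }
}

public static class Program
{
    public static void Main()
    {
        var a = new LockedA();
        Parallel.For(0, 1000, _ => a.AddValue());
        Console.WriteLine(a.Count); // 1000
    }
}
```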
It's worse than that.
If you concurrently access an object that isn't thread-safe, your program may actually crash. Getting out-of-date information is not the worst potential outcome.
When sharing .NET base class library objects between threads, you really have no choice but to use locking. For lockless programming, you need invasive changes to your data structures at the lowest levels.
The volatile keyword has no impact on the content of the list (or, more precisely, the object being referenced).
Speaking about updated / not updated for another thread is an oversimplification of what's happening. You should use the lock statement to synchronize access to the shared list. Otherwise you are facing race conditions that may lead to a program crash. The List<T> class is not thread-safe by itself.
Look at http://www.albahari.com/threading/part4.aspx#_The_volatile_keyword for a good explanation about what volatile actually does and how it impacts fields.
The entire part of threading on that site is a must read anyway, it contains huge amounts of useful information that have proved very useful for me when I was designing multi threaded software.
Hi
I am working on a simple desktop application. It needs to handle some operations, like loading a webpage, which may block the main thread, so I moved the code to a background worker.
My problem is that there is a heavy class named UCSProject, which contains many string and List fields. I need to pass an instance of this class to the background worker. Since the class is a bit heavy, I would like to reduce the number of duplicate instances by using the global variable directly, instead of passing it as an argument to the background worker.
To make it short, I just want to know: is it safe to access global variables from a background worker thread in C#?
It is safe as long as neither of your threads (background or main) modifies the object.
If you want the object to be modified by either of them, use lock.
From your question I suspect that you do not understand how variables of class types work. You do not need a global variable to have only one copy of your object. All variables will point to exactly the same object unless you Clone it or create a new one with the old one as a prototype.
A global variable will in other words change nothing unless you explicitly create new copies as described in the previous paragraph.
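To illustrate, here is a small sketch under the assumption that UCSProject is an ordinary class; its fields and the Work method are invented for the example. Passing the instance to background work hands over a reference, not a copy.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Stand-in for the question's UCSProject; the fields here are invented.
public sealed class UCSProject
{
    public string Name = "demo";
    public List<string> Items = new List<string>();
}

public static class Program
{
    // Receives a reference to the caller's object; nothing is duplicated.
    public static UCSProject Work(UCSProject project)
    {
        project.Items.Add("loaded");
        return project;
    }

    public static void Main()
    {
        var project = new UCSProject();
        // Run the work on a thread-pool thread and wait for it to finish.
        UCSProject seen = Task.Run(() => Work(project)).Result;

        Console.WriteLine(ReferenceEquals(project, seen)); // True
        Console.WriteLine(project.Items.Count);            // 1
    }
}
```

The main thread waits for the task before reading Items, so no lock is needed in this particular sketch; concurrent modification would need one.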
I also wonder how heavy your class can be if you think that performance would be hurt by creating copies of it. How many MB does it weigh?
Update
This article series describes in great detail what the heap and stack are: http://www.c-sharpcorner.com/uploadfile/rmcochran/csharp_memory01122006130034pm/csharp_memory.aspx
It is safe, but you have to synchronize access to the variables, e.g. by using the lock statement.
See "lock Statement" in the MSDN library.
No, it is not, unless you're locking it with lock(object) { } whenever you use any of its data fields.
If you're not modifying any of the strings or variables then you don't need to lock.
I'd also consider making this a static class if the data is shared across the whole application -- then you won't need to pass an instance.
If you need to modify or update the data -- use lock.
You may also use [ThreadStatic]. The value of the variable will be unique for each thread. See MSDN for how to use it.
I discovered ThreadStaticAttribute, and I have a lot of questions about it:
all my previous thread-dependent static information was implemented as a static dictionary in which TKey is Thread, and when I wanted to access it, I used Thread.CurrentThread and that works. But this requires maintenance because if a thread dies, I have to delete the corresponding entry from the dictionary. And I also need to consider thread safety and a lot of other matters.
By using ThreadStaticAttribute, all these matters seem to be solved, but I need to be sure of it. My questions are: do I need to delete the instance held by ThreadStaticAttribute-marked fields, somehow, before the thread dies? Where is the information in that field held? Is it in the instance of a Thread object, or something like that, so that when it is not used anymore, the garbage collector automatically discards it? Are there performance penalties? Which ones? Is it faster than using a keyed collection like I was doing?
I need clarification on how ThreadStaticAttribute works.
No, you do not need to delete instances of values held in a field which is tagged with ThreadStatic. The garbage collector will automatically pick them up when both the thread and the object are no longer reachable by rooted objects.
The only exception here is if the value implements IDisposable and you want to actively dispose of it. In general this is a hard problem to solve for a number of reasons. It's much simpler to not have values which implement IDisposable and are in a ThreadStatic field.
As to where this field is actually stored, it's somewhat irrelevant. All you need to be concerned about is that it will behave like any other object in .NET. The only two behavior differences are:
The field will reference a different value per accessing thread.
The initializer for the field will only be run once (in practice, it's a bad idea to have any).
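The first difference can be seen in a short sketch; the class name PerThread and its methods are hypothetical. No initializer is used, so every thread starts from the default value 0.

```csharp
using System;
using System.Threading;

public static class PerThread
{
    [ThreadStatic]
    private static int _count; // each thread sees its own slot, starting at 0

    public static int Increment() { return ++_count; }
    public static int Get() { return _count; }
}

public static class Program
{
    public static void Main()
    {
        // Two increments on the main thread.
        PerThread.Increment();
        PerThread.Increment();

        // A second thread starts from its own zero, not from the main thread's 2.
        int seenByOther = -1;
        var t = new Thread(() => { seenByOther = PerThread.Increment(); });
        t.Start();
        t.Join();

        Console.WriteLine(PerThread.Get()); // 2
        Console.WriteLine(seenByOther);     // 1
    }
}
```

No cleanup code runs when the second thread ends; its slot simply becomes unreachable along with the thread.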
Marking a static member variable as [ThreadStatic] tells the compiler to allocate it in the thread's memory area (eg. where the thread's stack is allocated) rather than in the global memory area. Thus, each thread will have its own copy (which are guaranteed to be initialized to the default value for that type, eg. null, 0, false, etc; do not use in-line initializers as they will only initialize it for one thread).
So, when the thread goes away, so does its memory area, releasing the reference. Of course if it's something that needs more immediate disposal (open file streams, etc) instead of waiting for background garbage collection, you might want to make sure you do that before the thread exits.
There could be a limit to the amount of [ThreadStatic] space available, but it should be sufficient for sane uses. It should be somewhat faster than accessing a keyed collection (and more easily thread-safe), and I think it's comparable to accessing a normal static variable.
Correction: I have since heard that accessing ThreadStatic variables is somewhat slower than accessing normal static variables. I'm not sure if it is even actually faster than accessing a keyed collection, but it does avoid issues of orphans (which was your question) and needing locking for threadsafety which would complicate a keyed-collection approach.
Is it necessary to acquire a lock on a variable before reading it from multiple threads?
The short answer is: it depends.
The long answer is:
If it is not a shared value, i.e, only one thread can see it (or use it), you don't need any synchronization.
If it is an immutable value, i.e., you set it only once and then only ever read, it is safe to do so without synchronization (as long as you don't start reading before the first write completes).
If it is a "primitive" type of at most 32 bits (e.g. byte, short, int) you can get stale (old) data when reading. If that doesn't bother you, you're set. If stale data is undesirable, making the variable volatile can fix this problem without additional synchronization for reads. But if you have racing writers, you will need to follow the same advice as for longs below.
If it is a "primitive" type longer than 32 bits (e.g. long, decimal, double) you need synchronization, otherwise you could read "half" of one value and "half" of another, and get crazy results. For this the recommended approach is to use the methods in the Interlocked class, for both reads and writes (decimal has no Interlocked overloads, so it needs a lock instead).
If it is a reference type, you will need synchronization to avoid seeing an invalid state (Jeff Lamb's picture example is a good one). The lock statement might be enough for that. Again, you need to lock for both reads and writes.
There are some other points to consider (how long to lock, for example), but I think these are enough to answer your question.
It depends on the type of variable and your platform. For example, reading Int64s is not guaranteed to be atomic on 32 bit machines. Hence, Interlocked.Read.
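A sketch of what that looks like for a 64-bit value, assuming a hypothetical Int64Cell wrapper: the writer alternates between two bit patterns, and the reader checks that it never sees a mix of the two.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: all access to the long goes through Interlocked.
public sealed class Int64Cell
{
    private long _value;

    public long Read() { return Interlocked.Read(ref _value); }

    // Stores the new value atomically and returns the previous one.
    public long Write(long value) { return Interlocked.Exchange(ref _value, value); }
}

public static class Program
{
    public static void Main()
    {
        var cell = new Int64Cell();
        var writer = new Thread(() =>
        {
            // Alternate between all-zero bits and all-one bits.
            for (int i = 0; i < 1000000; i++) cell.Write(i % 2 == 0 ? 0L : -1L);
        });
        writer.Start();

        bool torn = false;
        while (writer.IsAlive)
        {
            long v = cell.Read();
            if (v != 0L && v != -1L) torn = true; // half of one value, half of the other
        }
        writer.Join();
        Console.WriteLine(torn ? "torn read" : "no torn reads");
    }
}
```

With plain reads and writes of the field in a 32-bit process, the torn branch could in principle be hit; going through Interlocked rules it out.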
If the loading of the value is done in 1 assembly instruction, it's not necessary to get a lock. You don't care if the value changed 10 minutes ago or 1 microsecond ago. You just want the value now.
However, if you're loading a HUGE array or picture or something, it'd probably be a good idea to lock it out. In theory, you can get preempted while loading the data and have half of the first item and half of the second item.
If it's a simple variable, though, like a bool or int, it's not necessary.
In addition to the answers below, you can also take a read lock using ReaderWriterLockSlim.
That would allow you to take only a read lock when reading and a write lock when modifying your variable. Multiple threads can hold a read lock at the same time, but as soon as a thread requests a write lock, all new requests are blocked until it is complete.
This sort of locking would be useful if you are doing a lot of reads and not many writes.
As with most multithreading issues, research it enough to understand whether it really fits your problem; ReaderWriterLockSlim is not suitable for every locking situation.
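A minimal sketch of that pattern, assuming a single string value guarded by the lock; the class name Setting and its members are hypothetical:

```csharp
using System;
using System.Threading;

// Hypothetical holder for a value that is read often and written rarely.
public sealed class Setting : IDisposable
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private string _value = "";

    public string Get()
    {
        _lock.EnterReadLock(); // many readers may hold this at once
        try { return _value; }
        finally { _lock.ExitReadLock(); }
    }

    public void Set(string value)
    {
        _lock.EnterWriteLock(); // exclusive: waits for readers to leave
        try { _value = value; }
        finally { _lock.ExitWriteLock(); }
    }

    public void Dispose() { _lock.Dispose(); }
}

public static class Program
{
    public static void Main()
    {
        using (var setting = new Setting())
        {
            setting.Set("v1");
            Console.WriteLine(setting.Get()); // v1
        }
    }
}
```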
It depends on whether it is a local or shared variable, whether something else may write to it in the meantime, and what you're going to do after reading it.
If you make a decision based on the variable, consider that the next line of code may then be based on data which is now stale.
The answer is: it depends. If the value of the variable does not change while the threads are accessing it, a lock is not needed; otherwise, it is.
Also, you can use the Interlocked.XXX methods to keep reads and writes of the variable atomic.
Reading does not require a lock; as long as you don't care about the 'correctness' of the read. It is only dangerous if you attempt to write without a lock.
If it is a constant, no.
If it is an updatable value, yes, if you need consistency.
For updatable values where the exact value must be managed, then yes, you should use a lock or other synchronization method for reads and writes; and perhaps block during the entire scope where the value is used.
It is 100% necessary unless you are 100% sure that the variable's value won't change while the reader threads are running.
Necessary? No.
...but if it's possible that another thread could try to write to it during the read (like a collection, etc.) then it might be a good idea.
As long as it doesn't change while other threads are executing, you don't need to lock it.
If it does change, you should lock it.
If the variable is never written to by anyone (at least during the time it is accessible), you don't need to lock it, because there is no possibility of missed updates. The same goes if you don't care about missed updates (meaning it is not a problem if you get an older value). Otherwise you should use some sort of synchronization.