.NET 6 now has PriorityQueue<TElement,TPriority>, which is very useful. The documentation is not yet clear (granted, at the time of the question it still covers RC1) about whether it is thread-safe or not. Two things:
It resides in System.Collections.Generic and there doesn't seem to be an equivalent in System.Collections.Concurrent.
It does have methods named TryDequeue and TryPeek. Granted, they are probably just methods that do not throw an exception when the queue is empty, but they do give an impression of the concurrent collections.
Can I use it in a multithreaded environment without wrapping/locking (for example, in an ASP.NET Core website)? Is there a concurrent equivalent that I am not aware of (I try to avoid 3rd-party packages if possible)?
With a look at the source code for PriorityQueue.Enqueue, for instance, it is immediately apparent that the code is not thread-safe:
public void Enqueue(TElement element, TPriority priority)
{
    // Virtually add the node at the end of the underlying array.
    // Note that the node being enqueued does not need to be physically placed
    // there at this point, as such an assignment would be redundant.
    int currentSize = _size++; // <-- BOOM: an unsynchronized read-increment-write
    // ... (the rest of the method sifts the new node up into the heap)
}
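Since _size++ is a plain read-increment-write, two concurrent Enqueue calls can observe the same size and corrupt the heap. If you do need to share one queue across threads, a lock-based wrapper is the simplest option. A sketch, assuming .NET 6 (the ConcurrentPriorityQueue name is hypothetical; there is no such type in the BCL):

using System.Collections.Generic;

// Hypothetical wrapper, not a BCL type. Every operation serializes on a
// single lock, which is correct but can become a bottleneck under heavy
// contention.
public sealed class ConcurrentPriorityQueue<TElement, TPriority>
{
    private readonly PriorityQueue<TElement, TPriority> _queue = new();
    private readonly object _sync = new();

    public void Enqueue(TElement element, TPriority priority)
    {
        lock (_sync) { _queue.Enqueue(element, priority); }
    }

    public bool TryDequeue(out TElement element, out TPriority priority)
    {
        lock (_sync) { return _queue.TryDequeue(out element, out priority); }
    }

    public int Count
    {
        get { lock (_sync) { return _queue.Count; } }
    }
}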
The documentation is not yet clear
It actually is. Anything in .NET is NOT thread safe unless it is EXPLICITLY mentioned in the documentation. Period.
Thread safety comes with a (significant) performance overhead, particularly when done generically (i.e. not assuming specific uses). As such, it would be extremely stupid to make everything thread safe "just in case". Hence the general concept in .NET (since back in 1.0) that NOTHING is thread safe unless it is explicitly mentioned in the documentation.
As you say, the documentation has no mention of thread safety. As such, it is extremely clear that the type is NOT thread safe.
Related
I've been wondering recently how lock (or, more specifically, Monitor) works internally in .NET with regard to the objects that are locked. Specifically, I'm wondering what the overhead is, whether there are 'global' (process-wide) locks used, whether it's possible to create more of those global locks if that's the case (for groups of monitors), and what happens to the objects that are passed to lock (they don't seem to incur any extra memory overhead).
To clarify what I'm not asking about: I'm not asking here about what a Monitor is (I made one myself at University some time ago). I'm also not asking how to use lock, Monitor, how they compile to a try/finally, etc; I'm pretty well aware of that (and there are other SO questions related to that). This is about the inner workings of Monitor.Enter and Monitor.Exit.
For example, consider this code executed by ten threads:
for (int i = 0; i < 1000; ++i)
{
    lock (myArray[i])
    {
        // ...
    }
}
Is it bad to lock a thousand objects instead of one? What is the impact in terms of performance / memory pressure?
The underlying monitor creates a wait queue. Is it possible to have more than one wait queue, and how would I create one?
Monitor.Enter is not a normal .NET method (it can't be decompiled with ILSpy or similar). The method is implemented internally by the CLR, so strictly speaking there is no single answer for .NET, as different runtimes can have different implementations.
All objects in .NET have an object header containing a pointer to the type of the object, but also a sync block index into a SyncTableEntry. Normally that index is zero/unused, but when you lock on the object it will contain an index into the SyncTableEntry, which then contains the reference to the actual lock object.
So locking on thousands of objects will indeed create a lot of locks, which is an overhead.
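If per-element locking isn't actually required, a single dedicated lock object keeps the sync block table from growing. A sketch (the Worker and ProcessAll names are illustrative):

class Worker
{
    // One dedicated lock object: at most one sync block entry is ever
    // inflated, instead of potentially one per array element.
    private static readonly object ArrayLock = new object();

    void ProcessAll(object[] myArray)
    {
        for (int i = 0; i < myArray.Length; ++i)
        {
            lock (ArrayLock)
            {
                // ... work on myArray[i] ...
            }
        }
    }
}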
The information I found was in this MSDN article: http://msdn.microsoft.com/en-us/magazine/cc163791.aspx
Here's a good place to read about monitors, memory barriers etc.
I'm using the following configuration:
.NET framework 4.5
Windows Server 2008 R2
HP DL360p Gen8 (2 * Xeon E5-2640, x64)
I have the following field somewhere in my program:
protected int HedgeVolume;
I access this field from several threads. Since I have a multi-processor system, I assume these threads may be executing on different processors.
What should I do to guarantee that any time I use this field the most recent value is "read"? And to make sure that when I "write" a value, it becomes available to all other threads immediately?
What should I do?
just leave the field as is
declare it volatile
use the Interlocked class to access the field
use the .NET 4.5 Volatile.Read / Volatile.Write methods to access the field
use lock
I only need the simplest way to make my program work on this configuration; I don't need it to work on other computers, servers, or operating systems. I also want minimal latency, so I'm looking for the fastest solution that will always work on this standard configuration (multiprocessor Intel x64, .NET 4.5).
Your question is missing one key element... How important is the integrity of the data in that field?
volatile gives you performance, but if a thread is currently writing changes to the field, you won't get that data until it's done, so you might read out-of-date information and potentially overwrite changes another thread is currently making. If the data is sensitive, you might get bugs that are very hard to track down. However, if you are doing very quick updates, overwriting the value without reading it, and don't care that once in a while you get data that is outdated (by a few ms), go for it.
lock guarantees that only one thread can access the field at a time. You can put it only on the methods that write the field and leave the reading methods alone. The downside is that it is slow and may block a thread while another is performing its task. However, you can be sure your data stays valid.
Interlocked provides atomic read-modify-write operations (increment, exchange, compare-and-exchange). My opinion? Don't use it unless you know exactly why you would be using it and exactly how to use it. It gives you options, but with great options come great problems. It guarantees that a single operation on a variable completes atomically; it might not do what you think it does, and it won't prevent parallel threads from performing their tasks simultaneously.
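As a rough sketch of the 'lock the writers, leave the readers alone' pattern described above (all names are illustrative; volatile keeps the lock-free reads from returning an indefinitely stale value):

class Hedge
{
    // volatile: lock-free readers always see the most recently published value.
    private volatile int _hedgeVolume;
    private readonly object _writeLock = new object();

    // Reads take no lock; they may be momentarily out of date, as warned above.
    public int Volume
    {
        get { return _hedgeVolume; }
    }

    public void Add(int delta)
    {
        lock (_writeLock)   // writers serialize with each other
        {
            _hedgeVolume += delta;
        }
    }
}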
You want to use Volatile.Read().
As you are running on x86, all writes in C# are already the equivalent of Volatile.Write(); you would only need it on a weaker architecture such as Itanium.
Volatile.Read() will ensure that you get the latest copy regardless of which thread last wrote it.
There is a fantastic write up here, C# Memory Model Explained
Summary of it includes,
On some processors, not only must the compiler avoid certain optimizations on volatile reads and writes, it also has to use special instructions. On a multi-core machine, different cores have different caches. The processors may not bother to keep those caches coherent by default, and special instructions may be needed to flush and refresh the caches.
Hopefully that much is obvious: besides needing volatile to stop the compiler from optimising accesses away, the processor matters as well.
However, in C# all writes are volatile (unlike say in Java), regardless of whether you write to a volatile or a non-volatile field. So, the above situation actually never happens in C#. A volatile write updates the thread's cache, and then flushes the entire cache to main memory.
You do not need Volatile.Write(). A more authoritative source is here: Joe Duffy, CLR Memory Model. However, you may still need it to stop the compiler reordering the write.
Since all C# writes are volatile, you can think of all writes as going straight to main memory. A regular, non-volatile read can read the value from the thread's cache, rather than from main memory.
You need Volatile.Read()
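Applied to the field from the question, a minimal sketch assuming .NET 4.5 (the method names are illustrative):

using System.Threading;

class Hedger
{
    protected int HedgeVolume;

    // Volatile.Read prevents the read from being satisfied by a stale
    // cached value or reordered with later operations.
    public int ReadVolume()
    {
        return Volatile.Read(ref HedgeVolume);
    }

    // On x86/x64 an ordinary write already has release semantics, but
    // Volatile.Write also stops the compiler from reordering it.
    public void WriteVolume(int value)
    {
        Volatile.Write(ref HedgeVolume, value);
    }
}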
When you start designing a concurrent program, you should consider these options in order of preference:
1) Isolation: each thread has its own private data
2) Immutability: threads can see shared state, but it never changes
3) Mutable shared state: protect all access to shared state with locks
If you get to (3), then how fast do you actually need this to be?
Acquiring an uncontended lock takes on the order of 10 ns (10⁻⁸ seconds) - that's fast enough for most applications and is the easiest way to guarantee correctness.
Using any of the other options you mention takes you into the realm of low-lock programming, which is insanely difficult to get correct.
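To make option (2) concrete, here is a sketch of sharing an immutable snapshot and swapping the reference, so readers never take a lock (all names are illustrative):

using System.Threading;

// Writers publish a new immutable object instead of mutating shared state;
// readers always see a consistent snapshot.
sealed class Snapshot
{
    public readonly int Volume;
    public Snapshot(int volume) { Volume = volume; }
}

static class SharedState
{
    private static Snapshot _current = new Snapshot(0);

    public static Snapshot Read()
    {
        return Volatile.Read(ref _current);   // reference reads are atomic
    }

    public static void Publish(Snapshot next)
    {
        Volatile.Write(ref _current, next);   // atomically swap the reference
    }
}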
If you want to learn how to write concurrent software, you should read these:
Intro: Joe Albahari's free e-book - will take about a day to read
Bible: Joe Duffy's "Concurrent Programming on Windows" - will take about a month to read
Depends what you DO. For reading only, volatile is easiest; Interlocked allows a little more control. lock is unnecessary, as it is heavier than the problem you describe requires. Not sure about Volatile.Read/Write; never used them.
volatile - bad, there are some issues (see Joe Duffy's blog)
if all you do is read the value or unconditionally write a value - use Volatile.Read and Volatile.Write
if you need to read and subsequently write an updated value - use the lock syntax. You can, however, achieve the same effect without lock using the Interlocked class's functionality, but this is more complex: it involves CompareExchange calls to ensure that you are updating the value you read (i.e. that it has not been modified since the read operation), plus logic to retry if it was modified, as sketched below.
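A sketch of that CompareExchange retry loop; the clamped addition is just an arbitrary example of 'new value computed from the old one':

using System;
using System.Threading;

static class AtomicMath
{
    // Lock-free read-modify-write: retry until no other thread wrote to
    // 'location' between our read and our CompareExchange.
    public static int AddClamped(ref int location, int delta, int max)
    {
        while (true)
        {
            int observed = Volatile.Read(ref location);
            int updated = Math.Min(observed + delta, max);

            // Succeeds only if 'location' still holds 'observed'; otherwise
            // another thread got in first and we loop to re-read.
            if (Interlocked.CompareExchange(ref location, updated, observed) == observed)
                return updated;
        }
    }
}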
From this I understand that you want to be able to read the last value that was written to a field. Let's make an analogy with the SQL data-concurrency problem. If you want to be able to read the last value of a field, you must use atomic instructions. If someone is writing to a field, all of the threads must be blocked from reading until that thread finishes the writing transaction. After that, every read of that field will be safe. The problem is not with reading so much as with writing. A lock on that field whenever it's written should be enough, if you ask me...
First have a look here: Volatile vs. Interlocked vs. lock
The volatile modifier surely is a good option for a multi-core CPU.
But is this enough? It depends on how you calculate the new HedgeVolume value!
If your new HedgeVolume does not depend on the current HedgeVolume, then you're done with volatile.
But if HedgeVolume[x] = f(HedgeVolume[x-1]), then you need some thread synchronisation to guarantee that HedgeVolume doesn't change while you calculate and assign the new value. Both the lock and Interlocked scenarios would be suitable in this case.
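A minimal lock-based sketch of that dependent-update case (F is a placeholder for whatever calculation you perform):

class Hedger
{
    private int _hedgeVolume;
    private readonly object _sync = new object();

    public void Recalculate()
    {
        lock (_sync)
        {
            // The read and the write happen under the same lock, so no other
            // thread can change _hedgeVolume between them.
            _hedgeVolume = F(_hedgeVolume);
        }
    }

    private static int F(int previous)
    {
        return previous * 2;   // placeholder calculation
    }
}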
I had a similar question and found this article to be extremely helpful. It's a very long read, but I learned a LOT!
Some newbie questions about multi-threading in .NET which I think will help reinforce some concepts I'm trying to absorb. I've read several multi-threading materials (including the Albahari e-book), but feel I just need confirmation on some questions to help drive these concepts home.
A lock scope protects a shared region of code. Suppose a thread is executing a method that increments a simple integer variable x in a loop; the lock won't protect code elsewhere that might also alter variable x, e.g. in another method running on another thread...
Since these are two different regions of code potentially affecting the same variable, do we solve this by locking both regions of code using the same lock variable for both lock scopes around variable x? If you locked both regions of code with different lock variables, this would not protect the variable, correct?
To further this example, using the same lock variable: what would happen if, for some reason, the code in one method went into an infinite loop and never relinquished the lock - how could the second region of code in the other method detect this?
How does the choice of lock variable influence the behavior of the lock? I've read numerous posts on this subject already but can never seem to find a definitive answer: in some instances people explicitly use an object variable specifically for this purpose, other times people use lock(this), and finally there've been times I've seen people use a type object.
How do the different choices of lock variable influence the behavior / scope of the lock, and in what scenarios would it make sense to use one over the other?
Suppose you have a hashtable wrapped in a class exposing add, remove, and get methods plus some sort of Calculate method (say each object represents a quantity and this method sums the values), and all these methods are locked. However, once a reference to an object in that collection is made available to other code and passed around an application, this object (not the hashtable) would now be outside the lock scope surrounding the methods of that class. How could you then protect access to / updates of those actual objects taken from the hashtable, which could interfere with the Calculate method?
Appreciate any heuristics provided that would help reinforce these concepts for me - thanks!
1) Yes
2) That's a deadlock
3) The parts of your code you want to block are an implementation detail of your class. Exposing the lock object by using lock(this) or lock(this.GetType()) is asking for trouble, since external code can then lock the same object and block your code unintentionally or maliciously. The lock object should be private (see the sketch after this answer).
4) It isn't very clear what you mean; you certainly wouldn't want to expose the Hashtable directly. Just keep it as a private field of the class, encapsulating it.
However, the odds that you can safely expose your class to client code using threads go down very rapidly with the number of public methods and properties you expose. You'll quickly get to a point where only the client code can properly take a lock. Fine-grained locking creates lots of opportunities for threading races when the client code holds on to property values. Say a Count property value you return: by the time the client uses the value, for example in a for loop, the Count property might have changed. Only the most careful design can avoid these traps - a serious headache.
Furthermore, fine-grained locking is very inefficient, since it inevitably happens in the innermost parts of your code. Locks are not that expensive - roughly 100 CPU cycles - but it quickly adds up. It is especially wasted effort if the class object isn't actually used from multiple threads.
You then have no option but to declare your class thread-unsafe, so the client code has to use it in a thread-safe manner. This is also the core reason that so many .NET classes are not thread-safe. It is the biggest reason that threading is so hard to get right: the programmer least likely to do it correctly is responsible for doing the most difficult thing.
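A sketch of the private-lock pattern from point 3, which also illustrates point 1 (both regions that touch x take the same lock object):

class Counter
{
    // Private: external code can never take this lock, so it cannot block
    // our methods the way lock(this) or lock(typeof(Counter)) would allow.
    private readonly object _sync = new object();
    private int _x;

    public void Increment()
    {
        lock (_sync) { _x++; }
    }

    public void Decrement()
    {
        lock (_sync) { _x--; }   // same lock object guards both regions
    }
}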
1)
You are correct. You must use the same lock object to protect two distinct areas of code that, for example, increment the variable x.
2)
This is known as a deadlock and is one of the difficulties of multithreaded programming. There are algorithms which can be used to prevent deadlocks, such as the Banker's Algorithm.
3)
Some languages make locking easy; for example, in .NET you can just create an object and use it as the shared lock. This is good for synchronising code within a given process. lock(this) just applies the lock to the object in question, but try to avoid it; instead create a private object and use that, as lock(this) can lead to deadlock situations. The lock object underneath is probably just a wrapper around a critical section. If you wanted to protect a resource across different processes, you would need a much heavier named Mutex; this requires a lock on a kernel object and is expensive, so do not use it unless you must (a sketch follows this answer).
4) You need to make sure locking is applied there as well. But surely, when people call methods on this reference, they call the methods which employ synchronisation.
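For the cross-process case mentioned in point 3, a sketch using a named Mutex (the mutex name is illustrative):

using System;
using System.Threading;

static class CrossProcess
{
    public static void WithSharedResource(Action work)
    {
        // A named Mutex is a kernel object visible to other processes that
        // open the same name; the "Global\" prefix spans terminal sessions.
        using (var mutex = new Mutex(false, @"Global\MyAppSharedResource"))
        {
            mutex.WaitOne();            // blocks until no other process holds it
            try
            {
                work();                 // ... touch the shared resource ...
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}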
When I said atomic, I meant a set of instructions that will execute without any context switch to another thread in the same process (other kinds of switches have to happen, of course). The only solution I came up with is to suspend all threads except the currently executing one before the critical part and resume them after it. Is there a more elegant way?
The reason I want to do this is to collect a coherent state of objects running on multiple threads. However, their code cannot be changed (it's already compiled), so I cannot insert mutexes, semaphores, etc. into it. The atomic operation is, of course, the state collection (i.e. copying some variables).
There are some atomic operations in the Interlocked class but it only provides a few very simple operations. It can't be used to create an entire atomic block of code.
I'd advise using locking carefully to make sure that your code will still work even if the context changes.
Well, you can use locks, but you can't exactly prevent context switching. However, if your threads lock on the same object, the waiting threads obviously won't be running, so no context switch to them is involved, since they have nothing to run.
You might want to look at this page too.
No. You can surround a block of code with a Monitor to make it thread-safe, but you cannot make general code snippets atomic.
object lck = new object();
lock (lck)
{
    // thread safe code goes in here
}
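For reference, the lock statement above expands to roughly the following Monitor pattern (the .NET 4.0+ form with the lock-taken flag):

object lck = new object();
bool lockTaken = false;
try
{
    System.Threading.Monitor.Enter(lck, ref lockTaken);
    // thread safe code goes in here
}
finally
{
    if (lockTaken)
        System.Threading.Monitor.Exit(lck);
}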
No, that's contrary to multi-tasking.
Unless you mean very simple operations like incrementing... which are not the subject of your question.
It is possible to obtain a global state from a shared memory composed of a collection (array) of atomic single-reader/multi-writer registers. The solution is simple but not trivial. You can read the algorithm published in the paper "Atomic Snapshots of Shared Memory", or read chapter 4 of The Art of Multiprocessor Programming; there you can get ideas for an implementation in Java. Once you are familiar with the idea, you should be able to port it to any other language.
This question got me thinking about the .NET equivalent. What value is there to the ThreadState property of the Thread class? In this code example:
if (someThread.ThreadState != System.Threading.ThreadState.Running)
{
    someThread = new Thread(SomeMethod);
    someThread.Start();
}
someThread's ThreadState property could switch to Running between the if check and the code inside the if, right?
ThreadState is one of those fantastic properties that look promising at first, but once you dive deep into the way in which it functions, you find that it's almost totally useless.
The main problem with ThreadState is that the enumeration names are very misleading. For instance, take ThreadState.Running. This is a really bad name, because it does not actually indicate the thread is running. Instead it indicates the thread was running at some point in the recent past and may or may not still be running.
This may seem trivial, but it's not. It's really easy to take the names of the enumeration literally and produce really nice-looking code. However, as well as the code reads, it's often based on flawed logic.
I really only use this property for debugging purposes.
This value can be useful in a very limited set of scenarios where you have some other mechanism controlling the thread you are looking at, such as a lock or WaitHandle. But it's usually better to use another form of synchronization than to rely on this property.
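If the underlying goal is 'start the thread unless one is already running', a lock around the whole check-and-start removes the race in the question's snippet. A sketch (SomeMethod is the method from the question; the other names are illustrative):

using System.Threading;

class WorkerHost
{
    private readonly object _threadLock = new object();
    private Thread _worker;

    public void EnsureWorkerRunning()
    {
        lock (_threadLock)
        {
            // The check and the start are atomic with respect to other callers,
            // so two threads can't both see "not running" and start two workers.
            if (_worker == null || !_worker.IsAlive)
            {
                _worker = new Thread(SomeMethod);
                _worker.Start();
            }
        }
    }

    private void SomeMethod() { /* ... */ }
}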
From MSDN:
Important Note:
There are two thread state enumerations, System.Threading.ThreadState and System.Diagnostics.ThreadState. The thread state enumerations are only of interest in a few debugging scenarios. Your code should never use thread state to synchronize the activities of threads.
ThreadState is useful for looking at in the debugger, in order to help understand and debug certain types of blocking/synchronization bugs. For example, you can tell if a specific thread is blocked in the debugger by looking at this, and seeing the ThreadState set to ThreadState.WaitSleepJoin.
That being said, it's something I almost never rely upon. I've had mixed results trying to debug using this, so in general, I think it's really most often best to pretend that it doesn't exist.