different thread accessing MemoryStream - c#

There's a bit of code which writes data to a MemoryStream object directly into its data buffer, obtained by calling GetBuffer(). It also reads and updates the Position property and calls SetLength() appropriately.
This code works properly 99.9999% of the time. Literally. Only once every few hundred thousand iterations will it barf. The specific problem is that the Position property of the MemoryStream suddenly returns zero instead of the appropriate value.
However, code was added that checks for the zero and throws an exception whose message includes a log of the MemoryStream properties, such as Position and Length, gathered in a separate method. Those return the correct values. Further logging added within the same method shows that when this rare condition occurs, Position is zero only inside this particular method.
Okay. Obviously, this must be a threading issue. And most likely a compiler optimization issue.
However, the nature of this software is that it's organized by "tasks" with a scheduler, so any one of several actual O/S threads may run this code at any given time--but never more than one at a time.
So my guess is that ordinarily the same thread happens to keep getting used for this method, and then on a rare occasion a different thread gets used. (I just coded up a test of this theory by capturing and comparing the thread ID.)
Then due to compiler optimizations, the different thread never gets the correct value. It gets a "stale" value.
Ordinarily in a situation like this I would apply a "volatile" keyword to the variable in question to see if that fixes it. But in this case the variables are inside the MemoryStream object.
Does anyone have any other idea? Or does this mean we have to implement our own MemoryStream object?
Sincerely,
Wayne
EDIT: I just ran a test which counts the total number of calls to this method and the number of times the ManagedThreadId differs from the previous call. It switches threads almost exactly 50% of the time--alternating between them. So my theory above is almost certainly wrong, or the error would occur far more often.
EDIT: This bug occurs so rarely that the code would have to run for nearly a week without it before we could feel any confidence it's really gone. Instead, it's better to run experiments to confirm precisely the nature of the problem.
EDIT: Locking currently is handled via lock() statements in each of 5 methods that use the MemoryStream.

(We would really need exemplar code to confirm this.)
MemoryStream members are not documented as thread safe (e.g. Position), so you need to ensure you are only accessing this instance (or any reference to an object logically a part of the MemoryStream) from one thread at a time.
But MemoryStream is not documented as having thread affinity, so you can access an instance from a different thread—as long as such an access is not concurrent.
Threading is hard (axiomatic for this Q&A).
I would suggest you have some concurrent access going on, with two threads accessing the same instance at the same time, and this is, occasionally, corrupting some aspect of the instance's state.
I would keep the locking as simple as possible (trying to be extra clever and limiting locking is often a cause of very hard-to-find bugs) and get things working. Testing on a multi-core system may also help. Only try to optimise the locking if profiling shows there is potential for a significant net (application as a whole) gain.
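For illustration, here is a minimal sketch of that advice, assuming the five methods all share one MemoryStream (the class, field, and method names here are hypothetical, not from the question). Every access to the stream--including GetBuffer(), Position, and SetLength()--goes through the same lock object:

    using System.IO;

    class StreamOwner
    {
        private readonly object _gate = new object();
        private readonly MemoryStream _stream = new MemoryStream();

        // One of the five methods: all stream access happens inside the gate.
        public void WriteRecord(byte[] data)
        {
            lock (_gate)
            {
                _stream.Write(data, 0, data.Length);
            }
        }

        // Even a simple property read needs the same gate,
        // since Position is not documented as thread safe.
        public long CurrentPosition()
        {
            lock (_gate)
            {
                return _stream.Position;
            }
        }
    }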

Related

Volatile.Write freshness guarantee

The documentation for Volatile.Write says the following:
Writes the specified object reference to the specified field. On systems that require it, inserts a memory barrier that prevents the processor from reordering memory operations as follows: If a read or write appears before this method in the code, the processor cannot move it after this method.
and
value T
The object reference to write. The reference is written immediately so that it is visible to all processors in the computer.
But it seems like quotes 1 and 2 are contradictory.
For the second quote to be true, I would think that the first quote would have to be changed as follows (swapping "before" and "after"):
If a read or write appears after this method in the code, the processor cannot move it before this method.
Does Volatile.Write actually mean that other threads are guaranteed to pick up the write in a timely fashion, or is the second quote misleading?
It seems to me that all these "Volatile"/"Memory Barrier" constructs are focused on ensuring that if writes are exposed to other threads they are exposed in the correct order, but I can't seem to find what would actually force them to be exposed.
I understand that it may be hard/impossible to expose writes to other threads immediately, but without volatile writes/reads there are cases where the writes are never exposed. So it seems there must be a way to ensure that writes are exposed "eventually", but I'm unsure what that is. Is it that writes are always exposed in .NET but reads can be cached? And if so, does Volatile.Read stop this caching behaviour?
(Note: I have read through Joseph Albahari's Threading in C#, which tends to suggest I need explicit memory barriers before my reads and after my writes, although it's not clear why even that should be effective, as the documentation for Thread.MemoryBarrier doesn't seem to explicitly say that writes are made visible to other threads.)
You are misunderstanding the concept of barriers a little bit. As you quoted:
The object reference to write. The reference is written immediately so that it is visible to all processors in the computer.
So the really important unit here is a processor, not a thread.
So, there are processors, processor caches, store buffers and invalidation queues involved.
When a processor writes something to memory, the path looks like this or similar to this (the original answer included a diagram of the write path here).
The subject starts at the store buffer level. As you can see, there is a lot going on when you write or read something, and it does not happen instantly for all the processors in the system. At the beginning, a read or write command is placed into the processor's store buffer, and those commands can be reordered--in other words, executed in a different order by the processor.
While that happens, other processors don't know about the changes if the operation is a write, and the currently working processor doesn't know about changes other processors have made.
When you place a barrier, it means that operations in the store buffer or invalidation queue must be completed before any further read or write can be performed. That is necessary to bring the CPU caches up to date across processors. So there are basically no mechanics for synchronizing data across threads; we are syncing data across processors.
When thread A writes something on processor 1 and thread B reads something on processor 1, they both start by looking into the store buffer first, so they read the actual data, whether any barriers are placed or not.
This is just an overview of the mechanics involved; maybe I'm wrong in some details. You can find complete info if you read about the MESI protocol and this PDF with an explanation of invalidation queues and store buffers.
I agree with you that the description in the MSDN documentation is a bit confusing. I would say that "immediately" is a strong word here, as it is in regard to any subject related to parallel processes. The result won't be visible immediately, but the documentation doesn't say that--it says that the value will be written immediately; that is, as soon as all prior load/store operation results become globally visible, the store operation to write the value will immediately be initiated.
As for the memory barriers, they can only guarantee exposure (global visibility) of prior operations, because in essence memory barriers are instructions which, when encountered by a CPU, make the CPU "wait" until all pending load/store operations are globally visible; the moment of global visibility of the value written by Volatile.Write is neither the barrier's nor Volatile.Write's concern.
Now, about the suggestion to use a barrier in lock-free programming. Of course it makes sense, because it ensures the order of global visibility, which matters on multi-core systems. When you cannot be sure that an event B always happens after an event A, you just can't build reliable logic meant to be executed in multi-core environments.
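A minimal sketch of how this plays out in practice, assuming a simple completion flag (the class, field, and method names are illustrative, not from the question): pairing Volatile.Write on the writer with Volatile.Read on the reader both orders the operations and stops the JIT from caching the read, so the reader does eventually observe the write.

    using System;
    using System.Threading;

    class FlagExample
    {
        private bool _done; // written by one thread, read by another

        public void Worker()
        {
            // ... produce results ...
            Volatile.Write(ref _done, true); // prior writes can't move past this
        }

        public void Waiter()
        {
            // Volatile.Read forces a real re-read of the field each iteration
            // instead of letting the JIT cache it in a register.
            while (!Volatile.Read(ref _done))
            {
                Thread.Yield();
            }
            Console.WriteLine("Observed the write.");
        }
    }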

what are the dangers of Thread.Sleep in Multi-Threaded code in production environment

I am dealing with multi-threaded code written by my predecessor within a C# WinForms application that handles large volumes of data concurrently in a production environment. I have identified that at some points within the code Thread.Sleep(20) is used. I am not an expert in multithreading and have only basic knowledge of threading and synchronisation primitives. I need to know whether there are any dangers associated with Thread.Sleep.
It's not going to be explicitly or directly dangerous. It's almost certainly wasting effort, as explicitly forcing your program to not do work when it has work to do is almost never sensible.
It's also a pretty significant red flag that there's a race condition in the code and, rather than actually figure out what it is or how to fix it, the programmer simply added in Sleep calls until he stopped seeing it. If true, it would mean that the program is still unstable and could potentially break at any time if enough other variables change, and that the issue should be actually fixed using proper synchronization techniques.
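As an illustration of that red flag, here is a hedged sketch (the names are invented, not from the code in question) of the Sleep-based polling anti-pattern next to one proper alternative using a synchronization primitive:

    using System.Threading;

    class SleepVsEvent
    {
        private volatile bool _ready; // set by a producing thread
        private readonly ManualResetEventSlim _signal = new ManualResetEventSlim();

        // Anti-pattern: poll and hope 20 ms is always long enough.
        public void WaitWithSleep()
        {
            while (!_ready)
                Thread.Sleep(20); // masks the race instead of fixing it
        }

        // Better: block until the producer explicitly signals.
        public void WaitWithEvent()
        {
            _signal.Wait();
        }

        public void Publish()
        {
            _ready = true;
            _signal.Set();
        }
    }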

What's the Best Concurrent Thread Shared Memory Architecture Without Locking?

I have a 2d Array of memory. I have multiple threads reading and writing to single elements in the array spontaneously, arbitrarily, and concurrently.
What is the fastest way or best practice to construct my memory access code? I don't like the idea of locking because it blocks other threads.
Data integrity is actually not that important, but it should be (mostly) consistent. My code can handle a few memory errors.
It needs to be really, really fast!
Thanks for feedback.
If data integrity is not important, you can just access the data without caring about multithreading at all.
No one can predict the result, though.
I wouldn't call this approach "best practice", however. IMHO, best practice is caring about multithreading and protecting the data with appropriately-grained mutexes. My opinion is that every application should first be correct, and only then fast. Inconsistent results are just wrong, no matter whether they arrive fast or not.
Use the Interlocked class to CAS (CompareAndExchange) the objects/values in your array. It makes the operation atomic which ensures that the data is not corrupted. That's about the fastest thing you can do (aside from accessing/modifying the data directly without interlocking). However, if you're modifying the size of the 2D array (growing/shrinking) then you will have some serious problems unless you use some kind of locking mechanism on your array.
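A minimal sketch of that approach, assuming a fixed-size grid of ints (the class and method names are illustrative): a compare-and-swap loop retries until the element is updated atomically, with no lock taken.

    using System.Threading;

    class LockFreeGrid
    {
        private readonly int[,] _grid = new int[64, 64];

        // Atomically add 'delta' to one cell without locking.
        public void AddToCell(int row, int col, int delta)
        {
            while (true)
            {
                int current = _grid[row, col];
                int desired = current + delta;
                // Write only if nobody changed the cell since we read it;
                // otherwise loop and retry with the fresh value.
                if (Interlocked.CompareExchange(ref _grid[row, col], desired, current) == current)
                    return;
            }
        }
    }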
Declare the array as volatile and ensure it's scoped such that it's visible to all your threads. I generally like to avoid statics, so either pass the array by reference, or set up all your threads to run methods of an instance class that has the array defined as an instance field.
However, I strongly urge you to rethink what "volatile access" means in terms of data integrity. Best practice is NOT to do what you are attempting without good locking mechanics. You may think it's a small problem, but you can find yourself with a very non-deterministic system, so much so that its data isn't reliable in the slightest.
Let's say you have 8 threads running, and all of them will get a value from an index of the array, do some calculation, then add the result back to the index of the array. Thread 1 starts first and gets the value of the index, 0. Then threads 2-7 all start and get the same value. Thread 1 performs its calculation, gets the index again to ensure it has the "latest" value, then tries to update the value. However, other threads are waiting for that memory, and due to some scheduling implementation you know nothing about, in between Thread 1 getting the index (still zero) and writing its result, threads 2-7 have ALL written their values. Then Thread 1 writes its value, overwriting everything the other 7 threads have done. The other 7 threads, in turn, probably had similar "races" with each other such that the value overwritten by Thread 1 probably overwrote the results of half the threads anyway.
I guarantee you that this behavior is NOT what you want, no matter how much you think you can get away with it; it WILL cause data corruption, which WILL affect other areas of the system, and you WILL be forced to implement proper locking.
If you are interested solely in performance, then the way in which you order your memory accesses can play a big role. Spend an hour or so reading through the slides from Lecture 1 of MIT's Performance Engineering class. The other lectures may also be interesting to you (such as Lecture 6).
Basically, you can optimize your use of the cache to greatly improve performance, depending on your read/write patterns, given the workload you are using.
This should not stop you from doing something that is correct, however.
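As a concrete (hypothetical) illustration of the access-pattern effects those lectures cover: summing a 2D array row by row walks memory sequentially so cache lines are reused, while summing column by column strides across cache lines and typically runs much slower.

    static long SumRowMajor(int[,] grid)
    {
        long sum = 0;
        for (int r = 0; r < grid.GetLength(0); r++)
            for (int c = 0; c < grid.GetLength(1); c++)
                sum += grid[r, c]; // sequential access: cache-friendly
        return sum;
    }

    static long SumColumnMajor(int[,] grid)
    {
        long sum = 0;
        for (int c = 0; c < grid.GetLength(1); c++)
            for (int r = 0; r < grid.GetLength(0); r++)
                sum += grid[r, c]; // strided access: many cache misses
        return sum;
    }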

Can I avoid using locks for my seldom-changing variable?

I've been reading Joe Duffy's book on Concurrent programming. I have kind of an academic question about lockless threading.
First: I know that lockless threading is fraught with peril (if you don't believe me, read the sections in the book about the memory model).
Nevertheless, I have a question:
Suppose I have a class with an int property on it.
The value referenced by this property will be read very frequently by multiple threads
It is extremely rare that the value will change, and when it does it will be a single thread that changes it.
If it does change while another operation that uses it is in flight, no one is going to lose a finger (the first thing anyone using it does is copy it to a local variable)
I could use locks (or a ReaderWriterLockSlim to keep the reads concurrent).
I could mark the variable volatile (lots of examples where this is done)
However, even volatile can impose a performance hit.
What if I use VolatileWrite when it changes, and leave the access normal for reads? Something like this:
public class MyClass
{
    private int _TheProperty;

    internal int TheProperty
    {
        get { return _TheProperty; }
        set { System.Threading.Thread.VolatileWrite(ref _TheProperty, value); }
    }
}
I don't think that I would ever try this in real life, but I'm curious about the answer (more than anything, as a checkpoint of whether I understand the memory model stuff I've been reading).
Marking a variable as "volatile" has two effects.
1) Reads and writes have acquire and release semantics, so that reads and writes of other memory locations will not "move forwards and backwards in time" with respect to reads and writes of this memory location. (This is a simplification, but you take my point.)
2) The code generated by the jitter will not "cache" a value that seems to logically be unchanging.
Whether the former point is relevant in your scenario, I don't know; you've only described one memory location. Whether or not it is important that you have only volatile writes but not volatile reads is something that is up to you to decide.
But it seems to me that the latter point is quite relevant. If you have a spin lock on a non-volatile variable:
while(this.prop == 0) {}
the jitter is within its rights to generate this code as though you'd written
if (this.prop == 0) { while (true) {} }
Whether it actually does so or not, I don't know, but it has the right to. If what you want is for the code to actually re-check the property on each go round the loop, marking it as volatile is the right way to go.
The question is whether the reading thread will ever see the change. It's not just a matter of whether it sees it immediately.
Frankly I've given up on trying to understand volatility - I know it doesn't mean quite what I thought it used to... but I also know that with no kind of memory barrier on the reading thread, you could be reading the same old data forever.
The "performance hit" of volatile is because the compiler now generates code to actually check the value instead of optimizing that away - in other words, you'll have to take that performance hit regardless of what you do.
At the CPU level, yes every processor will eventually see the change to the memory address. Even without locks or memory barriers. Locks and barriers would just ensure that it all happened in a relative ordering (w.r.t other instructions) such that it appeared correct to your program.
The problem isn't cache-coherency (I hope Joe Duffy's book doesn't make that mistake). The caches stay coherent--it's just that this takes time, and the processors don't bother to wait for it to happen unless you enforce it. So instead, the processor moves on to the next instruction, which may or may not end up happening before the previous one, because each memory read/write may take a different amount of time. Ironically, because of the time it takes for the processors to agree on coherency, some cache lines become coherent faster than others (i.e., depending on whether the line was Modified, Exclusive, Shared, or Invalid, it takes more or less work to get it into the necessary state).
So a read may appear old or from an out-of-date cache, but really it just happened earlier than expected (typically because of look-ahead and branch prediction). When it really was read, the cache was coherent; it has just changed since then. So the value wasn't old when you read it, but it is now, when you need it. You just read it too soon. :-(
Or equivalently, it was written later than the logic of your code thought it would be written.
Or both.
Anyhow, if this were C/C++, even without locks/barriers, you would eventually get the updated values (within a few hundred cycles typically, as memory takes about that long). In C/C++ you could use volatile (the weak, non-threading volatile) to ensure that the value wasn't read from a register. (Now there's a non-coherent cache! i.e. the registers.)
In C#, I don't know enough about the CLR to know how long a value could stay in a register, nor how to ensure you get a real re-read from memory. You've lost the "weak" volatile.
I would suspect as long as the variable access doesn't completely get compiled away, you will eventually run out of registers (x86 doesn't have many to start with) and get your re-read.
But no guarantees that I can see. If you could limit your volatile-read to a particular point in your code that executes often, but not too often (i.e. the start of the next task in a while(things_to_do) loop), then that might be the best you can do.
This is the pattern I use when the "last writer wins" pattern is applicable to the situation. I had used the volatile keyword, but after seeing this pattern in a code example from Jeffrey Richter, I started using it.
For normal memory (as opposed to things like memory-mapped devices), the cache-coherency protocols going on within/between the CPU/CPUs are there to ensure that different threads sharing that memory get a consistent view of things (i.e., if I change the value of a memory location on one CPU, it will be seen by other CPUs that have that memory in their caches). In this regard, volatile helps to ensure that the optimizer doesn't optimize away memory accesses (which always go through the cache anyway) by, say, reading the value cached in a register. The C# documentation seems pretty clear on this. Again, the application programmer doesn't generally have to deal with cache-coherency themselves.
I highly recommend reading the freely available paper "What Every Programmer Should Know About Memory". A lot of magic goes on under the hood that mostly prevents shooting oneself in the foot.
In C#, reads and writes of the int type are atomic.
Since you said that only one thread writes to it, you should never have contention as to what the proper value is, and as long as you are caching a local copy, you should never get dirty data.
You may, however, want to declare it volatile if an OS thread will be doing the update.
Also keep in mind that some operations are not atomic and can cause problems if you have more than one writer. For example, even though the bool type won't become corrupted if you have more than one writer, a statement like this:
a = !a;
is not atomic. If two threads read at the same time, you have a race condition.
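To make that concrete, here is a small sketch (names invented for illustration): the toggle expands to a read, a negation, and a write, so two threads can interleave between the read and the write; serializing the whole toggle with a lock avoids the race.

    class ToggleExample
    {
        private static bool _flag;
        private static readonly object _gate = new object();

        // Read-modify-write race: two threads can both read 'false'
        // and both write 'true', losing one of the toggles.
        static void ToggleUnsafe()
        {
            _flag = !_flag;
        }

        // Safe: the lock makes the read-negate-write sequence atomic
        // with respect to other callers of ToggleSafe.
        static void ToggleSafe()
        {
            lock (_gate)
            {
                _flag = !_flag;
            }
        }
    }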

Is there any circumstance in which calling EnterWriteLock on a ReaderWriterLockSlim should enter a Read lock instead?

I have a seemingly very simple case where I'm using System.Threading.ReaderWriterLockSlim in the 3.5 version of the .NET Framework. I first declare one, as shown here:
Lock Declaration http://odeh.temp.s3.amazonaws.com/lock_declaration.bmp
I put a break point right before the lock is acquired and took a screen shot so you can see (in the watch window) that there are currently no locks held:
pre lock acquisition http://odeh.temp.s3.amazonaws.com/prelock.bmp
Then, after calling EnterWriteLock, as you can see I am holding a Read Lock.
post lock acquisition http://odeh.temp.s3.amazonaws.com/postlock.bmp
This seems like truly unexpected behavior and I can't find it documented anywhere. Does anyone else know why this happens? In other places in my code (earlier), this exact same line of code correctly obtains a write lock. Consistently, however, across multiple systems it instead obtains a read lock at this place in the call stack. Hope I've made this clear and thanks for taking the time to look at this.
--- EDIT---
For those mentioning asserts... this just confuses me further:
post assert http://odeh.temp.s3.amazonaws.com/assert.bmp
I really can't say how it got past this assertion, except that perhaps the Watch window and the Immediate window are wrong (perhaps the value is stored thread-locally, as another poster mentioned). This seems like an obvious case for a volatile variable and a happens-before relationship to be established. Either way, several lines later there is code that asserts for a write lock and does not have one. I have set a breakpoint on the only line of code in the entire program that releases this lock, and it doesn't get called after the acquisition shown here, so that must mean it was never actually acquired... right?
This could be a debugger side-effect. The ReaderWriterLockSlim class is very sensitive to the current thread ID (Thread.ManagedThreadId). I can't state for a fact that the debugger will always use the current active thread to evaluate the watch expressions. It usually does, but there might be different behavior, say, if you entered the debugger with a hard break.
Trust what the code does first of all; your Debug.Assert proves the point.
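A short sketch of that advice (class and method names invented): ReaderWriterLockSlim's lock-state properties are evaluated against the calling thread, so assert them in code on the thread that took the lock rather than trusting a watch window, which may evaluate the expression on a different thread.

    using System.Diagnostics;
    using System.Threading;

    class LockStateCheck
    {
        private readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();

        public void Update()
        {
            _rwLock.EnterWriteLock();
            try
            {
                // Evaluated on the current thread, so this reflects reality;
                // a debugger watch may evaluate it on another thread.
                Debug.Assert(_rwLock.IsWriteLockHeld);
                // ... mutate shared state ...
            }
            finally
            {
                _rwLock.ExitWriteLock();
            }
        }
    }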
