Concurrency issues with Random in .Net? - c#

I've been debugging a problem with a Paint.NET plugin, and I've stumbled on an issue with the Random class when several threads call a method on a single instance.
For some strange reason, it seems that if I do not prevent concurrent access by synchronizing the called method, my Random instance starts to behave... randomly (but in the bad sense).
In the following example, I create a couple of hundred threads that repeatedly call a single Random object. When I run it, I usually (not always, but nearly) get clearly wrong results. The problem NEVER happens if I uncomment the Synchronized attribute on the method.
using System;
using System.Threading;
using System.Runtime.CompilerServices;
namespace testRandom {
class RandTest {
static int NTIMES = 300;
private long ac=0;
public void run() { // ask for random number 'ntimes' and accumulate
for(int i=0;i<NTIMES;i++) {
ac+=Program.getRandInt();
System.Threading.Thread.Sleep(2);
}
}
public double getAv() {
return ac/(double)NTIMES; // average
}
}
class Program
{
static Random random = new Random();
static int MAXVAL = 256;
static int NTREADS = 200;
//[MethodImpl(MethodImplOptions.Synchronized)]
public static int getRandInt() {
return random.Next(MAXVAL+1); // returns a value between 0 and MAXVAL (inclusive)
}
public static void Main(string[] args) {
RandTest[] tests = new RandTest[NTREADS];
Thread[] threads = new Thread[NTREADS];
for(int i=0;i<NTREADS;i++) {
tests[i]= new RandTest();
threads[i] = new Thread(new ThreadStart(tests[i].run));
}
for(int i=0;i<NTREADS;i++) threads[i].Start();
threads[0].Join();
bool alive=true;
while(alive) { // make sure threads are finished
alive = false;
for(int i=0;i<NTREADS;i++) { if(threads[i].IsAlive) alive=true; }
}
double av=0;
for(int i=0;i<NTREADS;i++) av += tests[i].getAv();
av /= NTREADS;
Console.WriteLine("Average:{0, 6:f2} Expected:{1, 6:f2}",av,MAXVAL/2.0);
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}
}
}
An example output (with the above values):
Average: 78.98 Expected:128.00
Press any key to continue . . .
Is this some known issue? Is it incorrect to call a Random object from several threads without sync?
UPDATE: As per the answers, the docs state that Random methods are not thread safe. Mea culpa, I should have read that. Perhaps I had read it before but didn't think it so important: one could (sloppily) think that, in the rare event of two threads entering the same method concurrently, the worst that could happen is that those calls get wrong results, which is not a huge deal if we are not too concerned about random number quality. But the problem is really catastrophic, because the object is left in an inconsistent state, and from then on it keeps returning zero, as noted here.

For some strange reason
It's not really strange - Random is documented not to be thread-safe.
It's a pain, but that's life. See my article on Random for more information, and suggestions for how to have an instance per thread, with guards against starting with the same seed in multiple threads.
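A minimal sketch of the instance-per-thread idea, assuming .NET 4 or later for ThreadLocal<T>. It is not the code from the article, and the class name ThreadSafeRandom is hypothetical. Each thread lazily gets its own Random, seeded from a shared Random that is only touched under a lock, so threads started at the same instant don't end up with identical sequences.

```csharp
using System;
using System.Threading;

public static class ThreadSafeRandom
{
    // Shared source of seeds; only ever used inside the lock below.
    private static readonly Random SeedSource = new Random();
    private static readonly object SeedLock = new object();

    // Each thread gets its own Random the first time it reads Local.Value.
    private static readonly ThreadLocal<Random> Local = new ThreadLocal<Random>(() =>
    {
        int seed;
        lock (SeedLock)
        {
            seed = SeedSource.Next();
        }
        return new Random(seed);
    });

    // Safe to call from any thread: a thread only ever touches its own instance.
    public static int Next(int maxValue)
    {
        return Local.Value.Next(maxValue);
    }
}
```

In the question's program, getRandInt() could simply return ThreadSafeRandom.Next(MAXVAL + 1), with no Synchronized attribute and no lock on the hot path.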

The Random class is not thread safe.
From the docs:
Any instance members are not guaranteed to be thread safe
Instead of synchronizing, which will cause all the threads to block, try applying the ThreadStatic attribute.
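A sketch of the [ThreadStatic] approach; the class and member names are hypothetical. One catch worth knowing: an initializer on a [ThreadStatic] field runs only once, when the type is initialized, so every other thread would see null. The field therefore has to be created lazily on each thread, as below, and each thread is handed a different seed.

```csharp
using System;
using System.Threading;

public static class ThreadStaticRandom
{
    // Every thread has its own copy of this field. No initializer on purpose:
    // it would only run on the thread that initializes the type.
    [ThreadStatic]
    private static Random _local;

    // Ordinary static field shared by all threads, used to hand out seeds.
    private static int _seed = Environment.TickCount;

    public static Random Instance
    {
        get
        {
            if (_local == null)
            {
                // Interlocked.Increment gives each thread a distinct seed.
                _local = new Random(Interlocked.Increment(ref _seed));
            }
            return _local;
        }
    }
}
```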

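Random isn't guaranteed to be thread safe: http://msdn.microsoft.com/en-us/library/system.random.aspx. The page's note that public static members are thread safe applies to static members of the Random type, not to an instance that you happen to store in a public static field.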

Unfortunately this is correct, one has to be careful when using the Random Class.
Here are two blog posts with more details, comments and code samples on this topic:
Another issue with the .NET Random class
Humm … .NET Random class is not Thread Safe?
The worst part of this behaviour is that it just stops working (i.e. once the problem occurs, the return value from the 'random.Next....' methods is 0).


Multiple publishers sending concurrent messages to a single subscriber in Retlang?

I need to build an application where some number of instances of an object are generating "pulses", concurrently. (Essentially this just means that they are incrementing a counter.) I also need to track the total counters for each object. Also, whenever I perform a read on a counter, it needs to be reset to zero.
So I was talking to a guy at work, and he mentioned Retlang and message-based concurrency, which sounded super interesting. But obviously I am very new to the concept. So I've built a small prototype, and I get the expected results, which is awesome - but I'm not sure if I've potentially made some logical errors and left the software open to bugs, due to my inexperience with Retlang and concurrent programming in general.
First off, I have these classes:
public class Plc {
private readonly IChannel<Pulse> _channel;
private readonly IFiber _fiber;
private readonly int _pulseInterval;
private readonly int _plcId;
public Plc(IChannel<Pulse> channel, int plcId, int pulseInterval) {
_channel = channel;
_pulseInterval = pulseInterval;
_fiber = new PoolFiber();
_plcId = plcId;
}
public void Start() {
_fiber.Start();
// Not sure if it's safe to pass in a delegate which will run in an infinite loop...
// AND use a shared channel object...
_fiber.Enqueue(() => {
SendPulse();
});
}
private void SendPulse() {
while (true) {
// Not sure if it's safe to use the same channel object in different
// IFibers...
_channel.Publish(new Pulse() { PlcId = _plcId });
Thread.Sleep(_pulseInterval);
}
}
}
public class Pulse {
public int PlcId { get; set; }
}
The idea here is that I can instantiate multiple Plcs, pass each one the same IChannel, and then have them execute the SendPulse function concurrently, which would allow each one to publish to the same channel. But as you can see from my comments, I'm a little skeptical that what I'm doing is actually legit. I'm mostly worried about using the same IChannel object to Publish in the context of different IFibers, but I'm also worried about never returning from the delegate that was passed to Enqueue. I'm hoping someone can provide some insight into how I should be handling this.
Also, here is the "subscriber" class:
public class PulseReceiver {
private int[] _pulseTotals;
private readonly IFiber _fiber;
private readonly IChannel<Pulse> _channel;
private object _pulseTotalsLock;
public PulseReceiver(IChannel<Pulse> channel, int numberOfPlcs) {
_pulseTotals = new int[numberOfPlcs];
_channel = channel;
_fiber = new PoolFiber();
_pulseTotalsLock = new object();
}
public void Start() {
_fiber.Start();
_channel.Subscribe(_fiber, this.UpdatePulseTotals);
}
private void UpdatePulseTotals(Pulse pulse) {
// This occurs in the execution context of the IFiber.
// If we were just dealing with the published Pulses from the channel, I think
// we wouldn't need the lock, since I THINK the published messages would be taken
// from a queue (i.e. each Plc is publishing concurrently, but Retlang enqueues
// the messages).
lock(_pulseTotalsLock) {
_pulseTotals[pulse.PlcId - 1]++;
}
}
public int GetTotalForPlc(int plcId) {
// However, this access takes place in the application thread, not in the IFiber,
// and I think there could potentially be a race condition here. I.e. the array
// is being updated from the IFiber, but I think I'm reading from it and resetting values
// concurrently in a different thread.
lock(_pulseTotalsLock) {
if (plcId >= 1 && plcId <= _pulseTotals.Length) {
int currentTotal = _pulseTotals[plcId - 1];
_pulseTotals[plcId - 1] = 0;
return currentTotal;
}
}
return -1;
}
}
So here, I am reusing the same IChannel that was given to the Plc instances, but having a different IFiber subscribe to it. Ideally then I could receive the messages from each Plc, and update a single private field within my class, but in a thread safe way.
From what I understand (and I mentioned in my comments), I think that I would be safe to simply update the _pulseTotals array in the delegate which I gave to the Subscribe function, because I would receive each message from the Plcs serially.
However, I'm not sure how best to handle the bit where I need to read the totals and reset them. As you can see from the code and comments, I ended up wrapping a lock around any access to the _pulseTotals array. But I'm not sure if this is necessary, and I would love to know a) if it is in fact necessary to do this, and why, or b) the correct way to implement something similar.
And finally for good measure, here's my main function:
static void Main(string[] args) {
Channel<Pulse> pulseChannel = new Channel<Pulse>();
PulseReceiver pulseReceiver = new PulseReceiver(pulseChannel, 3);
pulseReceiver.Start();
List<Plc> plcs = new List<Plc>() {
new Plc(pulseChannel, 1, 500),
new Plc(pulseChannel, 2, 250),
new Plc(pulseChannel, 3, 1000)
};
plcs.ForEach(plc => plc.Start());
while (true) {
Thread.Sleep(10000);
Console.WriteLine(string.Format("Plc 1: {0}\nPlc 2: {1}\nPlc 3: {2}\n", pulseReceiver.GetTotalForPlc(1), pulseReceiver.GetTotalForPlc(2), pulseReceiver.GetTotalForPlc(3)));
}
}
I instantiate one single IChannel, pass it to everything, where internally the Receiver subscribes with an IFiber, and where the Plcs use IFibers to "enqueue" a non-returning method which continually publishes to the channel.
Again, the console output looks exactly like I would expect it to look, i.e. I see 20 "pulses" for Plc 1 after waiting 10 seconds. And the resetting of the counters after a read also seems to work, i.e. Plc 1 has 20 "pulses" after each 10 second increment. But that doesn't reassure me that I haven't overlooked something important.
I'm really excited to learn a bit more about Retlang and concurrent programming techniques, so hopefully someone has the time to sift through my code and offer some suggestions for my specific concerns, or even a different design based on my requirements!
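On the read-and-reset question: UpdatePulseTotals runs on the receiver's fiber while GetTotalForPlc runs on the application thread, and "read the total, then set it to zero" is two steps that an increment from the fiber could slip between, so some synchronization is needed and the lock is a correct way to provide it. As an alternative, System.Threading.Interlocked can perform each step atomically without a lock. The sketch below is plain .NET rather than Retlang; PulseCounters, Record and ReadAndReset are hypothetical names.

```csharp
using System.Threading;

public class PulseCounters
{
    private readonly int[] _totals;

    public PulseCounters(int numberOfPlcs)
    {
        _totals = new int[numberOfPlcs];
    }

    // Called for every Pulse (e.g. from the subscriber fiber): atomic increment.
    public void Record(int plcId)
    {
        Interlocked.Increment(ref _totals[plcId - 1]);
    }

    // Called from any thread: atomically returns the current total and stores 0,
    // so no increment can land between the read and the reset.
    public int ReadAndReset(int plcId)
    {
        if (plcId < 1 || plcId > _totals.Length) return -1;
        return Interlocked.Exchange(ref _totals[plcId - 1], 0);
    }
}
```

With this, UpdatePulseTotals would just call Record(pulse.PlcId) and GetTotalForPlc would return ReadAndReset(plcId).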

static method vs instance method, multi threading, performance

Can you help explain how multiple threads access static methods? Are multiple threads able to access the static method concurrently?
To me it would seem logical that if a method is static, it would be a single resource shared by all the threads, so only one thread would be able to use it at a time. I have created a console app to test this, but from the results of my test it would appear that my assumption is incorrect.
In my test a number of Worker objects are constructed. Each Worker has a number of passwords and keys. Each Worker has an instance method that hashes its passwords with its keys. There is also a static method which has exactly the same implementation, the only difference being that it is static. After all the Worker objects have been created, the start time is written to the console. Then a DoInstanceWork event is raised and all of the Worker objects queue their useInstanceMethod to the thread pool. When the methods of all the Worker objects have completed, the time it took for them all to complete is calculated from the start time and written to the console. Then the start time is set to the current time and the DoStaticWork event is raised. This time all the Worker objects queue their useStaticMethod to the thread pool. When all these method calls have completed, the time it took until they had all completed is again calculated and written to the console.
I was expecting the time taken when the objects use their instance method to be 1/8 of the time taken when they use the static method: 1/8 because my machine has 4 cores and 8 hardware threads. But it wasn't. In fact, the static method version was actually fractionally faster.
How is this so? What is happening under the hood? Does each thread get its own copy of the static method?
Here is the Console app-
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Threading;
namespace bottleneckTest
{
public delegate void workDelegate();
class Program
{
static int num = 1024;
public static DateTime start;
static int complete = 0;
public static event workDelegate DoInstanceWork;
public static event workDelegate DoStaticWork;
static bool flag = false;
static void Main(string[] args)
{
List<Worker> workers = new List<Worker>();
for( int i = 0; i < num; i++){
workers.Add(new Worker(i, num));
}
start = DateTime.UtcNow;
Console.WriteLine(start.ToString());
DoInstanceWork();
Console.ReadLine();
}
public static void Timer()
{
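// Interlocked.Increment avoids lost updates when pool threads finish at the same time
if (Interlocked.Increment(ref complete) == num)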
{
TimeSpan duration = DateTime.UtcNow - Program.start;
Console.WriteLine("Duration: {0}", duration.ToString());
complete = 0;
if (!flag)
{
flag = true;
Program.start = DateTime.UtcNow;
DoStaticWork();
}
}
}
}
public class Worker
{
int _id;
int _num;
KeyedHashAlgorithm hashAlgorithm;
int keyLength;
Random random;
List<byte[]> _passwords;
List<byte[]> _keys;
List<byte[]> hashes;
public Worker(int id, int num)
{
this._id = id;
this._num = num;
hashAlgorithm = KeyedHashAlgorithm.Create("HMACSHA256");
keyLength = hashAlgorithm.Key.Length;
random = new Random();
_passwords = new List<byte[]>();
_keys = new List<byte[]>();
hashes = new List<byte[]>();
for (int i = 0; i < num; i++)
{
byte[] key = new byte[keyLength];
new RNGCryptoServiceProvider().GetBytes(key);
_keys.Add(key);
int passwordLength = random.Next(8, 20);
byte[] password = new byte[passwordLength * 2];
random.NextBytes(password);
_passwords.Add(password);
}
Program.DoInstanceWork += new workDelegate(doInstanceWork);
Program.DoStaticWork += new workDelegate(doStaticWork);
}
public void doInstanceWork()
{
ThreadPool.QueueUserWorkItem(useInstanceMethod, new WorkerArgs() { num = _num, keys = _keys, passwords = _passwords });
}
public void doStaticWork()
{
ThreadPool.QueueUserWorkItem(useStaticMethod, new WorkerArgs() { num = _num, keys = _keys, passwords = _passwords });
}
public void useInstanceMethod(object args)
{
WorkerArgs workerArgs = (WorkerArgs)args;
for (int i = 0; i < workerArgs.num; i++)
{
KeyedHashAlgorithm hashAlgorithm = KeyedHashAlgorithm.Create("HMACSHA256");
hashAlgorithm.Key = workerArgs.keys[i];
byte[] hash = hashAlgorithm.ComputeHash(workerArgs.passwords[i]);
}
Program.Timer();
}
public static void useStaticMethod(object args)
{
WorkerArgs workerArgs = (WorkerArgs)args;
for (int i = 0; i < workerArgs.num; i++)
{
KeyedHashAlgorithm hashAlgorithm = KeyedHashAlgorithm.Create("HMACSHA256");
hashAlgorithm.Key = workerArgs.keys[i];
byte[] hash = hashAlgorithm.ComputeHash(workerArgs.passwords[i]);
}
Program.Timer();
}
public class WorkerArgs
{
public int num;
public List<byte[]> passwords;
public List<byte[]> keys;
}
}
}
Methods are code. There's no problem with threads accessing that code concurrently, since the code isn't modified by running it; it's a read-only resource (jitter aside). What needs to be handled carefully in multi-threaded situations is concurrent access to data (and more specifically, when modifying that data is a possibility). Whether a method is static or an instance method has nothing to do with whether or not it needs to be serialized in some way to make it thread safe.
In all cases, whether static or instance, any thread can access any method at any time unless you do explicit work to prevent it.
For example, you can create a lock to ensure only a single thread can access a given method, but C# will not do that for you.
Think of it like watching TV. A TV does nothing to prevent multiple people from watching it at the same time, and as long as everybody watching it wants to see the same show, there's no problem. You certainly wouldn't want a TV to only allow one person to watch it at once just because multiple people might want to watch different shows, right? So if people want to watch different shows, they need some sort of mechanism external to the TV itself (perhaps having a single remote control that the current viewer holds onto for the duration of his show) to make sure that one guy doesn't change the channel to his show while another guy is watching.
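To make the distinction concrete, here is a hypothetical static class (the names are illustrative only). SumTo is static yet safe to call from many threads at once, because it touches only its parameter and locals. IncrementUnsafe is unsafe because it updates a shared static field without protection, and IncrementLocked is safe because the lock serializes access to that field. The static keyword plays no part in either outcome.

```csharp
using System.Threading;

public static class Counters
{
    private static int _shared;                       // one copy, visible to every thread
    private static readonly object Gate = new object();

    // Safe: n, total and i live on each calling thread's own stack.
    public static int SumTo(int n)
    {
        int total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    // Unsafe: read-modify-write of shared data; concurrent calls can lose updates.
    public static void IncrementUnsafe()
    {
        _shared++;
    }

    // Safe: the lock guards the data; other threads still run the same code freely.
    public static void IncrementLocked()
    {
        lock (Gate) { _shared++; }
    }

    public static int Shared => Volatile.Read(ref _shared);

    public static void Reset() => Volatile.Write(ref _shared, 0);
}
```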
C# methods are "reentrant", as in most languages (the last time I heard of genuinely non-reentrant code was DOS routines). Each thread has its own call stack, and when a method is called, that thread's call stack is updated to have space for the return address, calling parameters, return value, local values, etc.
Suppose Thread1 and Thread2 call the method M concurrently, and M has a local int variable n. The call stack of Thread1 is separate from the call stack of Thread2, so n will have two different instantiations in two different stacks. Concurrency would be a problem only if n were stored not on a stack but in some shared resource, say a register shared between threads. That does not happen: the operating system saves and restores each thread's registers when it switches threads, and each core has its own set. (How do you allocate registers with multiple CPUs? How do you implement locking? These are genuinely difficult problems that make one respect compiler and OS writers when one comes to think of it.)
Being reentrant does not prove that nothing bad happens when two threads call the same method at the same time; it only proves that nothing bad happens if the method does not access and update other shared resources.
When you access an instance method, you are accessing it through an object reference.
When you access a static method, you are accessing it directly.
So static methods are a tiny bit faster.
When you instantiate a class, you don't create a copy of the code. The object refers to its class definition, and the code is reached through it. So instance methods are accessed the same way as static methods.

Do we need to assign received parameters to local variables in public static functions in ASP.NET 4.0?

To be thread safe, do we need to assign function parameters to local variables? I'll explain with an example:
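public static bool CheckEmailExist_1(string srEmail)
{
//Do some stuff with using srEmail
return false; // placeholder so the example compiles
}
public static bool CheckEmailExist_2(string Email)
{
string srEmail=Email;
//Do some stuff with using srEmail
return false; // placeholder so the example compiles
}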
Is there any thread-safety difference between these two functions? For example, let's say CheckEmailExist_1 gets 100 concurrent calls, each with a different email parameter. Would that cause any problem for the operations inside the function?
c# 4.0 , asp.net 4.0
Thank you
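Local variables and parameters won't have issues with thread safety. Thread safety becomes a concern when there is state shared between multiple threads. In C#, arguments are passed by value by default, so each call gets its own copy of the parameter; for a reference type such as string, that copy is a copy of the reference. Since strings are immutable, no call can change the string another call is looking at, so there is no shared mutable state in that function, and copying the parameter into a local changes nothing. If you pass in a mutable object that several threads share, thread safety can become an issue, whether or not you copy the reference into a local.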
Here is a classic example. The value of _unsafe should be 10000, since you have 100 thread-pool work items each incrementing the _unsafe variable 100 times, but it may not be when you run the program. This is because one thread may read the value and perform its calculation, and while it is doing so another thread increments the variable; when the first thread writes its result back, that other increment is lost. This is called a race condition and is something to avoid. Here is a great ebook on threading that covers all the topics you need to know.
http://www.albahari.com/threading/
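using System;
using System.Threading;
public class TestThreading {
    private static int _unsafe = 0;
    public static void Main(string[] args) {
        using (var done = new CountdownEvent(100)) {
            for (int i = 0; i < 100; i++) {
                ThreadPool.QueueUserWorkItem(state => { PerformIncrement(); done.Signal(); });
            }
            done.Wait(); // wait until every work item has finished
        }
        Console.WriteLine(_unsafe); // may print less than 10000
    }
    public static void PerformIncrement() {
        for (int i = 0; i < 100; i++) {
            _unsafe++; // read-modify-write: not atomic
        }
    }
}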
Here is another unsafe example using objects. It has the same problem as the previous example, since multiple threads are working on the same piece of data (in this case the instance field unsafeCount):
using System;
using System.Threading;
public class TestThreading2 {
    public int unsafeCount = 0;
    public static void Main(string[] args) {
        TestThreading2 objectUnsafe = new TestThreading2();
        var threads = new Thread[100];
        for (int i = 0; i < 100; i++) {
            threads[i] = new Thread(PerformIncrement);
            threads[i].Start(objectUnsafe); // every thread receives the same object
        }
        foreach (Thread t in threads) t.Join();
        Console.WriteLine(objectUnsafe.unsafeCount); // may print less than 10000
    }
    public static void PerformIncrement(object referenceParameter) {
        var objectReference = (TestThreading2)referenceParameter;
        for (int i = 0; i < 100; i++) {
            objectReference.unsafeCount++;
        }
    }
}
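For completeness, a sketch of one way to fix the first example: System.Threading.Interlocked.Increment makes each read-modify-write atomic, so no increments are lost (a lock around _unsafe++ would work too). SafeCounter and Run are hypothetical names, and Parallel.For stands in for the thread-pool loop.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SafeCounter
{
    private static int _count;

    public static void PerformIncrement()
    {
        for (int i = 0; i < 100; i++)
        {
            // Atomic increment: concurrent callers cannot overwrite each other.
            Interlocked.Increment(ref _count);
        }
    }

    // Runs PerformIncrement on `workers` parallel iterations and returns the total.
    public static int Run(int workers)
    {
        _count = 0;
        Parallel.For(0, workers, _ => PerformIncrement());
        return Volatile.Read(ref _count);
    }
}
```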
No, there is no need to copy the parameters of a static method into local variables: every call already gets its own copy of its parameters.
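A quick way to convince yourself is to call such a method concurrently with distinct inputs and check every result. In this sketch, Normalize is a hypothetical stand-in for the body of CheckEmailExist_1: it reads only its parameter and locals, so overlapping calls cannot interfere with one another.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class EmailCheck
{
    // Uses only its parameter and locals: no shared state between calls.
    public static string Normalize(string srEmail)
    {
        string trimmed = srEmail.Trim();
        return trimmed.ToLowerInvariant();
    }

    // Calls Normalize concurrently with distinct inputs and verifies each result.
    public static bool AllResultsMatch(int calls)
    {
        string[] inputs = Enumerable.Range(0, calls)
            .Select(i => "  User" + i + "@Example.com ")
            .ToArray();
        string[] results = new string[calls];
        Parallel.For(0, calls, i => results[i] = Normalize(inputs[i]));
        return Enumerable.Range(0, calls)
            .All(i => results[i] == "user" + i + "@example.com");
    }
}
```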

How to restrict number of concurrent processes?

My situation is as follows:
I have an application that can be started only a fixed number of times (fewer than 50).
A separate central process to manage the other processes is not allowed due to a business requirement (i.e. a nice solution that involves ONLY the application processes is still acceptable).
I am using C# to develop the application, so a managed solution is preferred.
I have to deal with "unexpected" cases, such as processes being terminated from Task Manager.
I am thinking of a solution that makes use of a system-wide mutex. However, it doesn't survive the "unexpected" cases very well, in that it leaves an "abandoned" mutex. If this is a good approach, what is the catch of "ignoring" the abandoned mutex?
One approach would be to query the process list and count the number of instances currently alive. Another, more complex approach would be to broadcast over UDP and count the number of responses. I have used this pattern for distributed scenarios related to job processors.
HTH
Colby Africa
You could use a shared memory segment and increment a count each time an application is opened, and decrement it when the application is closed. A simpler approach may be to use an interprocess semaphore, which you alluded to in your question.
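A minimal sketch of the semaphore idea, assuming a named System.Threading.Semaphore (named semaphores are a Windows feature in .NET); InstanceSlots, Open and TryTakeSlot are hypothetical names. One catch to keep in mind: a semaphore has no owner, so if a process is killed from Task Manager while holding a slot, that slot is never given back, and the limit effectively shrinks until every process has closed its handle. A mutex, by contrast, is marked abandoned, so the next waiter finds out.

```csharp
using System.Threading;

public static class InstanceSlots
{
    // Opens the named semaphore if another instance already created it,
    // otherwise creates it with maxInstances free slots.
    // Every copy of the application must pass the same (hypothetical) name,
    // e.g. @"Global\MyApp.InstanceSlots".
    public static Semaphore Open(string name, int maxInstances)
    {
        return new Semaphore(maxInstances, maxInstances, name);
    }

    // Takes one slot without waiting; false means maxInstances copies are running.
    // The caller must call Release() on the semaphore when it exits normally.
    public static bool TryTakeSlot(Semaphore slots)
    {
        return slots.WaitOne(0);
    }
}
```

In Main you would open the semaphore once, return early if TryTakeSlot is false, and call Release() in a finally block before exiting.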
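Wrap the code that holds the mutex in a try/finally, so that it is released on a normal exit or when an exception is thrown. Be aware, though, that killing a process from Task Manager does not raise a ThreadAbortException: the process is terminated without running any more managed code, so finally blocks never run. In that case Windows marks the mutex as abandoned; the next thread that waits on it still acquires ownership, but WaitOne throws an AbandonedMutexException to tell it what happened. That exception is the hook for responding to situations like that.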
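A sketch of that handling; MutexHelper and WaitTreatingAbandonedAsAcquired are hypothetical names. Ignoring the abandonment is reasonable when the mutex only marks a running instance and protects no shared data. The catch with ignoring it in general is that the dead owner may have left whatever the mutex guards half-updated.

```csharp
using System.Threading;

public static class MutexHelper
{
    // Waits for the mutex and treats "abandoned" as a normal acquisition.
    // Returns true when the calling thread now owns the mutex.
    public static bool WaitTreatingAbandonedAsAcquired(Mutex mutex, int timeoutMs)
    {
        try
        {
            return mutex.WaitOne(timeoutMs);
        }
        catch (AbandonedMutexException)
        {
            // The previous owner ended without releasing it; we own it now.
            return true;
        }
    }
}
```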
Expanding on the Process List approach, using WMI.NET with C# may look like this:
using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Text;
using System.Management;
namespace WmiProc
{
class Program
{
static void Main(string[] args)
{
ManagementScope ms = new System.Management.ManagementScope(
@"\\myserver\root\cimv2");
var oq = new System.Management.ObjectQuery(
"SELECT * FROM Win32_Process where Name='myprocname'");
ManagementObjectSearcher query1 = new ManagementObjectSearcher(ms, oq);
ManagementObjectCollection procsCollection = query1.Get();
Console.WriteLine("process count:{0}", procsCollection.Count);
}
}
}
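If the instances all run on the local machine, WMI is not strictly needed: System.Diagnostics.Process can count them directly. A sketch (ProcessCount and CountSameNamedProcesses are hypothetical names). Like the WMI query, it matches by process name, so an unrelated program with the same executable name would be counted too.

```csharp
using System.Diagnostics;

public static class ProcessCount
{
    // Counts local processes whose name matches this process's name,
    // including the current process itself.
    public static int CountSameNamedProcesses()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            Process[] matches = Process.GetProcessesByName(current.ProcessName);
            int count = matches.Length;
            foreach (Process p in matches) p.Dispose();
            return count;
        }
    }
}
```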
EDIT: There will be some separation of start times, so letting too many processes run at once is not likely. You'll have to test for the specific behavior in your environment.
Maybe you can periodically check the process count from a separate (long running) process and terminate excess processes according to some criterion (e.g. newest).
Well, you could work with named Mutex instances.
Use your own naming scheme for the mutexes, request a name, and check whether a mutex with that name had already been created.
If the naming scheme includes an incrementing number, you can try the names in ascending order and count, this way, how many mutexes already exist.
Handling of released mutexes still needs some improvement, but that seems trivial.
using System;
using System.Threading;
class Program
{
    private static Mutex mutex = null;
    static void Main(string[] args)
    {
        int slot = Program.CheckInstanceCount();
        if (slot < 0)
        {
            Console.WriteLine("All instance slots are taken.");
            return;
        }
        try
        {
            Console.WriteLine("This is instance {0} running.", slot);
            Console.Read();
        }
        finally
        {
            // Release on the thread that acquired the mutex; a ProcessExit
            // handler may run on a different thread, where ReleaseMutex throws.
            mutex.ReleaseMutex();
            mutex.Dispose();
        }
    }
    private static int CheckInstanceCount()
    {
        for (int i = 0; i < 50; i++)
        {
            bool created;
            /* try to create a mutex with this name;
             * if it already exists, another instance
             * of this program is holding slot i
             */
            Mutex candidate = new Mutex(true, string.Concat(AppDomain.CurrentDomain.FriendlyName, i.ToString()), out created);
            if (created)
            {
                // this instance now owns slot #i
                mutex = candidate;
                return i;
            }
            candidate.Dispose(); // we only opened another instance's mutex
        }
        return -1; // all 50 slots are in use
    }
}
I couldn't add a comment to one of the answers above, but from reading the answers and comments, it seems you should be able to combine a mutex with the process-instance check.
// You can use any system-wide mutual exclusion mechanism here
bool waitAndLockMutex();
void unlockMutex();
// returns the number of processes running the specified command
int getProcessCount();
void main() {
    waitAndLockMutex();
    bool tooMany;
    try {
        // hold the mutex only for the check; holding it during
        // doUsualWork() would let just one instance run at a time
        tooMany = getProcessCount() > MAX_ALLOWED;
    } finally {
        unlockMutex();
    }
    if (tooMany)
        return;
    doUsualWork();
}
Note that the above code is simply for illustrative purposes, and the bodies of the declared functions can easily be written using .NET.
EDIT:
If you do not want to go the route of counting the processes of interest, you can use global mutexes for it; .NET exposes named system mutexes through System.Threading.Mutex (prefix the name with "Global\" to make it visible across sessions). The gist is that you try to acquire each mutex up to MAX, and if along the way you get a mutex that has not yet been created or that is ABANDONED, you let the process launch; otherwise you exit saying the maximum count has been exceeded.
void main() {
    for (int i = 0; i < MAX; ++i) {
        int status = TryToAcquireMutex("mutex" + i);
        if (status == locked)
            continue; // slot i belongs to a running instance
        if (status == success || status == WAIT_ABANDONED) {
            doUsualWork();
            return;
        }
    }
    exitWithError("exceeding max count"); // hypothetical helper: every slot is taken
}

Finalizer launched while its object was still being used

Summary: C#/.NET is supposed to be garbage collected. C# has a destructor, used to clean up resources. What happens when an object A is garbage collected on the same line where I try to clone one of its member variables? Apparently, on multiprocessors, sometimes the garbage collector wins...
The problem
Today, on a training session on C#, the teacher showed us some code which contained a bug only when run on multiprocessors.
I'll summarize to say that sometimes, the compiler or the JIT screws up by calling the finalizer of a C# class object before returning from its called method.
The full code, given in the Visual C++ 2005 documentation, will be posted as an "answer" to avoid making a very, very large question, but the essentials are below:
The following class has a "Hash" property which returns a cloned copy of an internal array. On construction, the first item of the array is set to 2. In the destructor, the array is cleared to zero.
The point is: If you try to get the "Hash" property of "Example", you'll get a clean copy of the array, whose first item is still 2, as the object is being used (and as such, not being garbage collected/finalized):
public class Example
{
private int nValue;
public int N { get { return nValue; } }
// The Hash property is slower because it clones an array. When
// KeepAlive is not used, the finalizer sometimes runs before
// the Hash property value is read.
private byte[] hashValue;
public byte[] Hash { get { return (byte[])hashValue.Clone(); } }
public Example()
{
nValue = 2;
hashValue = new byte[20];
hashValue[0] = 2;
}
~Example()
{
nValue = 0;
if (hashValue != null)
{
Array.Clear(hashValue, 0, hashValue.Length);
}
}
}
But nothing is so simple...
The code using this class runs inside a thread, and of course, for the test, the app is heavily multithreaded:
public static void Main(string[] args)
{
Thread t = new Thread(new ThreadStart(ThreadProc));
t.Start();
t.Join();
}
private static void ThreadProc()
{
// running is a boolean which is always true until
// the user press ENTER
while (running) DoWork();
}
The DoWork static method is the code where the problem happens:
private static void DoWork()
{
Example ex = new Example();
byte[] res = ex.Hash; // [1]
// If the finalizer runs before the call to the Hash
// property completes, the hashValue array might be
// cleared before the property value is read. The
// following test detects that.
if (res[0] != 2)
{
// Oops... The finalizer of ex was launched before
// the Hash method/property completed
}
}
Once every 1,000,000 executions of DoWork, apparently, the garbage collector does its magic and tries to reclaim "ex", as it is no longer referenced in the remaining code of the function, and this time it is faster than the "Hash" get method. So what we have in the end is a clone of a zeroed byte array instead of the right one (with the 1st item at 2).
My guess is that there is inlining of the code, which essentially replaces the line marked [1] in the DoWork function by something like:
// Supposed inlined processing
byte[] res2 = ex.Hash2;
// note that after this line, "ex" could be garbage collected,
// but not res2
byte[] res = (byte[])res2.Clone();
Here, suppose Hash2 is a simple accessor coded like:
// Hash2 code:
public byte[] Hash2 { get { return (byte[])hashValue; } }
So, the question is: is this supposed to work that way in C#/.NET, or could this be considered a bug in either the compiler or the JIT?
edit
See Chris Brumme's and Chris Lyons' blogs for an explanation.
http://blogs.msdn.com/cbrumme/archive/2003/04/19/51365.aspx
http://blogs.msdn.com/clyon/archive/2004/09/21/232445.aspx
Everyone's answer was interesting, but I couldn't choose one better than the other. So I gave you all a +1...
Sorry
:-)
Edit 2
I was unable to reproduce the problem on Linux/Ubuntu/Mono, despite using the same code on the same conditions (multiple same executable running simultaneously, release mode, etc.)
It's simply a bug in your code: finalizers should not be accessing managed objects.
The only reason to implement a finalizer is to release unmanaged resources. And in this case, you should carefully implement the standard IDisposable pattern.
With this pattern, you implement a method "protected virtual void Dispose(bool disposing)". When this method is called from the finalizer, it cleans up unmanaged resources, but does not attempt to clean up managed resources.
In your example, you don't have any unmanaged resources, so should not be implementing a finalizer.
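A minimal sketch of that pattern for a class that really does own an unmanaged resource: here a block of memory from Marshal.AllocHGlobal stands in for one, and UnmanagedBuffer is a hypothetical name. Managed state is only touched when disposing is true; the finalizer path frees the unmanaged memory and nothing else.

```csharp
using System;
using System.Runtime.InteropServices;

public class UnmanagedBuffer : IDisposable
{
    private IntPtr _buffer;          // unmanaged resource
    private byte[] _managedCache;    // managed object: only touched when disposing
    private bool _disposed;

    public UnmanagedBuffer(int size)
    {
        _buffer = Marshal.AllocHGlobal(size);
        _managedCache = new byte[size];
    }

    public bool IsDisposed => _disposed;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);   // cleanup done, finalizer no longer needed
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;
        if (disposing)
        {
            // Called from Dispose(): other managed objects are still valid here.
            _managedCache = null;
        }
        // Both paths: release the unmanaged memory exactly once.
        if (_buffer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_buffer);
            _buffer = IntPtr.Zero;
        }
        _disposed = true;
    }

    ~UnmanagedBuffer()
    {
        // Finalizer: unmanaged cleanup only.
        Dispose(false);
    }
}
```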
What you're seeing is perfectly natural.
You don't keep a reference to the object that owns the byte array, so that object (not the byte array) is actually free for the garbage collector to collect.
The garbage collector really can be that aggressive.
So if you call a method on your object which returns a reference to an internal data structure, and the finalizer for your object messes up that data structure, you need to keep a live reference to the object as well.
The garbage collector sees that the ex variable isn't used in that method any more, so it can, and as you noticed, will garbage collect it under the right circumstances (i.e. timing and need).
The correct way to do this is to call GC.KeepAlive on ex, so add this line of code to the bottom of your method, and all should be well:
GC.KeepAlive(ex);
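A self-contained sketch of DoWork with that fix applied. Example mirrors the question's class, trimmed to the Hash property; returning a bool instead of the empty if block is an assumption for illustration. GC.KeepAlive is called after the Hash getter has returned, so ex stays reachable until the clone is complete.

```csharp
using System;

public class Example
{
    private readonly byte[] hashValue;
    public byte[] Hash { get { return (byte[])hashValue.Clone(); } }

    public Example()
    {
        hashValue = new byte[20];
        hashValue[0] = 2;
    }

    ~Example()
    {
        // Same (ill-advised) finalizer as in the question: it wipes a managed array.
        Array.Clear(hashValue, 0, hashValue.Length);
    }
}

public static class Worker
{
    // Returns true when the cloned array still has its first byte set to 2.
    public static bool DoWork()
    {
        Example ex = new Example();
        byte[] res = ex.Hash;
        // ex is reachable up to this call, so its finalizer cannot run
        // while the Hash getter is still cloning the array.
        GC.KeepAlive(ex);
        return res[0] == 2;
    }
}
```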
I learned about this aggressive behavior by reading the book Applied .NET Framework Programming by Jeffrey Richter.
This looks like a race condition between your work thread and the GC thread(s); to avoid it, I think there are two options:
(1) change your if statement to use ex.Hash[0] instead of res, so that ex cannot be GC'd prematurely, or
(2) lock ex for the duration of the call to Hash.
That's a pretty spiffy example. Was the teacher's point that there may be a bug in the JIT compiler that only manifests on multicore systems, or that this kind of coding can have subtle race conditions with garbage collection?
I think what you are seeing is reasonable behavior due to the fact that things are running on multiple threads. This is the reason for the GC.KeepAlive() method, which should be used in this case to tell the GC that the object is still being used and that it isn't a candidate for cleanup.
Looking at the DoWork function in your "full code" response, the problem is that immediately after this line of code:
byte[] res = ex.Hash;
the function no longer makes any references to the ex object, so it becomes eligible for garbage collection at that point. Adding the call to GC.KeepAlive would prevent this from happening.
Yes, this is an issue that has come up before.
It's even more fun in that you need to run a Release build for this to happen, and you end up scratching your head going 'huh, how can that be null?'.
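Because the early collection usually only shows up in optimized code, it can help to confirm how an assembly was built. One way to do that is to read the assembly's DebuggableAttribute, as sketched below; the helper name IsJitOptimizerDisabled is made up. Debug builds normally set IsJITOptimizerDisabled to true. Depending on debugger settings, starting the program under a debugger can also suppress JIT optimizations, so a Release build launched from the IDE may still hide the bug.

```csharp
using System.Diagnostics;
using System.Reflection;

public static class BuildInfo
{
    // Returns true when the assembly asks the JIT not to optimize its code,
    // which is what a typical Debug build does. In that mode local variables
    // generally stay alive until the end of their method.
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        DebuggableAttribute attr = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attr != null && attr.IsJITOptimizerDisabled;
    }
}
```

For example, BuildInfo.IsJitOptimizerDisabled(typeof(Test).Assembly) would report on the assembly containing the question's Test class.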
Interesting comment from Chris Brumme's blog:
http://blogs.msdn.com/cbrumme/archive/2003/04/19/51365.aspx
class C {
IntPtr _handle;
static void OperateOnHandle(IntPtr h) { ... }
void m() {
OperateOnHandle(_handle);
...
}
...
}
class Other {
void work() {
if (something) {
C aC = new C();
aC.m();
... // most guess here
} else {
...
}
}
}
So we can’t say how long ‘aC’ might live in the above code. The JIT might report the reference until Other.work() completes. It might inline Other.work() into some other method, and report aC even longer. Even if you add “aC = null;” after your usage of it, the JIT is free to consider this assignment to be dead code and eliminate it. Regardless of when the JIT stops reporting the reference, the GC might not get around to collecting it for some time.
It’s more interesting to worry about the earliest point that aC could be collected. If you are like most people, you’ll guess that the soonest aC becomes eligible for collection is at the closing brace of Other.work()’s “if” clause, where I’ve added the comment. In fact, braces don’t exist in the IL. They are a syntactic contract between you and your language compiler. Other.work() is free to stop reporting aC as soon as it has initiated the call to aC.m().
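Brumme's C leaves _handle and OperateOnHandle abstract. For illustration only, assume _handle is unmanaged memory that C frees in its finalizer. In that case the usual fix is for C itself to call GC.KeepAlive(this) after the handle has been used. Then it no longer matters when a caller like Other.work() stops reporting aC. In this sketch m() returns the value it read only so that the result is easy to check.

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch of Brumme's class C under an assumption the blog leaves open:
// _handle is unmanaged memory that C allocates and frees in its finalizer.
public class C
{
    private readonly IntPtr _handle;

    public C(int value)
    {
        _handle = Marshal.AllocHGlobal(sizeof(int));
        Marshal.WriteInt32(_handle, value);
    }

    // Stand-in for OperateOnHandle: it only receives the raw IntPtr, so it
    // does nothing to keep the owning C instance reachable.
    private static int OperateOnHandle(IntPtr h)
    {
        return Marshal.ReadInt32(h);
    }

    public int m()
    {
        int result = OperateOnHandle(_handle);
        // 'this' is otherwise dead once _handle has been loaded; this call
        // keeps it (and so the memory) alive until OperateOnHandle returns.
        GC.KeepAlive(this);
        return result;
    }

    ~C()
    {
        Marshal.FreeHGlobal(_handle);
    }
}
```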
It's perfectly normal for the finalizer to be called in your DoWork method, because after the ex.Hash call the CLR knows that the ex instance won't be needed anymore...
Now, if you want to keep the instance alive do this:
private static void DoWork()
{
Example ex = new Example();
byte[] res = ex.Hash; // [1]
// If the finalizer runs before the call to the Hash
// property completes, the hashValue array might be
// cleared before the property value is read. The
// following test detects that.
if (res[0] != 2) // NOTE
{
// Oops... The finalizer of ex was launched before
// the Hash method/property completed
}
GC.KeepAlive(ex); // keeps ex reachable up to here, even though we never use it again
}
GC.KeepAlive does... nothing :) It's an empty, non-inlinable method whose only purpose is to trick the GC into thinking the object will be used after this point.
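Here is what an "empty, non-inlinable method" looks like in code. The .NET Framework reference source declares GC.KeepAlive as an empty method marked MethodImplOptions.NoInlining, and a hand-written equivalent would look roughly like the hypothetical KeepAliveLike below. The JIT cannot see into the call, so it has to treat the argument as live up to the call site. Newer runtimes may special-case the real GC.KeepAlive in the JIT, so in practice you should use the built-in method.

```csharp
using System.Runtime.CompilerServices;

public static class Lifetime
{
    // Hypothetical stand-in for GC.KeepAlive. The body is empty on purpose:
    // the only effect of calling it is that 'obj' must still be reachable
    // when the call happens, since the JIT is not allowed to inline it away.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void KeepAliveLike(object obj)
    {
    }
}
```

Calling Lifetime.KeepAliveLike(ex) at the end of DoWork would then play the same role as GC.KeepAlive(ex).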
WARNING: Your example would be perfectly valid if the DoWork method were a managed C++ method... You DO have to keep managed instances alive manually if you don't want the destructor to be called from another thread. For example, say you pass a reference to a managed object that is going to free a blob of unmanaged memory when it is finalized, and the method is using that same blob. If you don't keep the instance alive, you're going to have a race condition between the GC and your method's thread.
And this will end up in tears. And managed heap corruption...
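The same hazard appears in C# whenever a managed object owns unmanaged memory. In the hypothetical UnmanagedBlob below, the finalizer frees the buffer, and BlobReader.Sum works only with the raw pointer after it has read it. Suppose the caller passes a blob it holds no other reference to. Then the only thing keeping the blob reachable during the loop is the GC.KeepAlive(blob) at the end. This is a sketch only; in current .NET, wrapping such memory in a SafeHandle is the more common way to get this right.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical managed owner of an unmanaged buffer, freed by its finalizer.
public sealed class UnmanagedBlob
{
    public IntPtr Pointer { get; private set; }
    public int Length { get; private set; }

    public UnmanagedBlob(int length)
    {
        Length = length;
        Pointer = Marshal.AllocHGlobal(length);
        for (int i = 0; i < length; i++)
            Marshal.WriteByte(Pointer, i, (byte)i);   // fill with 0, 1, 2, ...
    }

    ~UnmanagedBlob()
    {
        Marshal.FreeHGlobal(Pointer);
    }
}

public static class BlobReader
{
    // Sums the bytes through the raw pointer. After Pointer and Length are
    // read, 'blob' is not used again, so if the caller holds no other
    // reference, only the KeepAlive below stops the finalizer from freeing
    // the memory while the loop is still reading it.
    public static int Sum(UnmanagedBlob blob)
    {
        IntPtr p = blob.Pointer;
        int n = blob.Length;
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += Marshal.ReadByte(p, i);
        GC.KeepAlive(blob);
        return sum;
    }
}
```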
The Full Code
You'll find the full code below, copied and pasted from a Visual Studio 2008 C# (.cs) file. As I'm now on Linux, without a Mono compiler or any knowledge of how to use one, there's no way I can run tests right now. Still, a couple of hours ago, I saw this code run and exhibit its bug:
using System;
using System.Threading;
public class Example
{
private int nValue;
public int N { get { return nValue; } }
// The Hash property is slower because it clones an array. When
// KeepAlive is not used, the finalizer sometimes runs before
// the Hash property value is read.
private byte[] hashValue;
public byte[] Hash { get { return (byte[])hashValue.Clone(); } }
public byte[] Hash2 { get { return (byte[])hashValue; } }
public int returnNothing() { return 25; }
public Example()
{
nValue = 2;
hashValue = new byte[20];
hashValue[0] = 2;
}
~Example()
{
nValue = 0;
if (hashValue != null)
{
Array.Clear(hashValue, 0, hashValue.Length);
}
}
}
public class Test
{
private static int totalCount = 0;
private static int finalizerFirstCount = 0;
// This variable controls the thread that runs the demo. It is marked
// volatile so the worker thread reliably sees the update made by Main.
private static volatile bool running = true;
// In order to demonstrate the finalizer running first, the
// DoWork method must create an Example object and invoke its
// Hash property. If there are no other calls to members of
// the Example object in DoWork, garbage collection reclaims
// the Example object aggressively. Sometimes this means that
// the finalizer runs before the call to the Hash property
// completes.
private static void DoWork()
{
totalCount++;
// Create an Example object and save the value of the
// Hash property. There are no more calls to members of
// the object in the DoWork method, so it is available
// for aggressive garbage collection.
Example ex = new Example();
// Normal processing
byte[] res = ex.Hash;
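// What the inlined processing would presumably look like:
//byte[] res2 = ex.Hash2;
//byte[] res = (byte[])res2.Clone();
// Successful attempt to keep the reference alive:
//ex.returnNothing();
// Failed attempt to keep the reference alive (the JIT is free to treat
// this assignment as dead code and eliminate it):
//ex = null;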
// If the finalizer runs before the call to the Hash
// property completes, the hashValue array might be
// cleared before the property value is read. The
// following test detects that.
if (res[0] != 2)
{
finalizerFirstCount++;
Console.WriteLine("The finalizer ran first at {0} iterations.", totalCount);
}
//GC.KeepAlive(ex);
}
public static void Main(string[] args)
{
Console.WriteLine("Test:");
// Create a thread to run the test.
Thread t = new Thread(new ThreadStart(ThreadProc));
t.Start();
// The thread runs until Enter is pressed.
Console.WriteLine("Press Enter to stop the program.");
Console.ReadLine();
running = false;
// Wait for the thread to end.
t.Join();
Console.WriteLine("{0} iterations total; the finalizer ran first {1} times.", totalCount, finalizerFirstCount);
}
private static void ThreadProc()
{
while (running) DoWork();
}
}
For those interested, I can send the zipped project through email.