In the following code, why don't we get a NullReferenceException, and why is the value of var2 56, even though TestMethod has certainly finished before the MessageBox line?
I read this great answer from Eric Lippert and this blog post, but I still don't get it.
void TestMethod()
{
int var1 = 10;
List<long> list1 = new List<long>();
for (int i = 0; i < 5; i++)
list1.Add(i);
ThreadPool.QueueUserWorkItem(delegate
{
int var2 = var1;
Thread.Sleep(1000);
list1.Clear();
MessageBox.Show(var2.ToString());
});
var1 = 56;
}
I think it's because the delegate has formed a closure around the variable var1. Looking at how closures work internally would probably help you. You can refer to the explanation here:
The compiler (as opposed to the runtime) creates another class/type.
The function with your closure and any variables you closed
over/hoisted/captured are re-written throughout your code as members
of that class. A closure in .Net is implemented as one instance of
this hidden class.
Given that, I believe the compiler-generated code would look roughly like this:
void TestMethod()
{
    // Every captured local (var1 and list1) becomes a field of the hidden class.
    UnspeakableClosureClass closure = new UnspeakableClosureClass();
    closure.var1 = 10;
    closure.list1 = new List<long>();
    for (int i = 0; i < 5; i++)
        closure.list1.Add(i);
    ThreadPool.QueueUserWorkItem(closure.AutoGeneratedMethod);
    closure.var1 = 56; // usually runs before the work item reads the field
}
public class UnspeakableClosureClass
{
    public int var1;
    public List<long> list1;
    // The signature matches WaitCallback, which takes an object state argument.
    public void AutoGeneratedMethod(object state)
    {
        int var2 = var1;
        Thread.Sleep(1000);
        list1.Clear();
        MessageBox.Show(var2.ToString());
    }
}
I think what you're saying is that you expect var1 to be deallocated when TestMethod() exits. After all, the local variables are stored on the stack, and when the method exits, the stack pointer has to revert to the spot where it was before the call, meaning that all the local variables are deallocated. If that were really what were happening, var1 might not be set to null at all; it could contain garbage, or bits of some other local variable, created later when the stack pointer moves again. Is that what you mean?
What turned the light on for me is the understanding that asynchronous thinking is not stack-based at all. A stack just doesn't work, because the order of calls does not form a stack. Instead, bits of code are associated with contextual objects which are held on the heap. They can execute in any order and even simultaneously.
Your delegate needs var1, so the compiler promotes it from a variable held in the stack to a variable held in one of these objects, associated with the delegate's behavior. This is what is called a "closure" or a "closed variable." To the delegate, it looks just like a local variable, because it is-- just not on the stack any more. It will live as long as that object needs to live, even after TestMethod() has exited.
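To make the lifetime point concrete, here is a minimal sketch that is not taken from the question; the names ClosureDemo and MakeCounter are made up for illustration. The local count is captured by the lambda, so it keeps its value between calls even though the method that declared it has already returned.

```csharp
using System;

public static class ClosureDemo
{
    // Hypothetical helper: returns a delegate that captures the local 'count'.
    public static Func<int> MakeCounter()
    {
        int count = 0;          // captured, so it lives in a heap object, not on the stack
        return () => ++count;   // the delegate reads and writes that same object
    }

    public static void Main()
    {
        Func<int> next = MakeCounter();   // MakeCounter has already returned here
        Console.WriteLine(next());        // prints 1
        Console.WriteLine(next());        // prints 2
    }
}
```

If count were an ordinary stack local, the second call could not see the increment made by the first one.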
I was wondering why you would use a [ThreadStatic] field over a local variable. I don't see any difference between a local variable and a thread-static field. Here is some code to further illustrate my point:
static void Main()
{
Thread t1 = new Thread(doSomething);
t1.Start();
}
[ThreadStaticAttribute]
static int secondNumber;
static void doSomething()
{
int number = 3;
secondNumber = 7;
Console.WriteLine(number); // Prints 3
Console.WriteLine(secondNumber); // Prints 7
}
Now the following code will have the same result as the above:
static void Main()
{
Thread t1 = new Thread(doSomething);
t1.Start();
}
static void doSomething()
{
int number = 3;
int secondNumber = 7;
Console.WriteLine(number); // Prints 3
Console.WriteLine(secondNumber); // Prints 7
}
So what is the use of a [ThreadStatic] field, if I can just as well use a local variable in the method?
Scope and lifetime.
Scope: public thread-static fields are accessible from anywhere (other methods, other classes). Locals are only accessible in their immediate scope (this means a function cannot allocate-and-return data on the stack).
Lifetime: Locals only last as long as their stackframe lives. Thread-static values can last for as long as the life of the process.
There is a workaround: if you need thread-local storage shared by different methods in a long call chain, you can pass an object (containing many properties) into the different methods, though you'll run into thread-safety problems if you spawn new threads that access the same object. For example:
class ThreadLocalValues {
public Int32 SomeValue1;
public String SomeValue2;
}
void Foo(ThreadLocalValues context) {
context.SomeValue1 = 1;
SomeOtherMethod( context, otherStuff, goesHere );
}
Instead of:
[ThreadStatic]
public static Int32 SomeValue1;
[ThreadStatic]
public static String SomeValue2;
Why use a ThreadStaticAttribute field over a local Variable in the invoked
Method?
Because a local variable is not accessible outside of the method.
A lot of the variables you access are thread static if you do web development. HttpContext.Current is different for every thread, for example.
And that is exactly the use case: sometimes you need to make some data available to third parties, and, particularly in a server environment, this often happens on a per-thread basis. You COULD pass it as a parameter into a method call, but that would require a lot of parameters passed through possibly a lot of methods, so for certain things a ThreadStatic variable is better.
The first example shows a static field. Because it is marked [ThreadStatic], each thread gets its own copy of the value, but any method running on that thread can read and write it.
The second example gives each call of the method its own local variable. Neither other methods nor other threads can see the local variable.
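The difference is easier to see when a second method and a second thread are involved. The following is a small sketch with made-up names (ThreadStaticDemo, Set, Get, GetOnNewThread), not code from the question: the field is visible to every method on the current thread, while a new thread starts with its own default value.

```csharp
using System;
using System.Threading;

public static class ThreadStaticDemo
{
    [ThreadStatic]
    private static int perThread;   // one independent slot per thread

    public static void Set(int value) { perThread = value; }

    // A different method still sees the value stored by the current thread.
    public static int Get() { return perThread; }

    // Reads the field from a brand-new thread, which never assigned it.
    public static int GetOnNewThread()
    {
        int seen = -1;
        Thread t = new Thread(() => { seen = Get(); });
        t.Start();
        t.Join();
        return seen;
    }

    public static void Main()
    {
        Set(7);
        Console.WriteLine(Get());             // prints 7
        Console.WriteLine(GetOnNewThread());  // prints 0, the default for int
    }
}
```

A local variable could not be read by Get at all, which is the scope argument made above.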
I'm trying to use this great project, but since I need to scan many images the process takes a lot of time, so I was thinking about multi-threading it.
However, since the class that does the actual processing of the images uses static methods and manipulates objects by ref, I'm not really sure how to do it right. The method that I call from my main thread is:
public static void ScanPage(ref System.Collections.ArrayList CodesRead, Bitmap bmp, int numscans, ScanDirection direction, BarcodeType types)
{
//added only the signature, actual class has over 1000 rows
//inside this function there are calls to other
//static functions that makes some image processing
}
My question is whether it's safe to use this function like this:
List<string> filePaths = new List<string>();
Parallel.For(0, filePaths.Count, a =>
{
ArrayList al = new ArrayList();
BarcodeImaging.ScanPage(ref al, ...);
});
I've spent hours debugging it, and most of the time the results I got were correct, but I did encounter several errors which I now can't seem to reproduce.
EDIT
I pasted the code of the class to here: http://pastebin.com/UeE6qBHx
I'm pretty sure it is thread safe.
There are two fields, which are configuration fields and are not modified inside the class.
So basically this class has no state and all calculation has no side effects
(Unless I don't see something very obscure).
Ref modifier is not needed here, because the reference is not modified.
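As an illustration of why a stateless static method is safe to call in parallel, here is a sketch that does not use the barcode library at all. CountDigits is a hypothetical stand-in for a method like ScanPage, under the assumption that it only touches its arguments and locals.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelScanSketch
{
    // Hypothetical stateless worker: uses only its parameter and locals,
    // never a shared static field.
    public static int CountDigits(string text)
    {
        int count = 0;
        foreach (char c in text)
            if (char.IsDigit(c)) count++;
        return count;
    }

    public static int[] ScanAll(string[] inputs)
    {
        int[] results = new int[inputs.Length];
        // Each iteration writes only its own slot, so no locking is needed.
        Parallel.For(0, inputs.Length, i => { results[i] = CountDigits(inputs[i]); });
        return results;
    }

    public static void Main()
    {
        int[] r = ScanAll(new[] { "a1b2", "xyz", "2024" });
        Console.WriteLine(string.Join(",", r));   // prints 2,0,4
    }
}
```

If the worker wrote intermediate values to a static field instead, iterations could overwrite each other's data.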
There's no way of telling unless you know if it stores values in local variables or in a field (in the static class, not the method).
All local variables will be fine and instanced per call, but the fields will not.
A very bad example:
public static class TestClass
{
public static double Data;
public static string StringData = "";
// Can, and will quite often, return wrong values.
// for example returning the result of f(8) instead of f(5)
// if Data is changed before StringData is calculated.
public static string ChangeStaticVariables(int x)
{
Data = Math.Sqrt(x) + Math.Sqrt(x);
StringData = Data.ToString("0.000");
return StringData;
}
// Won't return the wrong values, as the variables
// can't be changed by other threads.
public static string NonStaticVariables(int x)
{
var tData = Math.Sqrt(x) + Math.Sqrt(x);
return tData.ToString("0.000");
}
}
Suppose I have a class
class ABC
{
public string Method1()
{
return "a";
}
public string Method2()
{
return "b";
}
public string Method3()
{
return "c";
}
}
and now I am calling these methods in two ways, like this:
ABC obj=new ABC();
Response.Write(obj.Method1());
Response.Write(obj.Method2());
Another way
Response.Write(new ABC().Method1());
Response.Write(new ABC().Method2());
The output will be the same for the two approaches above.
Can someone please help me understand the difference between obj.Method1() and new ABC().Method1()?
Thanks in advance.
obj and new ABC() are separate instances. In your example the output is the same because there is no instance-level data to show.
Try this to see the difference:
class ABC
{
public string Name = "default";
public string Method1()
{
return "a";
}
}
then use the code below to show the difference with instance-level data:
ABC obj=new ABC();
obj.Name = "NewObject";
Response.Write(obj.Method1());
Response.Write(obj.Name);
Response.Write(new ABC().Method1());
Response.Write(new ABC().Name);
What #d-stanley is trying to say is that you allocate memory on creation, and memory is a very valuable resource.
The more complete answer is this: classes are created with some logic in mind. Response.Write(new ABC().Method1()); is perfectly workable, but this is a very short function and does very little. When you design a class, you implement functionality and properties within some logical boundary. For example, FileStream wraps an underlying file handle, exposes it through various properties, acquires it when the stream is opened, and releases it in Dispose(). Another class, BinaryReader, also works with a Stream but treats it differently. By your logic you could implement all functions in a single class, some MotherOfAllFunctions class that implements all the functions of FileStream and BinaryReader, but that is not the way to do it.
Another point: in most cases some (or a huge) amount of memory is taken to initialize the internal logic of the class, for example the SqlConnection class. When you then call Open() or any other method to call the database, some very powerful machinery kicks in to support state machine initialization and managed-to-unmanaged calls, and a lot of code may be executed.
Actually, what you are doing in any new SomeClass().SomeMethod<int>(ref AnotherObject) is roughly the following (shown here for Response.Write(new ABC().Method1())):
ABC tmpABC = new ABC(); // Constructor call. Always executed (may throw)
string result = tmpABC.Method1(); // Or whatever can be converted to string
tmpABC = null; // No reference remains, so the GC may later free the memory
Response.Write(result);
As you see, this is the same code as if you had written it out this way. So what happens here is a lot of memory allocation, and almost immediately all this valuable memory is thrown away. It makes more sense to initialize the ABC class and all its functionality once and then use it everywhere, to minimize memory over-allocation. For example, it doesn't make sense to open a SqlConnection in every function call in your DAL class and then immediately close it; it is better to declare a variable and keep it alive. Some fully initialized classes live as long as the application process exists. So in the case of this code style:
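public class Program
{
    // Created once and shared by everything that needs the log file.
    private static FileStream streamToLogFile = new FileStream(...);
    public static int Main(string[] args)
    {
        // Form1 is created inline; it receives the shared stream.
        Application.Run(new Form1(streamToLogFile));
        return 0;
    }
}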
With this logic, there's no need to keep a reference to Form1, so I created it inline, but all the functions that need to access the FileStream object (a valuable resource!) will access the same instance, which has been initialized only once.
Summary: C#/.NET is supposed to be garbage collected. C# has a destructor, used to clean up resources. What happens when an object A is garbage collected on the same line where I try to clone one of its member variables? Apparently, on multiprocessors, sometimes, the garbage collector wins...
The problem
Today, on a training session on C#, the teacher showed us some code which contained a bug only when run on multiprocessors.
I'll summarize to say that sometimes, the compiler or the JIT screws up by calling the finalizer of a C# class object before returning from its called method.
The full code, given in Visual C++ 2005 documentation, will be posted as an "answer" to avoid making a very, very large question, but the essentials are below:
The following class has a "Hash" property which will return a cloned copy of an internal array. At its construction, the first item of the array has a value of 2. In the destructor, its value is set to zero.
The point is: If you try to get the "Hash" property of "Example", you'll get a clean copy of the array, whose first item is still 2, as the object is being used (and as such, not being garbage collected/finalized):
public class Example
{
private int nValue;
public int N { get { return nValue; } }
// The Hash property is slower because it clones an array. When
// KeepAlive is not used, the finalizer sometimes runs before
// the Hash property value is read.
private byte[] hashValue;
public byte[] Hash { get { return (byte[])hashValue.Clone(); } }
public Example()
{
nValue = 2;
hashValue = new byte[20];
hashValue[0] = 2;
}
~Example()
{
nValue = 0;
if (hashValue != null)
{
Array.Clear(hashValue, 0, hashValue.Length);
}
}
}
But nothing is so simple...
The code using this class is working inside a thread, and of course, for the test, the app is heavily multithreaded:
public static void Main(string[] args)
{
Thread t = new Thread(new ThreadStart(ThreadProc));
t.Start();
t.Join();
}
private static void ThreadProc()
{
// running is a boolean which is always true until
// the user press ENTER
while (running) DoWork();
}
The DoWork static method is the code where the problem happens:
private static void DoWork()
{
Example ex = new Example();
byte[] res = ex.Hash; // [1]
// If the finalizer runs before the call to the Hash
// property completes, the hashValue array might be
// cleared before the property value is read. The
// following test detects that.
if (res[0] != 2)
{
// Oops... The finalizer of ex was launched before
// the Hash method/property completed
}
}
Once every 1,000,000 executions of DoWork, apparently, the garbage collector does its magic and tries to reclaim "ex", as it is no longer referenced in the remaining code of the function, and this time it is faster than the "Hash" get method. So what we have in the end is a clone of a zeroed byte array, instead of the right one (with the first item at 2).
My guess is that there is inlining of the code, which essentially replaces the line marked [1] in the DoWork function by something like:
// Supposed inlined processing
byte[] res2 = ex.Hash2;
// note that after this line, "ex" could be garbage collected,
// but not res2
byte[] res = (byte[])res2.Clone();
This assumes Hash2 is a simple accessor coded like:
// Hash2 code:
public byte[] Hash2 { get { return (byte[])hashValue; } }
So, the question is: is this supposed to work that way in C#/.NET, or could this be considered a bug of either the compiler or the JIT?
edit
See Chris Brumme's and Chris Lyons' blogs for an explanation.
http://blogs.msdn.com/cbrumme/archive/2003/04/19/51365.aspx
http://blogs.msdn.com/clyon/archive/2004/09/21/232445.aspx
Everyone's answer was interesting, but I couldn't choose one better than the other. So I gave you all a +1...
Sorry
:-)
Edit 2
I was unable to reproduce the problem on Linux/Ubuntu/Mono, despite using the same code under the same conditions (multiple instances of the same executable running simultaneously, release mode, etc.)
It's simply a bug in your code: finalizers should not be accessing managed objects.
The only reason to implement a finalizer is to release unmanaged resources. And in this case, you should carefully implement the standard IDisposable pattern.
With this pattern, you implement a protected method "protected virtual void Dispose(bool disposing)". When this method is called from the finalizer, it cleans up unmanaged resources, but does not attempt to clean up managed resources.
In your example, you don't have any unmanaged resources, so should not be implementing a finalizer.
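For reference, a minimal sketch of that pattern could look like the following. NativeBuffer and IsDisposed are hypothetical names chosen for illustration, and the unmanaged resource is simply a block from Marshal.AllocHGlobal; a real class would wrap whatever handle it actually owns.

```csharp
using System;
using System.Runtime.InteropServices;

public class NativeBuffer : IDisposable
{
    private IntPtr handle;     // unmanaged memory owned by this object
    private bool disposed;

    public NativeBuffer(int size)
    {
        handle = Marshal.AllocHGlobal(size);
    }

    public bool IsDisposed { get { return disposed; } }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);   // the finalizer is no longer needed
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;
        // if (disposing) { dispose other managed IDisposable fields here }
        Marshal.FreeHGlobal(handle); // unmanaged cleanup runs on both paths
        handle = IntPtr.Zero;
        disposed = true;
    }

    ~NativeBuffer()
    {
        Dispose(false);   // finalizer path: touch unmanaged state only
    }
}
```

Typical usage is a using block, for example using (var buffer = new NativeBuffer(16)) { ... }, so that the finalizer only acts as a safety net.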
What you're seeing is perfectly natural.
You don't keep a reference to the object that owns the byte array, so that object (not the byte array) is actually free for the garbage collector to collect.
The garbage collector really can be that aggressive.
So if you call a method on your object which returns a reference to an internal data structure, and the finalizer for your object messes up that data structure, you need to keep a live reference to the object as well.
The garbage collector sees that the ex variable isn't used in that method any more, so it can, and as you noticed, will garbage collect it under the right circumstances (i.e. timing and need).
The correct way to do this is to call GC.KeepAlive on ex, so add this line of code to the bottom of your method, and all should be well:
GC.KeepAlive(ex);
I learned about this aggressive behavior by reading the book Applied .NET Framework Programming by Jeffrey Richter.
This looks like a race condition between your work thread and the GC thread(s). To avoid it, I think there are two options:
(1) change your if statement to use ex.Hash[0] instead of res, so that ex cannot be GC'd prematurely, or
(2) lock ex for the duration of the call to Hash
That's a pretty spiffy example. Was the teacher's point that there may be a bug in the JIT compiler that only manifests on multicore systems, or that this kind of coding can have subtle race conditions with garbage collection?
I think what you are seeing is reasonable behavior due to the fact that things are running on multiple threads. This is the reason for the GC.KeepAlive() method, which should be used in this case to tell the GC that the object is still being used and that it isn't a candidate for cleanup.
Looking at the DoWork function in your "full code" response, the problem is that immediately after this line of code:
byte[] res = ex.Hash;
the function no longer makes any references to the ex object, so it becomes eligible for garbage collection at that point. Adding the call to GC.KeepAlive would prevent this from happening.
Yes, this is an issue that has come up before.
It's even more fun in that you need to run a release build for this to happen, and you end up scratching your head going 'huh, how can that be null?'.
Interesting comment from Chris Brumme's blog
http://blogs.msdn.com/cbrumme/archive/2003/04/19/51365.aspx
class C {
IntPtr _handle;
static void OperateOnHandle(IntPtr h) { ... }
void m() {
OperateOnHandle(_handle);
...
}
...
}
class Other {
void work() {
if (something) {
C aC = new C();
aC.m();
... // most guess here
} else {
...
}
}
}
So we can’t say how long ‘aC’ might live in the above code. The JIT might report the reference until Other.work() completes. It might inline Other.work() into some other method, and report aC even longer. Even if you add “aC = null;” after your usage of it, the JIT is free to consider this assignment to be dead code and eliminate it. Regardless of when the JIT stops reporting the reference, the GC might not get around to collecting it for some time.
It’s more interesting to worry about the earliest point that aC could be collected. If you are like most people, you’ll guess that the soonest aC becomes eligible for collection is at the closing brace of Other.work()’s “if” clause, where I’ve added the comment. In fact, braces don’t exist in the IL. They are a syntactic contract between you and your language compiler. Other.work() is free to stop reporting aC as soon as it has initiated the call to aC.m().
It's perfectly normal for the finalizer to be called in your DoWork method, as after the
ex.Hash call, the CLR knows that the ex instance won't be needed anymore...
Now, if you want to keep the instance alive do this:
private static void DoWork()
{
Example ex = new Example();
byte[] res = ex.Hash; // [1]
// If the finalizer runs before the call to the Hash
// property completes, the hashValue array might be
// cleared before the property value is read. The
// following test detects that.
if (res[0] != 2) // NOTE
{
// Oops... The finalizer of ex was launched before
// the Hash method/property completed
}
GC.KeepAlive(ex); // keep our instance alive in case we need it.. uh.. we don't
}
GC.KeepAlive does... nothing :) It's an empty method that cannot be inlined, whose only purpose is to trick the GC into thinking the object is still in use at that point.
WARNING: Your example would be perfectly valid if the DoWork method were a managed C++ method... You DO have to keep the managed instances alive manually if you don't want the destructor to be called from within another thread. E.g. you pass a reference to a managed object that is going to delete a blob of unmanaged memory when finalized, and the method is using this same blob. If you don't keep the instance alive, you're going to have a race condition between the GC and your method's thread.
And this will end up in tears. And managed heap corruption...
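The unmanaged-blob scenario can be sketched in C# as well. BlobOwner and ReadValue are hypothetical names, and the blob is just four bytes from Marshal.AllocHGlobal; the point is only where the GC.KeepAlive call goes.

```csharp
using System;
using System.Runtime.InteropServices;

public sealed class BlobOwner
{
    // Unmanaged memory that the finalizer will free.
    private readonly IntPtr blob = Marshal.AllocHGlobal(4);

    public BlobOwner()
    {
        Marshal.WriteInt32(blob, 42);
    }

    ~BlobOwner()
    {
        Marshal.FreeHGlobal(blob);
    }

    public int ReadValue()
    {
        IntPtr p = blob;                   // last use of 'this' in the method body
        int value = Marshal.ReadInt32(p);  // works on the raw pointer only
        GC.KeepAlive(this);                // keeps the owner reachable until the read is done
        return value;
    }
}
```

Without the KeepAlive call, the owner could in principle be finalized while ReadInt32 is still using the pointer. A production class would more likely hold the memory in a SafeHandle and implement IDisposable.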
The Full Code
You'll find below the full code, copy/pasted from a Visual C++ 2008 .cs file. As I'm now on Linux, and without any Mono compiler or knowledge about its use, there's no way I can do tests now. Still, a couple of hours ago, I saw this code work and its bug:
using System;
using System.Threading;
public class Example
{
private int nValue;
public int N { get { return nValue; } }
// The Hash property is slower because it clones an array. When
// KeepAlive is not used, the finalizer sometimes runs before
// the Hash property value is read.
private byte[] hashValue;
public byte[] Hash { get { return (byte[])hashValue.Clone(); } }
public byte[] Hash2 { get { return (byte[])hashValue; } }
public int returnNothing() { return 25; }
public Example()
{
nValue = 2;
hashValue = new byte[20];
hashValue[0] = 2;
}
~Example()
{
nValue = 0;
if (hashValue != null)
{
Array.Clear(hashValue, 0, hashValue.Length);
}
}
}
public class Test
{
private static int totalCount = 0;
private static int finalizerFirstCount = 0;
// This variable controls the thread that runs the demo.
private static bool running = true;
// In order to demonstrate the finalizer running first, the
// DoWork method must create an Example object and invoke its
// Hash property. If there are no other calls to members of
// the Example object in DoWork, garbage collection reclaims
// the Example object aggressively. Sometimes this means that
// the finalizer runs before the call to the Hash property
// completes.
private static void DoWork()
{
totalCount++;
// Create an Example object and save the value of the
// Hash property. There are no more calls to members of
// the object in the DoWork method, so it is available
// for aggressive garbage collection.
Example ex = new Example();
// Normal processing
byte[] res = ex.Hash;
// Supposed inlined processing
//byte[] res2 = ex.Hash2;
//byte[] res = (byte[])res2.Clone();
// successful try to keep reference alive
//ex.returnNothing();
// Failed try to keep reference alive
//ex = null;
// If the finalizer runs before the call to the Hash
// property completes, the hashValue array might be
// cleared before the property value is read. The
// following test detects that.
if (res[0] != 2)
{
finalizerFirstCount++;
Console.WriteLine("The finalizer ran first at {0} iterations.", totalCount);
}
//GC.KeepAlive(ex);
}
public static void Main(string[] args)
{
Console.WriteLine("Test:");
// Create a thread to run the test.
Thread t = new Thread(new ThreadStart(ThreadProc));
t.Start();
// The thread runs until Enter is pressed.
Console.WriteLine("Press Enter to stop the program.");
Console.ReadLine();
running = false;
// Wait for the thread to end.
t.Join();
Console.WriteLine("{0} iterations total; the finalizer ran first {1} times.", totalCount, finalizerFirstCount);
}
private static void ThreadProc()
{
while (running) DoWork();
}
}
For those interested, I can send the zipped project through email.