In the following code, why don't we get a NullReferenceException, and why is the value of var2 56, even though TestMethod has certainly finished before the MessageBox line runs?
I read this great answer from Eric Lippert and this blog post, but I still don't get it.
void TestMethod()
{
int var1 = 10;
List<long> list1 = new List<long>();
for (int i = 0; i < 5; i++)
list1.Add(i);
ThreadPool.QueueUserWorkItem(delegate
{
int var2 = var1;
Thread.Sleep(1000);
list1.Clear();
MessageBox.Show(var2.ToString());
});
var1 = 56;
}
I think it's because the delegate has formed a closure around the variable var1. Looking at how closures work internally would probably help you. You can refer to the explanation here:
The compiler (as opposed to the runtime) creates another class/type.
The function with your closure and any variables you closed
over/hoisted/captured are re-written throughout your code as members
of that class. A closure in .Net is implemented as one instance of
this hidden class.
Given that, I believe the compiler-generated code would look roughly like this:
void TestMethod()
{
// var1 and list1 are both used inside the delegate, so both are hoisted into the closure object.
UnspeakableClosureClass closure = new UnspeakableClosureClass(10);
closure.closureList = new List<long>();
for (int i = 0; i < 5; i++)
closure.closureList.Add(i);
ThreadPool.QueueUserWorkItem(closure.AutoGeneratedMethod);
closure.closureVar = 56;
}
public class UnspeakableClosureClass
{
public int closureVar;
public List<long> closureList;
public UnspeakableClosureClass(int val) { closureVar = val; }
// The signature matches WaitCallback, which QueueUserWorkItem expects.
public void AutoGeneratedMethod(object state)
{
int var2 = closureVar;
Thread.Sleep(1000);
closureList.Clear();
MessageBox.Show(var2.ToString());
}
}
I think what you're saying is that you expect var1 to be deallocated when TestMethod() exits. After all, the local variables are stored on the stack, and when the method exits, the stack pointer has to revert to the spot where it was before the call, meaning that all the local variables are deallocated. If that were really what were happening, var1 might not be set to null at all; it could contain garbage, or bits of some other local variable, created later when the stack pointer moves again. Is that what you mean?
What turned the light on for me is the understanding that asynchronous thinking is not stack-based at all. A stack just doesn't work, because the order of calls does not form a stack. Instead, bits of code are associated with contextual objects which are held on the heap. They can execute in any order and even simultaneously.
Your delegate needs var1, so the compiler promotes it from a variable held in the stack to a variable held in one of these objects, associated with the delegate's behavior. This is what is called a "closure" or a "closed variable." To the delegate, it looks just like a local variable, because it is-- just not on the stack any more. It will live as long as that object needs to live, even after TestMethod() has exited.
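A minimal sketch of that lifetime rule, separate from the question's code. The names ClosureLifetimeDemo and MakeCounter are hypothetical, chosen only for illustration. The local count is captured by the returned lambda, so it keeps living after MakeCounter has returned:

```csharp
using System;

public static class ClosureLifetimeDemo
{
    // Hypothetical helper: returns a delegate that captures the local 'count'.
    public static Func<int> MakeCounter()
    {
        int count = 0;          // captured, so it lives on a compiler-generated object
        return () => ++count;   // the lambda reads and writes that same variable
    }

    public static void Main()
    {
        Func<int> next = MakeCounter();   // MakeCounter has already returned here
        Console.WriteLine(next());        // 1
        Console.WriteLine(next());        // 2

        Func<int> other = MakeCounter();  // a separate call gets a separate 'count'
        Console.WriteLine(other());       // 1
    }
}
```

Each call to MakeCounter produces its own captured variable, which is why the second counter starts again from 1.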
I was wondering why one would use a ThreadStaticAttribute field over a local variable. I don't see any difference between a local variable and a ThreadStatic field. Here is some code to further underline my point:
static void Main()
{
Thread t1 = new Thread(doSomething);
t1.Start();
}
[ThreadStaticAttribute]
static int secondNumber;
static void doSomething()
{
int number = 3;
secondNumber = 7;
Console.WriteLine(number); // Prints 3
Console.WriteLine(secondNumber); // Prints 7
}
Now the following code will have the same result as the above:
static void Main()
{
Thread t1 = new Thread(doSomething);
t1.Start();
}
static void doSomething()
{
int number = 3;
int secondNumber = 7;
Console.WriteLine(number); // Prints 3
Console.WriteLine(secondNumber); // Prints 7
}
So what is the use of a [ThreadStaticAttribute] field if I can just as well use a local variable in the method?
Scope and lifetime.
Scope: public thread-static fields are accessible from anywhere (other methods, other classes). Locals are only accessible in their immediate scope (this means a function cannot allocate-and-return data on the stack).
Lifetime: Locals only last as long as their stackframe lives. Thread-static values can last for as long as the life of the process.
There is a workaround: if you need thread-local storage shared by different methods in a long call chain, you can pass an object (containing many properties) into the different methods, though you'll run into safety problems if you spawn new threads that access the same object. For example:
class ThreadLocalValues {
public Int32 SomeValue1;
public String SomeValue2;
}
void Foo(ThreadLocalValues context) {
context.SomeValue1 = 1;
SomeOtherMethod( context, otherStuff, goesHere );
}
Instead of:
[ThreadStatic]
public static Int32 SomeValue1;
[ThreadStatic]
public static String SomeValue2;
Why use a ThreadStaticAttribute field over a local Variable in the invoked
Method?
Because a local variable is not accessible outside of the method.
A lot of the variables you access in web development are thread static. HttpContext.Current, for example, is different for every thread.
And that is exactly the use case - sometimes you need to make some data available to third parties, but this - particularly in a server environment - often happens on a per thread scenario. You COULD put it in as parameter into a method call, but this would require a lot of parameters passed possibly through a lot of methods, so for certain things a ThreadStatic variable is better.
The first example declares a [ThreadStatic] static field, so each thread gets its own copy of it. A value assigned on one thread is not visible to other threads, but any method running on that thread can reach the field.
The second example gives each thread its own local variable. Other threads wouldn't be able to see the local variable.
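To make the scope and per-thread points concrete, here is a small sketch under stated assumptions. ThreadStaticDemo, perThreadValue, ReadValue and SetAndReadOnNewThread are hypothetical names, not part of the question. It shows that another method can read the field without receiving it as a parameter, and that a value set on one thread does not leak into another:

```csharp
using System;
using System.Threading;

public static class ThreadStaticDemo
{
    [ThreadStatic]
    private static int perThreadValue;   // one independent copy per thread

    // A different method reads the field without it being passed in.
    public static int ReadValue() => perThreadValue;

    // Sets the field on a new thread and reports what that thread saw.
    public static int SetAndReadOnNewThread(int value)
    {
        int seen = 0;
        var worker = new Thread(() =>
        {
            perThreadValue = value;   // affects only the worker thread's copy
            seen = ReadValue();
        });
        worker.Start();
        worker.Join();
        return seen;
    }

    public static void Main()
    {
        perThreadValue = 1;                           // main thread's copy
        Console.WriteLine(SetAndReadOnNewThread(7));  // 7
        Console.WriteLine(ReadValue());               // 1, untouched by the worker
    }
}
```

A local variable could not play the role of perThreadValue here, because ReadValue would have no way to reach it.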
The sample below raises an IndexOutOfRangeException at the throw statement because the variable i is beyond its limit (i.e., it is 2 when the loop covers 0 and 1). I was expecting this code to create lambda blocks 0 and 1, each of which would store a result in the corresponding array element. I notice, from setting breakpoints, that the async tasks don't actually start to execute until I call Task.WaitAll(). From the C# Programming Guide, I understand that the compiler went out of its way to keep i in scope after the loop exited.
So, my questions are these:
Can someone suggest a way to achieve the effect I am trying to create, that each async task should store its results in a distinct slot in the array? Task.Run() doesn't have an overload to provide arguments (which I'd use to pass i in the loop), and the lambda block declaration resists my attempt to declare parameters anyway.
Can someone provide a justification why it is desirable behavior for a lambda expression to be able to continue to refer to a local variable after it goes out of scope in its declaring block? The C# language reference bends the very meaning of "local" declaration to cover lifting by anonymous functions and lambda blocks, but this only opens the door to picking up unexpected values, as my example shows.
Here's the sample:
using System;
using System.Threading.Tasks;
namespace AsyncLifting
{
class Program
{
static void Main(string[] args)
{
const int numTasks = 2;
double[] taskResult = new double[numTasks];
Task<int>[] taskHandles = new Task<int>[numTasks];
for (int i = 0; i < numTasks; i++)
{
taskHandles[i] = Task.Run(async () =>
{
DateTime startTime = DateTime.UtcNow;
await Task.Delay(10);
try
{
taskResult[i] = (DateTime.UtcNow - startTime).TotalMilliseconds;
}
catch (Exception e) {
throw e; // IndexOutOfRange, i is 2
}
return i;
});
}
Task.WaitAll(taskHandles);
Console.WriteLine("Task waits:");
foreach (double tr in taskResult)
{
Console.WriteLine(" {0}ms.", tr);
}
}
}
}
The delegate is closing over the variable—it's not just capturing the value at that time, but the whole variable. This can occasionally be useful.
Anyway, to prevent the unintended behavior, just make a new variable and use that inside the block:
for(int i = 0; i < n; i++) {
int index = i;
DoSomething(delegate() {
myArray[index] = /* something */;
});
}
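Applied to the Task.Run shape from the question, the same fix might look like the sketch below. LoopCaptureDemo, RunTasks and slots are hypothetical names, and the stored value (index + 1) is a stand-in for the elapsed time so that the result is deterministic:

```csharp
using System;
using System.Threading.Tasks;

public static class LoopCaptureDemo
{
    // Starts numTasks tasks; each writes into its own slot of the array.
    public static int[] RunTasks(int numTasks)
    {
        int[] slots = new int[numTasks];
        Task[] tasks = new Task[numTasks];
        for (int i = 0; i < numTasks; i++)
        {
            int index = i;   // a fresh variable per iteration; the lambda captures this one
            tasks[i] = Task.Run(async () =>
            {
                await Task.Delay(10);
                slots[index] = index + 1;   // 'index' never changes after the copy
            });
        }
        Task.WaitAll(tasks);
        return slots;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", RunTasks(3)));   // 1, 2, 3
    }
}
```

Because index is declared inside the loop body, every iteration gets its own captured variable, whereas i is a single variable shared by all iterations.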
icktoofay has provided an excellent answer to your first question.
As for why this is useful behaviour: if local variables were destroyed as soon as they went out of scope, a lambda couldn't reference anything other than its own local variables, because the lambda might last a lot longer than its enclosing context would if the lambda weren't deliberately keeping it around.
More concretely consider this function from perhaps a game library (I recently did something like this)
public static Func<double,Point> MakeSimpleVelocityTrajectory(double xv, double yv, double x0, double y0)
{
return (t) => {
return new Point(x0+xv*t,y0+yv*t);
};
}
If the lambda didn't keep the local variables around there would be no point to returning the lambda since it wouldn't be able to do anything because the variables it references are no longer around.
I'm working with ref and don't clearly understand: is it like a pointer as in C/C++, or is it like a reference in C++?
Why am I asking what might look, at first glance, like such a weak question?
Because when I read C#/.NET books or MSDN, or talk to C# developers, I get confused for the following reasons:
C# developers suggest NOT using ref in the arguments of a function; e.g. ...(ref Type someObject) doesn't smell good to them, and they suggest ...(Type someObject) instead. I really don't understand this suggestion. The reasons I've heard: it's better to work with a copy of the object and then use it as a return value, so as not to corrupt memory through a reference, etc. I often hear this explanation about DB connection objects. From my plain C/C++ experience, I really don't understand why using a reference is a bad thing in C#. I control the life of the object and its memory allocations/re-allocations, etc. In books and forums I only read advice that it's bad because you can corrupt your connection and cause a memory leak by losing a reference. But I control the life of the object and can manually control what I really want, so why is it bad?
After reading different books and talking to different people, I still don't clearly understand whether ref is a pointer (*) or a reference like & in C++. As I remember, a pointer in C/C++ always allocates space the size of a void*, e.g. 4 bytes (the actual size depends on the architecture), which holds the address of a structure or variable. In C++, passing a reference with & involves no new allocation on the heap or stack: you work with objects already defined in memory, and no extra memory is allocated for a pointer as in plain C. So what is ref in C#? Does the .NET VM handle it like a pointer in plain C/C++, with the GC allocating temporary space for a pointer, or does it work like a reference in C++? Does ref work correctly only with managed types, or, for value types like bool and int, is it better to switch to unsafe code and pass a pointer in unmanaged style?
In C#, when you see something referring to a reference type (that is, a type declared with class instead of struct), then you're essentially always dealing with the object through a pointer. In C++, everything is a value type by default, whereas in C# every class is a reference type (structs and primitives such as int are value types).
When you say "ref" in the C# parameter list, what you're really saying is more like a "pointer to a pointer." You're saying that, in the method, you want to replace not the contents of the object, but the reference to the object itself, in the code calling your method.
Unless that is your intent, then you should just pass the reference type directly; in C#, passing reference types around is cheap (akin to passing a reference in C++).
Learn/understand the difference between value types and reference types in C#. They're a major concept in that language and things are going to be really confusing if you try to think using the C++ object model in C# land.
The following are essentially semantically equivalent programs:
#include <iostream>
class AClass
{
int anInteger;
public:
AClass(int integer)
: anInteger(integer)
{ }
int GetInteger() const
{
return anInteger;
}
void SetInteger(int toSet)
{
anInteger = toSet;
}
};
struct StaticFunctions
{
// C# doesn't have free functions, so I'll do similar in C++
// Note that in real code you'd use a free function for this.
static void FunctionTakingAReference(AClass *item)
{
item->SetInteger(4);
}
static void FunctionTakingAReferenceToAReference(AClass **item)
{
*item = new AClass(1729);
}
};
int main()
{
AClass* instanceOne = new AClass(6);
StaticFunctions::FunctionTakingAReference(instanceOne);
std::cout << instanceOne->GetInteger() << "\n";
AClass* instanceTwo;
StaticFunctions::FunctionTakingAReferenceToAReference(&instanceTwo);
// Note that operator& behaves similar to the C# keyword "ref" at the call site.
std::cout << instanceTwo->GetInteger() << "\n";
// (Of course in real C++ you're using std::shared_ptr and std::unique_ptr instead,
// right? :) )
delete instanceOne;
delete instanceTwo;
}
And for C#:
using System;
internal class AClass
{
public AClass(int integer)
{
Integer = integer; // C# has no C++-style member initializer list
}
public int Integer { get; set; } // public so the code below can access it
}
internal static class StaticFunctions
{
public static void FunctionTakingAReference(AClass item)
{
item.Integer = 4;
}
public static void FunctionTakingAReferenceToAReference(ref AClass item)
{
item = new AClass(1729);
}
}
public static class Program
{
public static void Main()
{
AClass instanceOne = new AClass(6);
StaticFunctions.FunctionTakingAReference(instanceOne);
Console.WriteLine(instanceOne.Integer);
AClass instanceTwo = new AClass(1234); // C# forces me to assign this before
// it can be passed. Use "out" instead of
// "ref" and that requirement goes away.
StaticFunctions.FunctionTakingAReferenceToAReference(ref instanceTwo);
Console.WriteLine(instanceTwo.Integer);
}
}
A ref in C# is equivalent to a C++ reference:
Their intent is pass-by-reference
There are no null references
There are no uninitialized references
You cannot rebind references
When you spell the reference, you are actually denoting the referred variable
Some C++ code:
void foo(int& x)
{
x = 42;
}
// ...
int answer = 0;
foo(answer);
Equivalent C# code:
void foo(ref int x)
{
x = 42;
}
// ...
int answer = 0;
foo(ref answer);
Every reference in C# is a pointer to an object on the heap, like a pointer in C++, and ref in C# is the same as & in C++.
The reason ref should be avoided is that C# works on the principle that a method should not change the object passed in as a parameter, because someone who does not have the source of the method may not know whether calling it will result in loss of data.
String a = " A ";
String b = a.Trim();
In this case I am confident that a remains intact. As in mathematics, a change should be visible as an assignment, which tells us that b is changed here with the programmer's consent.
a = a.Trim();
This code reassigns a itself, and the coder is aware of it.
To preserve this method of change by assignment ref should be avoided unless it is exceptional case.
C# has no equivalent of C++ pointers and works on references. ref adds a level of indirection. It makes a value type argument a reference, and when used with a reference type it makes it a reference to a reference.
In short it allows to carry any changes to a value type outside a method call. For reference type it allows to replace the original reference to a totally different object (and not just change object content). It can be used if you want to re-initialize an object inside a method and the only way to do it is to recreate it. Although I would try avoid such an approach.
So to answer your question ref would be like C++ reference to a reference.
EDIT
The above is true for safe code. Pointers do exist in unsafe C# and are used in some very specific cases.
This seems like a disposing/eventing nightmare. If I have an object whose events are registered for, and I pass it into a function by reference, and that reference is then reassigned, Dispose should be called or the memory will stay allocated until the program is closed. If Dispose is called, everything registered to the object's events will no longer be registered, and everything it is registered for will no longer be registered either. How would someone keep this straight? I guess you could compare memory addresses and try to bring things back to sanity, if you don't go insane.
In C#, you can check "Allow unsafe code" in your project properties, and then you can run this code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace Exercise_01
{
public struct Coords
{
public int X;
public int Y;
public override string ToString() => $"({X}, {Y})";
}
class Program
{
static unsafe void Main(string[] args)
{
int n = 0;
SumCallByRefPointer(1, 2, &n);
Console.Clear();
Console.WriteLine("call by reference {0}", n);
n = 0;
SumCallByValue(3, 4, n);
Console.WriteLine("call by Value {0}", n);
n = 0;
SumCallByRef(5, 6, ref n);
Console.WriteLine("call by reference {0}", n);
Pointer();
Console.ReadLine();
}
private static unsafe void SumCallByRefPointer(int a, int b, int* c)
{
*c = a + b;
}
private static unsafe void SumCallByValue(int a, int b, int c)
{
c = a + b;
}
private static unsafe void SumCallByRef(int a, int b, ref int c)
{
c = a + b;
}
public static void Pointer()
{
unsafe
{
Coords coords;
Coords* p = &coords;
p->X = 3;
p->Y = 4;
Console.WriteLine(p->ToString()); // output: (3, 4)
}
}
}
}
I have this code, which works as I wanted, but I don't understand exactly why. Thinking about the stack in C or C++, I'd guess that the p variable is on the stack for each call and is erased when the method returns. How does the thread's closure capture it and, moreover, capture the correct value every time? The output is what I wanted: the files are "_a", "_b", "_c".
public enum enumTest
{
a = 1,
b =2,
c=3
}
private void Form1_Load(object sender, EventArgs e)
{
callme(enumTest.a);
callme(enumTest.b);
callme(enumTest.c);
}
private void callme(enumTest p)
{
Thread t = new Thread(() =>
{
Thread.Sleep(2000);
Guid guid = Guid.NewGuid();
File.WriteAllText(guid.ToString() + "_" + p.ToString(), "");
});
t.Start();
}
Lambdas are just glorified anonymous delegates
Rick Strahl (http://www.west-wind.com/weblog/posts/2008/Apr/26/Variable-Scoping-in-Anonymous-Delegates-in-C)
Rick's article describes how the compiler generates a class that handles the enumTest p value and delegate.
Also good info at Where does anonymous function body variables saved ?
Basically, the compiler creates a new instance of the "closure class" holding the local variables that must be passed to the lambda. This is why your output is correct.
UPDATE
In the case of:
for (int i=0; i<10; i++)
{
var t = new Thread(() => { Console.WriteLine(i); });
t.Start();
}
The variable i is shared between the for loop and the lambda. Each thread is accessing the same i. And since the for loop tends to finish before any thread runs, all you tend to see is '10'.
See http://msdn.microsoft.com/en-us/library/0yw3tz5k(v=vs.80).aspx
It's less about a closure sharing a variable and more about parameter passing.
What happens here is that your p parameter is copied by value on every call to callme. Each call therefore has its own p, and the lambda created in that call captures that particular p.
How does the thread's closure capture it and, moreover, capture the correct value every time?
That is compiler magic. Simply because the p parameter is being used by the lambda the compiler treats it differently. p is not placed on the stack but on the heap. That is why it still exists after callme() has terminated.
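The per-call capture can be observed without threads or files. The following is only a sketch under assumptions: PerCallCaptureDemo, CallMe, pending and Run are hypothetical names, and a list of delegates stands in for the started threads. Each call to CallMe has its own p, and each lambda sees the p of the call that created it:

```csharp
using System;
using System.Collections.Generic;

public static class PerCallCaptureDemo
{
    // Stand-in for the started threads: delegates that run after CallMe has returned.
    private static readonly List<Func<string>> pending = new List<Func<string>>();

    private static void CallMe(string p)
    {
        // 'p' is hoisted onto a closure object created for this particular call.
        pending.Add(() => "_" + p);
    }

    public static List<string> Run()
    {
        pending.Clear();
        CallMe("a");
        CallMe("b");
        CallMe("c");

        // All three calls have returned, yet each lambda still sees its own 'p'.
        var results = new List<string>();
        foreach (Func<string> f in pending)
            results.Add(f());
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Run()));   // _a _b _c
    }
}
```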
What is the memory overhead on the stack and heap of A versus B?
A:
private string TestA()
{
string a = _builder.Build();
return a;
}
B:
private string TestB()
{
return _builder.Build();
}
re the efficiency question; the two are identical, and in release mode will be reduced to the same thing. Either way, string is a reference-type, so the string itself is always on the heap. The only thing on the stack would be the reference to the string - a few bytes (no matter the string length).
"do all local variables go on the stack": no; there are two exceptions:
captured variables (anonymous methods / lambdas)
iterator blocks (yield return etc)
In both cases, there is a compiler generated class behind the scenes:
int i = 1;
Action action = delegate {i++;};
action();
Console.WriteLine(i);
is similar to:
class Foo {
public int i; // yes, a public field
public void SomeMethod() {i++;}
}
...
Foo foo = new Foo();
foo.i = 1;
Action action = foo.SomeMethod;
action();
Console.WriteLine(foo.i);
Hence i is on an object, hence on the heap.
Iterator blocks work in a similar way, but with the state machine.
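For the iterator case, a short sketch shows a local surviving between calls, which it could not do if it lived only in a stack frame. IteratorStateDemo and CountUpTo are hypothetical names used for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class IteratorStateDemo
{
    // 'current' must survive between MoveNext calls, so the compiler
    // stores it as a field of the generated state machine object.
    public static IEnumerable<int> CountUpTo(int limit)
    {
        int current = 0;
        while (current < limit)
        {
            current++;
            yield return current;   // execution pauses here and resumes on the next MoveNext
        }
    }

    public static void Main()
    {
        using (IEnumerator<int> e = CountUpTo(3).GetEnumerator())
        {
            while (e.MoveNext())
                Console.WriteLine(e.Current);   // 1, then 2, then 3
        }
    }
}
```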
They both get optimised to the same thing.
In answer to the question in your title, "do all local variables go on the stack", the simple answer is: not exactly. Instances of reference types are stored on the managed heap regardless. .NET has a generational garbage collector that's aware that some objects only live a very short time, and so is designed to manage this efficiently.