Why are static fields null by default? - c#

Just curious, from the code below I can see that static field of type A will be null by default while variable of that type needs to be initialized to ahve at least null value. Could anyone explain the difference a bit more? Thanks
class Program
{
static A _a; //it is null by default
static void Main(string[] args)
{
A nonStaticA; //empty reference, exception when used
A correctA=null;
}
}
class A
{
}

The initial value of a field, whether it be a static field or an instance field, is the default value of the field's type. It is not possible to observe the value of a field before this default initialization has occurred, and a field is thus never "uninitialized".
If a static constructor exists in the class, execution of the static field initializers occurs immediately prior to executing that static constructor. Otherwise, the static field initializers are executed at an implementation-dependent time prior to the first use of a static field of that class.
A local variable is not automatically initialized and thus has no default value. For the purpose of definite assignment checking, a local variable is considered initially unassigned.

The difference is between local variables and fields, not between static and instance.
Local variables of any type are required to be initialized to a value before they are used the first time. This prevents bugs where you forget to initialize a variable.
The compiler can verify this, because local variables only exist inside a single method, and code in a method is executed in a predictable order, from the top town. So it is easy to check if a variable is accessed before it is initialized.
Fields are different. They can be accessed in multiple methods, and there is no way the compiler can determine in which order they are executed. Therefore it cannot check at compile-time that fields are initialized before they are accessed. Instead fields are given a default value, which is null for all reference types, 0 for integers and so on.

It has nothing to do with static. Class fields (instance and static) are initialized to their defaults, local variables are not.
And why? Like lots of things it was a design decision at some point.

C#, as far as I can observes, kept many things as they were in other previous languages, mainly C++.
The reason in C++ (which may or may not be relevant to C#) is that static (or global) objects are statically written into the executable or library while for other objects the code that creates the object (and not the object itself) is written into the executable or library. For objects on the stack usually code that subtracts some value from the stack pointer is written.
When the executable or library is loaded into memory by the OS the static fields are just a bunch of bytes that are copied as is into memory (the data segment of the process). Since they are copied as is, they already have a value (the value in the executable or library file). Because of that there is no performance implications to setting it to a specific value. For that reason (as far as I can see) the C++ standard made their value deterministic (if they are not initialized explicitly) and what is more natural than zero as an initialization value?!
In order to initialize a dynamic object (whether it is on the stack or the heap) code has to be inserted into the executable or library. This has performance implications (and maybe other implications) so the C++ standard preferred to leave it up to the programmer.
I'm not completely sure every bit of this data is true, but it is what seems logical to me from what I know.

Related

Why is the int default value not taken during runtime? [duplicate]

Why must I initialize variables inside methods?
int test1; // Not initialized, but OK
public int Foo()
{
int test2; // Not initialized
int test3 = test1; // OK
int test4 = test2; // An error
}
Fields are automatically initialized to the logical zero for the type; this is implicit. Variables must obey "definite assignment", so must be assigned before they can be read.
ECMA 334v4
§17.4.4 Field initialization
The initial value of a field, whether
it be a static field or an instance
field, is the default value (§12.2) of
the field’s type. It is not possible
to observe the value of a field before
this default initialization has
occurred, and a field is thus never
"uninitialized".
and
§12. Variables
...
A variable shall be definitely assigned (§12.3) before its
value can be obtained.
...
Extending Mark's answer, local variable initialization is also related to the verification process.
The CLI requires that in any verifiable code (that is, modules that didn't explicitly asked to skip the verification process using the SkipVerfication property from the SecurityPermission attribute), all local variables must be initialized before their usage. Failing to do so will result in a VerficationException being thrown.
More interestingly, is that the compiler automatically adds the .locals init flag on every method that uses local variables. This flag causes the JIT compiler to generate code that initializes all of the local variables to their default values. Meaning, even though you have already initialized them in your own code, the JIT will comply with the .locals init flag and generate the proper initialization code. This "duplicate initialization" doesn't effect performance since in configurations that allow optimizations, the JIT compiler will detect the duplication and effectively treat it as "dead code" (the auto-generated initialization routine will not appear in the generated assembler instructions).
According to Microsoft (also, backed up by Eric Lippert in response to a question on his blog), on most occasions, when programmers don't initialize their local variable, they don't do so because they rely on the underlying environment to initialize their variable to their default values, but only because they "forgot", thus, causing sometimes-illusive logical bugs.
So in order to reduce the probability for bugs of this nature to appear in C# code, the compiler still insists you will initialize your local variables. Even though it's going to add the .locals init flag to the generated CIL code.
A more comprehensive explanation on this subject can be found here: Behind The .locals init Flag
It actually shouldn't. Your error should be on the second line, not the first, and should be because you used it before you initialized it.
The compiler is helping you here.
So don't initialize them as a habit. Instead let the compiler help you out!
The nice thing about this is that it will path check for you. If you have a switch statement with three cases where each sets the value, but you forget to set it in your "default", but use it afterwards, it will warn you that you missed a path.
If you initialize variables to = 0, you take that benefit away.
As Marc indicates, that's what the specification says. The reason this is a good thing is that there are some valid reasons to leave a member uninitialized rather than a local variable, whose lifetime is bounded by the method it is in. Mostly you'd only ever want this for performance reasons, if the variable is expensive to initialize, and should only be initialized under specific usage scenarios. For my part, I'd avoid uninitialized members until my back was truly against the wall, though!
For local variables, it is also much easier to detect whether all code paths are likely to lead to initialization, whereas there are no good heuristics to determine whether all code paths across the entire programme guarantee initialization before use. A completely correct answer is impossible in both cases, as all computer science students should know.

What does .Net do when you declare an object without an instance?

I wonder to know how the .Net Framework handles the declared but not instantiated object situation.
For example i declare an object like
DropDownList ddl;
and do nothing about it. I know that i should do something with this variable and get a warning about it, but what i don't know is the where it will be stored.
Is there a lookup table that stores the data of all declared variables? Or is there a virtual reference for every declaration?
Edit : I just wanted to know how the memory allocated for this object declaration.
Edit2 : Whether it's a local variable or not, i'm just talking about the memory allocation structure. I wonder to know where this references stored.
If ddl is a field, then the value of ddl will be null, as it is a reference type.
Any attempt to call a member on it will result in a NullReferenceException.
If it is a local variable it will simply be unassigned.
Value types will get the default(T) of their type.
The compiler itself may remove the call completely, depending on where it was declared, but this is an implementation detail.
If you are talking about a local variable then the compiler can simply optimize it out of existence since noone can be using it (if you attempted to use it without initializing the compiler would have protested with an error). In fact the .NET 4 compiler did this for me when I tested just moments ago.
If you are talking about a field in a class then it is initialized with the default value for its type as part of the object construction.
From your description, it sounds like you're talking about a local variable. When you declare a local variable in usual implementations and without any optimizations, then space is reserved for it on the stack (most probably), with a null reference as its value.
You could look into the StackFrame class if you want to inspect further (I've never used it).
The variable is stored in your assembly. It will always have it's default value null.
In release mode (compiler is set to optimize) it's optimized and it is not stored anywhere.
If you want to know more about IL and how the compiler works, wikipedia has a good article to start.
All variables are stored into a class or method. Variables declared into a class can be listed using .NET Reflection :
class Class1 { private int i; public string s; }
typeof(Class1).GetFields(BindingFlags.Instance); // returns all instance fields
typeof(Class1).GetFields(); // returns all instance public fields
typeof(Class1).GetProperties(); // returns all instance public properties
Variables declared into a method cannot be inspected with .NET Reflection mechanisms.

About unassigned variables

Just curious, I'm not trying to solve any problem.
Why only local variables should be assigned?
In the following example:
class Program
{
static int a;
static int b { get; set; }
static void Main(string[] args)
{
int c;
System.Console.WriteLine(a);
System.Console.WriteLine(b);
System.Console.WriteLine(c);
}
}
Why a and b gives me just a warning and c gives me an error?
Addionally, why I can't just use the default value of Value Type and write the following code?
bool MyCondition = true;
int c;
if (MyCondition)
c = 10;
Does it have anything to do with memory management?
Tim gives a good answer to your first question but I can add a few more details.
Your first question is basically "locals are required to be definitely assigned, so why not make the same restriction on fields?" Tim points out that Jon points out that it is actually quite difficult to do so. With locals it is almost always crystal clear when a local is first read and when it is first written. In the cases where it is not clear, the compiler can make reasonable conservative guesses. But with fields, to know when a first read and a first write happens, you have to know all kinds of things about which methods are called in what order.
In short, analyzing locals requires local analysis; the analysis doesn't have to go past the block that contains the declaration. Field analysis requires global analysis. That's a lot of work; it's easier to just say that fields are initialized to their default values.
(Now, that said, it is of course possible to do this global analysis; my new job will likely involve doing precisely this sort of analysis.)
Your second question is basically "Well then, if automatic assignment of default values is good enough for fields then why isn't it good enough for locals?" and the answer is "because failing to assign a local variable and accidentally getting the default value is a common source of bugs." C# was carefully designed to discourage programming practices that lead to irritating bugs, and this is one of them.
Because other variables are initialized with their default value.
Jon Skeet has already found some interesting words on this issue:
For local variables, the compiler has a good idea of the flow - it can
see a "read" of the variable and a "write" of the variable, and prove
(in most cases) that the first write will happen before the first
read.
This isn't the case with instance variables. Consider a simple
property - how do you know if someone will set it before they get it?
That makes it basically infeasible to enforce sensible rules - so
either you'd have to ensure that all fields were set in the
constructor, or allow them to have default values. The C# team chose
the latter strategy.
and here's the related C# language specification:
5.3 Definite assignment
At a given location in the executable code of a function member, a
variable is said to be definitely assigned if the compiler can prove,
by a particular static flow analysis (§5.3.3), that the variable has
been automatically initialized or has been the target of at least one
assignment.
5.3.1 Initially assigned variables
The following categories of variables are classified as initially
assigned:
Static variables.
Instance variables of class instances.
Instance variables of initially assigned struct variables.
Array elements.
Value parameters.
Reference parameters.
Variables declared in a catch clause or a foreach statement.
5.3.2 Initially unassigned variables
The following categories of variables are classified as initially
unassigned:
Instance variables of initially unassigned struct variables.
Output parameters, including the this variable of struct instance
constructors.
Local variables, except those declared in a catch clause or a
foreach statement.
The CLR provides a hard guarantee that local variables are initialized to their default value. But this guarantee does have limitations. What is missing is its ability to recognize scope blocks inside the method body. They disappear once the compiler translates the code to IL. Scope is a language construct that doesn't have a parallel in the CLI and cannot be expressed in IL.
You can see this going wrong in a language like VB.NET for example. This contrived example shows the behavior:
Module Module1
Sub Main()
For ix = 1 To 3
Dim s As String
If ix = 2 Then s = "foo"
If s Is Nothing Then Console.WriteLine("null") Else Console.WriteLine(s)
Next
Console.ReadLine()
End Sub
End Module
Output:
null
foo
foo
Or in other words, the local variable s was initialized only once and retains its value afterwards. This has a knack for creating bugs of course. The VB.NET compiler does generate a warning for it and has simple syntax to avoid it (As New). A managed language like C++/CLI has the same behavior but doesn't emit a diagnostic at all. But the C# language gives a stronger guarantee, it generates an error.
This rule is called "definite assignment". The exact rules are explained in detail in the C# Language Specification, chapter 5.3.3
Definite assignment checking has its limitations. It works well for local variables since their scope is very limited (private to the method body) and you can't get to them with Reflection. Much harder to do with fields of a class, it requires whole-program analysis that may need to reach beyond what the compiler can see. Like code in another assembly. Which is why the C# compiler can only warn about it but can't reject the code out-right.

const, readonly and mutable value types

I'm continuing my study of C# and the language specification and Here goes another behavior that I don't quite understand:
The C# Language Specification clearly states the following in section 10.4:
The type specified in a constant declaration must be sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal, bool, string, an enum-type, or a reference-type.
It also states in section 4.1.4 the following:
Through const declarations it is possible to declare constants of the simple types (§10.4). It is not possible to have constants of other struct types, but a similar effect is provided by static readonly fields.
Ok, so a similar effect can be gained by using static readonly. Reading this I went and tried the following code:
static void Main()
{
OffsetPoints();
Console.Write("Hit a key to exit...");
Console.ReadKey();
}
static Point staticPoint = new Point(0, 0);
static readonly Point staticReadOnlyPoint = new Point(0, 0);
public static void OffsetPoints()
{
PrintOutPoints();
staticPoint.Offset(1, 1);
staticReadOnlyPoint.Offset(1, 1);
Console.WriteLine("Offsetting...");
Console.WriteLine();
PrintOutPoints();
}
static void PrintOutPoints()
{
Console.WriteLine("Static Point: X={0};Y={1}", staticPoint.X, staticPoint.Y);
Console.WriteLine("Static readonly Point: X={0};Y={1}", staticReadOnlyPoint.X, staticReadOnlyPoint.Y);
Console.WriteLine();
}
The output of this code is:
Static Point: X=0;Y=0
Static readonly Point: X=0;Y=0
Offsetting...
Static Point: X=1;Y=1
Static readonly Point: X=0;Y=0
Hit a key to exit...
I really expected the compiler to give me some kind of warning about mutating a static readonly field or failing that, to mutate the field as it would with a reference type.
I know mutable value types are evil (why did Microsoft ever implement Point as mutable is a mystery) but shouldn't the compiler warn you in some way that you are trying to mutate a static readonly value type? Or at least warn you that your Offset() method will not have the "desired" side effects?
Eric Lippert explains what's going on here:
...if the field is readonly and the reference occurs outside an
instance constructor of the class in which the field is declared, then
the result is a value, namely the value of the field I in the object
referenced by E.
The important word here is that the result is the value of the field,
not the variable associated with the field. Readonly fields are not
variables outside of the constructor. (The initializer here is
considered to be inside the constructor; see my earlier post on that
subject.)
Oh and just to stress on the evilness of mutable structs, here is his conclusion:
This is yet another reason why mutable value types are evil. Try to
always make value types immutable.
The point of the readonly is that you cannot reassign the reference or value.
In other words if you attempted this
staticReadOnlyPoint = new Point(1, 1);
you would get a compiler error because you are attempting to reassign staticReadOnlyPoint. The compiler will prevent you from doing this.
However, readonly doesn't enforce whether the value or referenced object itself is mutable - that is a behaviour that is designed into the class or struct by the person creating it.
[EDIT: to properly address the odd behaviour being described]
The reason you see the behaviour where staticReadOnlyPoint appears to be immutable is not because it is immutable itself, but because it is a readonly struct. This means that every time you access it, you are taking a full copy of it.
So your line
staticReadOnlyPoint.Offset(1, 1);
is accessing, and mutating, a copy of the field, not the actual value in the field. When you subsequently write out the value you are then writing out yet another copy of the original (not the mutated copy).
The copy you did mutate with the call to Offset is discarded, because it is never assigned to anything.
The compiler simply doesn't have enough information available about a method to know that the method mutates the struct. A method may well have a side-effect that's useful but doesn't otherwise change any members of the struct. If would technically be possible to add such analysis to the compiler. But that won't work for any types that live in another assembly.
The missing ingredient is a metadata token that indicates that a method doesn't mutate any members. Like the const keyword in C++. Not available. It would have be drastically non-CLS compliant if it was added in the original design. There are very few languages that support the notion. I can only think of C++ but I don't get out much.
Fwiw, the compiler does generate explicit code to ensure that the statement cannot accidentally modify the readonly. This statement
staticReadOnlyPoint.Offset(1, 1);
gets translated to
Point temp = staticReadOnlyPoint; // makes a copy
temp.Offset(1, 1);
Adding code that then compares the value and generates a runtime error is also only technically possible. It costs too much.
The observed behavior is an unfortunate consequence of the fact that neither the Framework nor C# provides any means by which member function declarations can specify whether this should be passed by ref, const-ref, or value. Instead, value types always pass this by (non-const-restricted) ref, and reference types always pass this by value.
The 'proper' behavior for a compiler would be to forbid passing immutable or temporary values by non-const-restricted ref. If such restriction could be imposed, ensuring proper semantics for mutable value types would mean following a simple rule: if you make an implicit copy of a struct, you're doing something wrong. Unfortunately, the fact that member functions can only accept this by non-const-restricted ref means a language designer must make one of three choices:
Guess that a member function won't modify `this`, and simply pass immutable or temporary variables by `ref`. This would be most efficient for functions which do not, in fact, modify `this`, but could dangerously expose to modification things that should be immutable.
Don't allow member functions to be used on immutable or temporary entities. This would avoid improper semantics, but would be a really annoying restriction, especially given that most member functions do not modify `this`.
Allow the use of member functions except those deemed most likely to modify `this` (e.g. property setters), but instead of passing immutable entities directly by ref, copy them to temporary locations and pass those.
Microsoft's choice protects constants from improper modification, but has the unfortunate consequences that code will run needlessly slowly when calling functions that don't modify this, while generally working incorrectly for those which do.
Given the way this is actually handled, one's best bet is to avoid making any changes to it in structure member functions other than property setters. Having property setters or mutable fields is fine, since the compiler will correctly forbid any attempt to use property setters on immutable or temporary objects, or to modify any fields thereof.
If you look at the IL, you will see that on usage of the readonly field, a copy is made before calling Offset:
IL_0014: ldsfld valuetype [System.Drawing]System.Drawing.Point
Program::staticReadOnlyPoint
IL_0019: stloc.0
IL_001a: ldloca.s CS$0$0000
Why this is happening, is beyond me.
It could be part of the spec, or a compiler bug (but it looks a bit too intentional for the latter).
The effect is due to several well-defined features coming together.
readonly means that the field in question cannot be changed, but not that the target of the field cannot be changed. This is more easily understood (and more often useful in practice) with readonly fields of a mutable reference type, where you can do x.SomeMutatingMethod() but not x = someNewObject.
So, first item is; you can mutate the target of a readonly field.
Second item is, that when you access a non-variable value type you obtain a copy of the value. The least confusing example of this is giveMeAPoint().Offset(1, 1) because there isn't a known location for us to later observe that the value-type returned by giveMeAPoint() may or may not have been mutated.
This is why value types are not evil, but are in some ways worse. Truly evil code doesn't have a well-defined behaviour, and all of this is well-defined. It's still confusing though (confusing enough for me to get this wrong on my first answer), and confusing is worse than evil when you're trying to code. Easily understood evil is so much more easily avoided.

Why must local variables have initial values?

Why must I initialize variables inside methods?
int test1; // Not initialized, but OK
public int Foo()
{
int test2; // Not initialized
int test3 = test1; // OK
int test4 = test2; // An error
}
Fields are automatically initialized to the logical zero for the type; this is implicit. Variables must obey "definite assignment", so must be assigned before they can be read.
ECMA 334v4
§17.4.4 Field initialization
The initial value of a field, whether
it be a static field or an instance
field, is the default value (§12.2) of
the field’s type. It is not possible
to observe the value of a field before
this default initialization has
occurred, and a field is thus never
"uninitialized".
and
§12. Variables
...
A variable shall be definitely assigned (§12.3) before its
value can be obtained.
...
Extending Mark's answer, local variable initialization is also related to the verification process.
The CLI requires that in any verifiable code (that is, modules that didn't explicitly asked to skip the verification process using the SkipVerfication property from the SecurityPermission attribute), all local variables must be initialized before their usage. Failing to do so will result in a VerficationException being thrown.
More interestingly, is that the compiler automatically adds the .locals init flag on every method that uses local variables. This flag causes the JIT compiler to generate code that initializes all of the local variables to their default values. Meaning, even though you have already initialized them in your own code, the JIT will comply with the .locals init flag and generate the proper initialization code. This "duplicate initialization" doesn't effect performance since in configurations that allow optimizations, the JIT compiler will detect the duplication and effectively treat it as "dead code" (the auto-generated initialization routine will not appear in the generated assembler instructions).
According to Microsoft (also, backed up by Eric Lippert in response to a question on his blog), on most occasions, when programmers don't initialize their local variable, they don't do so because they rely on the underlying environment to initialize their variable to their default values, but only because they "forgot", thus, causing sometimes-illusive logical bugs.
So in order to reduce the probability for bugs of this nature to appear in C# code, the compiler still insists you will initialize your local variables. Even though it's going to add the .locals init flag to the generated CIL code.
A more comprehensive explanation on this subject can be found here: Behind The .locals init Flag
It actually shouldn't. Your error should be on the second line, not the first, and should be because you used it before you initialized it.
The compiler is helping you here.
So don't initialize them as a habit. Instead let the compiler help you out!
The nice thing about this is that it will path check for you. If you have a switch statement with three cases where each sets the value, but you forget to set it in your "default", but use it afterwards, it will warn you that you missed a path.
If you initialize variables to = 0, you take that benefit away.
As Marc indicates, that's what the specification says. The reason this is a good thing is that there are some valid reasons to leave a member uninitialized rather than a local variable, whose lifetime is bounded by the method it is in. Mostly you'd only ever want this for performance reasons, if the variable is expensive to initialize, and should only be initialized under specific usage scenarios. For my part, I'd avoid uninitialized members until my back was truly against the wall, though!
For local variables, it is also much easier to detect whether all code paths are likely to lead to initialization, whereas there are no good heuristics to determine whether all code paths across the entire programme guarantee initialization before use. A completely correct answer is impossible in both cases, as all computer science students should know.

Categories

Resources