Why must local variables have initial values? - c#

Why must I initialize variables inside methods?
class Example
{
    int test1; // Field: not initialized, but OK
    public int Foo()
    {
        int test2; // Local: not initialized
        int test3 = test1; // OK
        int test4 = test2; // error CS0165: Use of unassigned local variable 'test2'
        return test3;
    }
}

Fields are automatically initialized to the logical zero for the type; this is implicit. Local variables must obey "definite assignment", so they must be assigned before they can be read.
ECMA-334, 4th edition
§17.4.4 Field initialization
The initial value of a field, whether it be a static field or an instance field, is the default value (§12.2) of the field’s type. It is not possible to observe the value of a field before this default initialization has occurred, and a field is thus never "uninitialized".
and
§12. Variables
...
A variable shall be definitely assigned (§12.3) before its value can be obtained.
...
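The "logical zero" of a type is what default(T) produces. The sketch below (C# top-level statements; the variable names only echo the question) prints a few of those values and contrasts them with a local, which the compiler will not let you read until it has been assigned.

```csharp
using System;

// default(T) is the value a field of type T holds before anything is assigned to it.
int zeroInt = default(int);
bool zeroBool = default(bool);
string zeroString = default(string);

Console.WriteLine(zeroInt);            // 0
Console.WriteLine(zeroBool);           // False
Console.WriteLine(zeroString == null); // True

// A local gets no such implicit value as far as the language is concerned.
int test2;
// int early = test2;  // would be error CS0165: Use of unassigned local variable 'test2'
test2 = 7;             // from here on, test2 is definitely assigned
int test4 = test2;
Console.WriteLine(test4);              // 7
```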

Extending Marc's answer, local variable initialization is also related to the verification process.
The CLI requires that in any verifiable code (that is, modules that didn't explicitly ask to skip the verification process using the SkipVerification property of the SecurityPermission attribute), all local variables must be initialized before they are used. Failing to do so will result in a VerificationException being thrown.
More interestingly, the compiler automatically adds the .locals init flag to every method that uses local variables. This flag causes the JIT compiler to generate code that initializes all of the local variables to their default values. This means that even if you have already initialized them in your own code, the JIT will comply with the .locals init flag and generate the initialization code anyway. This "duplicate initialization" doesn't affect performance, since in configurations that allow optimizations the JIT compiler will detect the duplication and effectively treat it as dead code (the auto-generated initialization routine will not appear in the generated assembly instructions).
According to Microsoft (also backed up by Eric Lippert in response to a question on his blog), on most occasions when programmers don't initialize a local variable, it is not because they rely on the underlying environment to initialize it to its default value, but simply because they "forgot", which sometimes causes elusive logic bugs.
So, to reduce the probability of bugs of this nature appearing in C# code, the compiler still insists that you initialize your local variables, even though it is going to add the .locals init flag to the generated CIL code.
A more comprehensive explanation of this subject can be found in the article "Behind The .locals init Flag".

It actually shouldn't. The error should be reported on the line that reads the variable, not on its declaration, and it occurs because you used the variable before you assigned it.
The compiler is helping you here.
So don't initialize variables out of habit. Instead, let the compiler help you out!
The nice thing about this is that it checks every code path for you. If you have a switch statement with three cases that each set the value, but you forget to set it in your "default" case and then use the variable afterwards, the compiler reports an error telling you that you missed a path.
If you initialize every variable to 0 up front, you take that benefit away.
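The path checking described above can be seen with a switch statement. In this sketch (C# top-level statements; Describe and its labels are hypothetical), label is deliberately declared without an initializer. Because every branch, including default, assigns it, the compiler accepts the final read; deleting the default branch should turn "return label;" into error CS0165.

```csharp
using System;

static string Describe(int code)
{
    string label; // deliberately not pre-initialized

    switch (code)
    {
        case 1: label = "one"; break;
        case 2: label = "two"; break;
        case 3: label = "three"; break;
        // Remove this branch and 'return label;' fails with CS0165,
        // because the compiler then finds a path that never assigns 'label'.
        default: label = "other"; break;
    }

    return label;
}

Console.WriteLine(Describe(2)); // two
Console.WriteLine(Describe(9)); // other
```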

As Marc indicates, that's what the specification says. The reason this is a good thing is that there are some valid reasons to leave a member (a field) uninitialized, whereas a local variable's lifetime is bounded by the method it is in. Mostly you'd only ever want an uninitialized member for performance reasons: if it is expensive to initialize and should only be initialized in specific usage scenarios. For my part, though, I'd avoid uninitialized members until my back was truly against the wall!
For local variables, it is also much easier to detect whether all code paths are likely to lead to initialization, whereas there are no good heuristics to determine whether all code paths across the entire programme guarantee initialization before use. A completely correct answer is impossible in both cases (deciding it in general is as hard as the halting problem), as all computer science students should know.
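As a small illustration of how conservative the local analysis is, here is a hypothetical Sign helper. Two complementary if statements would assign the local on every real execution, yet definite assignment does not evaluate the conditions, so that version is rejected (it is kept as comments). Restructuring the same logic as if/else gives the flow analysis a shape it can prove.

```csharp
using System;

static int Sign(int x)
{
    int result;

    // Rejected version: the two conditions cover every int, but definite
    // assignment does not reason about their values, so 'return result;'
    // would be error CS0165.
    // if (x >= 0) result = 1;
    // if (x < 0) result = -1;

    // Accepted version: if/else makes "assigned on every path" provable.
    if (x >= 0) result = 1;
    else result = -1;

    return result;
}

Console.WriteLine(Sign(5));  // 1
Console.WriteLine(Sign(-3)); // -1
```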

Related

What is the value for local variables before the assignment?

I know that the default value for reference types is null and that the default values for value types follow this table: http://msdn.microsoft.com/en-us/library/83fhsxwc.aspx.
I also know that in C#, instance fields are automatically initialized and local variables are not. I'm also aware that the compiler will force you to assign a local variable before you read it.
I'm curious what the value of a local variable is before it's assigned. Is it set to the default value, even though the compiler wants you to explicitly assign a value, or is it just random bits?
It actually depends on an IL flag. The MS C# compiler currently always sets this flag, so the memory is actually set to zero. However, technically there is no reason for it to do so. Either way, it is an implementation detail: you cannot find the answer to this using just C#, since C# will not allow you to query the value (directly or indirectly) of a local that is not "definitely assigned" (but you can if you use ILGenerator or similar to create a method directly in IL).
The flag specifically is the init in .locals init (...)
Edit: clarification - the CLI specification requires that all verifiable methods have .locals init, not just .locals: so without this, the code would not be verifiable, even if it was correct.
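You cannot read an unassigned local from C#, but reflection does report whether the flag was emitted, through MethodBody.InitLocals. The sketch below uses a made-up Sum method that needs IL locals; under default compiler settings it is expected to report True. (C# 9 added a SkipLocalsInit attribute that suppresses the flag; it is not used here.)

```csharp
using System;

// A made-up method that needs IL locals (the accumulator and foreach temporaries).
static int Sum(int[] values)
{
    int total = 0;
    foreach (int v in values)
        total += v;
    return total;
}

// A delegate exposes the compiled method behind the local function.
Func<int[], int> sum = Sum;
var body = sum.Method.GetMethodBody();

// InitLocals corresponds to the "init" in ".locals init (...)".
bool initLocals = body != null && body.InitLocals;

Console.WriteLine(sum(new[] { 1, 2, 3 })); // 6
Console.WriteLine(initLocals);              // True with default compiler settings
```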

About unassigned variables

Just curious, I'm not trying to solve any problem.
Why must only local variables be assigned before use?
In the following example:
class Program
{
    static int a;
    static int b { get; set; }
    static void Main(string[] args)
    {
        int c;
        System.Console.WriteLine(a);
        System.Console.WriteLine(b);
        System.Console.WriteLine(c);
    }
}
Why do a and b give me just a warning while c gives me an error?
Additionally, why can't I just rely on the value type's default value and write the following code?
bool MyCondition = true;
int c;
if (MyCondition)
    c = 10;
System.Console.WriteLine(c); // error CS0165: Use of unassigned local variable 'c'
Does it have anything to do with memory management?
Tim gives a good answer to your first question but I can add a few more details.
Your first question is basically "locals are required to be definitely assigned, so why not make the same restriction on fields?" Tim points out that Jon points out that it is actually quite difficult to do so. With locals it is almost always crystal clear when a local is first read and when it is first written. In the cases where it is not clear, the compiler can make reasonable conservative guesses. But with fields, to know when a first read and a first write happens, you have to know all kinds of things about which methods are called in what order.
In short, analyzing locals requires local analysis; the analysis doesn't have to go past the block that contains the declaration. Field analysis requires global analysis. That's a lot of work; it's easier to just say that fields are initialized to their default values.
(Now, that said, it is of course possible to do this global analysis; my new job will likely involve doing precisely this sort of analysis.)
Your second question is basically "Well then, if automatic assignment of default values is good enough for fields then why isn't it good enough for locals?" and the answer is "because failing to assign a local variable and accidentally getting the default value is a common source of bugs." C# was carefully designed to discourage programming practices that lead to irritating bugs, and this is one of them.
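The second snippet in the question is rejected once c is read after the if, because MyCondition is an ordinary variable and definite assignment does not track its value. A hedged sketch of the one case where the value does matter: if the condition is a compile-time constant, the specification treats the impossible branch as unreachable, so the same shape compiles.

```csharp
using System;

// Compiles: a constant 'true' condition means the "false" path cannot happen,
// so 'c' is definitely assigned after the if.
const bool MyCondition = true;
int c;
if (MyCondition)
    c = 10;
Console.WriteLine(c); // 10

// Does not compile if uncommented: 'flag' is a variable, so the compiler
// assumes the if body may be skipped and reports CS0165 on the read of 'd'.
// bool flag = true;
// int d;
// if (flag) d = 10;
// Console.WriteLine(d);
```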
Because other variables (fields) are initialized with their default values.
Jon Skeet has already written some interesting words on this issue:
For local variables, the compiler has a good idea of the flow - it can see a "read" of the variable and a "write" of the variable, and prove (in most cases) that the first write will happen before the first read.
This isn't the case with instance variables. Consider a simple property - how do you know if someone will set it before they get it? That makes it basically infeasible to enforce sensible rules - so either you'd have to ensure that all fields were set in the constructor, or allow them to have default values. The C# team chose the latter strategy.
and here's the related C# language specification:
5.3 Definite assignment
At a given location in the executable code of a function member, a variable is said to be definitely assigned if the compiler can prove, by a particular static flow analysis (§5.3.3), that the variable has been automatically initialized or has been the target of at least one assignment.
5.3.1 Initially assigned variables
The following categories of variables are classified as initially assigned:
Static variables.
Instance variables of class instances.
Instance variables of initially assigned struct variables.
Array elements.
Value parameters.
Reference parameters.
Variables declared in a catch clause or a foreach statement.
5.3.2 Initially unassigned variables
The following categories of variables are classified as initially unassigned:
Instance variables of initially unassigned struct variables.
Output parameters, including the this variable of struct instance constructors.
Local variables, except those declared in a catch clause or a foreach statement.
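A compact tour of several categories from the lists above, as a sketch with a hypothetical TryHalve helper: value parameters, foreach iteration variables and array elements can be read immediately, while an output parameter starts unassigned and must be written on every path before the method returns (otherwise the compiler reports CS0177).

```csharp
using System;

// Hypothetical helper: 'value' is a value parameter (initially assigned),
// 'half' is an output parameter (initially unassigned).
static bool TryHalve(int value, out int half)
{
    if (value % 2 != 0)
    {
        half = 0; // every return path must assign 'half', otherwise CS0177
        return false;
    }
    half = value / 2;
    return true;
}

// foreach iteration variables are initially assigned.
int total = 0;
foreach (int n in new[] { 1, 2, 3 })
    total += n;

// Array elements are initially assigned (to the element type's default).
int[] slots = new int[2];

// 'h' is an ordinary local passed as out: it is definitely assigned after the call.
bool ok = TryHalve(10, out int h);
Console.WriteLine($"{ok} {h} {total} {slots[1]}"); // True 5 6 0
```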
The CLR provides a hard guarantee that local variables are initialized to their default values. But this guarantee does have limitations: it cannot recognize scope blocks inside the method body. They disappear once the compiler translates the code to IL; scope is a language construct that has no parallel in the CLI and cannot be expressed in IL.
You can see this going wrong in a language like VB.NET for example. This contrived example shows the behavior:
Module Module1
    Sub Main()
        For ix = 1 To 3
            Dim s As String
            If ix = 2 Then s = "foo"
            If s Is Nothing Then Console.WriteLine("null") Else Console.WriteLine(s)
        Next
        Console.ReadLine()
    End Sub
End Module
Output:
null
foo
foo
In other words, the local variable s was initialized only once and retains its value across loop iterations. This has a knack for creating bugs, of course. The VB.NET compiler does generate a warning for it and has simple syntax to avoid it (As New). A managed language like C++/CLI has the same behavior but doesn't emit a diagnostic at all. C#, however, gives a stronger guarantee: it generates an error.
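For comparison, here is a C# rendering of the VB.NET loop above, as a sketch. Declaring "string s;" and reading it on an iteration where it was not assigned is error CS0165 in C#, so the declaration needs an explicit initializer, and the "foo" from the second iteration no longer leaks into the third.

```csharp
using System;
using System.Collections.Generic;

var lines = new List<string>();
for (int ix = 1; ix <= 3; ix++)
{
    // Without "= null", reading 's' below is error CS0165 in C#.
    // With it, 's' is reset on every iteration instead of keeping "foo".
    string s = null;
    if (ix == 2) s = "foo";
    lines.Add(s ?? "null");
}

Console.WriteLine(string.Join(Environment.NewLine, lines));
// null
// foo
// null
```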
This rule is called "definite assignment". The exact rules are explained in detail in the C# Language Specification, section 5.3.3.
Definite assignment checking has its limitations. It works well for local variables, since their scope is very limited (private to the method body) and you can't get to them with Reflection. It is much harder to do for the fields of a class: that requires whole-program analysis that may need to reach beyond what the compiler can see, like code in another assembly. That is why the C# compiler can only warn about it and can't reject the code outright.

Can a compiler assign a value to a variable even before the variable is actually initialized?

I've just read http://www.javaworld.com/javaworld/jw-04-2003/jw-0425-designpatterns.html?page=5 and it says:
the compiler is free to assign a value to the singleton member variable before the singleton's constructor is called
I'm wondering if it is a typo. Did they actually mean to say "the implementation of the JVM is free to" instead of "the compiler is free to"?
My second question: do we have this issue with C#/VB as well, where the "compiler" is free to assign a value to a variable even before the object is fully initialized, i.e. before the constructor of the variable's class has fully run?
In Java, allocating the memory for an object and calling the constructor are two separate operations. For example, something like
Object o = new Object();
compiles into these bytecodes:
0: new #2; //class java/lang/Object
3: dup
4: invokespecial #1; //Method java/lang/Object."<init>":()V
7: astore_1
After instruction 0, a reference to an allocated but unconstructed object is on the stack. The constructor isn't called until offset 4. There is definitely nothing keeping the compiler from assigning that reference to any variable it wants to, including a static member. The article is therefore correct.
I don't know CLR bytecode, but I imagine it cleaves rather closely to the JVM's instruction set, and I'd guess the same sort of thread-related caveat would exist for that runtime as well. It certainly holds for native code compilers.
The answer to the first part of the question is that you are correct, though this is more a case of sloppy terminology than a typo or a mistake. (Plainly, the compiler doesn't assign values to variables ... this only happens when the code generated by the compiler is executed.)
A more technically accurate restatement would be:
"... because the compiler is free to generate code that may cause a value to be written to memory for the singleton member variable before the singleton's constructor has been called, and the constructed object has been flushed to memory."
This kind of thing is most likely to happen at the native code compiler level, when the compiler (legally) reorders instructions, or simply as a result of memory pipelining. The Java memory model specifically allows this so that the compiler is able to generate code that runs faster on multi-core machines. (The flip side is that your multi-threaded code has to synchronize in the required way, or else it is liable to be unreliable.)
(It is also theoretically possible for the bytecode compiler to reorder bytecodes, but the chances are that it won't. There is little value in the bytecode compiler doing fine-grained optimization. Indeed, it can be harmful since it potentially makes it harder for the JIT compiler's optimizer.)
I'll leave the C# and VB cases to someone who is familiar with the specifications of those languages.
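On the C# side, one common way to avoid hand-rolled lazy initialization (and the publication hazard described above) is Lazy<T>. With LazyThreadSafetyMode.ExecutionAndPublication, the factory runs once and other threads only receive the finished object. This is a sketch: an int array stands in for the singleton, and the counter exists only to show how many times the factory ran.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int constructions = 0;

// Stand-in for an expensive singleton; the factory counts how often it runs.
var instance = new Lazy<int[]>(() =>
{
    Interlocked.Increment(ref constructions);
    return new[] { 1, 2, 3 };
}, LazyThreadSafetyMode.ExecutionAndPublication);

// Several threads race to read it; each sees a fully constructed array.
int[] lengths = new int[8];
Parallel.For(0, lengths.Length, i => lengths[i] = instance.Value.Length);

Console.WriteLine(constructions);             // 1
Console.WriteLine(string.Join(",", lengths)); // 3,3,3,3,3,3,3,3
```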

Why are static fields null by default?

Just curious: from the code below I can see that a static field of type A will be null by default, while a local variable of that type needs to be initialized to have at least a null value. Could anyone explain the difference a bit more? Thanks
class Program
{
    static A _a; // it is null by default
    static void Main(string[] args)
    {
        A nonStaticA; // unassigned: reading it is a compile-time error (CS0165), not an exception
        A correctA = null;
    }
}
class A
{
}
The initial value of a field, whether it be a static field or an instance field, is the default value of the field's type. It is not possible to observe the value of a field before this default initialization has occurred, and a field is thus never "uninitialized".
If a static constructor exists in the class, execution of the static field initializers occurs immediately prior to executing that static constructor. Otherwise, the static field initializers are executed at an implementation-dependent time prior to the first use of a static field of that class.
A local variable is not automatically initialized and thus has no default value. For the purpose of definite assignment checking, a local variable is considered initially unassigned.
The difference is between local variables and fields, not between static and instance.
Local variables of any type are required to be initialized to a value before they are used the first time. This prevents bugs where you forget to initialize a variable.
The compiler can verify this, because local variables only exist inside a single method, and code in a method is executed in a predictable order, from the top down. So it is easy to check if a variable is accessed before it is initialized.
Fields are different. They can be accessed in multiple methods, and there is no way the compiler can determine in which order they are executed. Therefore it cannot check at compile-time that fields are initialized before they are accessed. Instead fields are given a default value, which is null for all reference types, 0 for integers and so on.
It has nothing to do with static. Class fields (instance and static) are initialized to their defaults, local variables are not.
And why? Like lots of things it was a design decision at some point.
C#, as far as I can observe, kept many things as they were in earlier languages, mainly C++.
The reason in C++ (which may or may not be relevant to C#) is that static (or global) objects are statically written into the executable or library, while for other objects the code that creates the object (and not the object itself) is written into the executable or library. For objects on the stack, usually code that subtracts some value from the stack pointer is written.
When the executable or library is loaded into memory by the OS, the static fields are just a bunch of bytes that are copied as is into memory (the data segment of the process). Since they are copied as is, they already have a value (the value in the executable or library file). Because of that, there are no performance implications to setting it to a specific value. For that reason (as far as I can see), the C++ standard made their value deterministic (if they are not initialized explicitly), and what is more natural than zero as an initialization value?!
In order to initialize a dynamic object (whether it is on the stack or the heap) code has to be inserted into the executable or library. This has performance implications (and maybe other implications) so the C++ standard preferred to leave it up to the programmer.
I'm not completely sure every bit of this data is true, but it is what seems logical to me from what I know.
