Say I have something like this:
public IOrder SomeMethodOnAnOrderClass()
{
IOrder myOrder = null;
if (SomeOtherOrder != null)
{
myOrder = SomeOtherOrder.MethodThatCreatesACopy();
}
return myOrder;
}
Why did the makers of C# require the explicit set of myOrder to null?
Is there ever a case where you would want to leave it unassigned?
Does the setting to null have a cost associated with it? Such that you would not want to always have unassigned variables set to null? (Even if they are later set to something else.)
Or is it required to make sure you have "dotted all your i's and crossed all your t's"?
Or is there some other reason?
They do default to null or, more accurately, your objects default to the value returned by default(T), which is different for value types.
This is a feature. There are all sorts of bugs in the wild caused by programmers using uninitialized variables. Not all languages give you such well defined behavior for this sort of thing (you know who you are...).
Apparently you haven't experienced that yet. Be happy and accept that the compiler is helping you to write better code.
In Why are local variables definitely assigned in unreachable statements? (thanks, MiMo for the link) Eric Lippert says:
The reason why we want to make this illegal is not, as many people
believe, because the local variable is going to be initialized to
garbage and we want to protect you from garbage. We do in fact
automatically initialize locals to their default values. (Though the C
and C++ programming languages do not, and will cheerfully allow you to
read garbage from an uninitialized local.) Rather, it is because the
existence of such a code path is probably a bug, and we want to throw
you in the pit of quality; you should have to work hard to write that
bug.
As far as I understand this, if a local variable is not assigned a value, it does not mean, that the developer indeed wanted to get the default(T) while reading from it. It means (in the majority of cases) that the developer probably missed it and forgot to initialize it. That is rather a bug, then a situation when a developer consciously wants to init a local variable to default(T) with just declaring it.
Related
It seems like there is no way to have unassigned local variables in your code or check for them, as the compiler spits out Use of unassigned local variable error.
Why is the compiler not using default(T) on these variables at compile time?
Even if it's harder to do for value types, references types could easily be initialized to null in this case, right?
Here is some test code:
public void Test ( )
{
int x;
string s;
if ( x == 5 )
Console.WriteLine ( 5 );
if ( s != null )
Console.WriteLine ( "s" );
}
which returns:
Use of unassigned local variable 'x'
Use of unassigned local variable 's'
UPDATE:
For people who claims this is not allowed for a good reason, why is it allowed on a class level?
public class C
{
public int X;
public string S;
public void Print ( )
{
Console.WriteLine ( X );
Console.WriteLine ( S );
}
}
This code compiles perfectly fine.
Why is it fine to have on a class level but not on a method level?
I see you've updated your question, so I'll update my answer. Your question has two parts, one relating to local variables and the other to instance variables on a class instance. First, however, this isn't really a compiler design decision, but rather a language design decision.
Spec Section 12.3.1/12.3.2
Local Variables
We know why you can define a variable without giving it a value. One reason, an example of something like this:
int x;
// do stuff
x = 5; // Wow, I can initialize it later!
Console.WriteLine(x);
The standard defines why this is valid code. Now, I'm not on the C# design team, but it makes good sense why they wouldn't automatically initialize the code for you (besides the performance hit when you actually didn't want it to be automatically initialized).
Say the code above was your intention, but you forgot to initialize x = 5;. If the compiler had automatically initialized the variable for you, the code would compile, but it would do nothing like you would expect.
Granted this is a trivial example, but this was a very good design decision from the language designers, as it would save many headaches trying to figure out why something wasn't working as expected.
As a side note, I can't think of a reason why you would want to define the code without assigned something to it, or use the default value (in every case), to me that would likely be a bug, which I'm sure is what the compiler designers may have determined.
Class Instance Variables
Class level members are defined by the standard to be initially assigned. In fact, to be fair, local variables outside those declared in a catch, foreach or using statement are initially unassigned. So really, this is a standards issue, not a compiler issue.
If I were to try and guess why this is the case with regards to instance variables of class instances, I would say it has to do with how the memory is allocated on the heap, since that is where the classes are allocated. When a class is allocated on the heap, all of its members have to be initialized and allocated on the heap with it. It's not just ok to do it in a class member than a local variable, it has to be done this way. They simply cannot be left unassigned.
C# is a "pit of success" language.
This is a design decision, as the language is completely capable of allowing you to use locals that have not been explicitly assigned. However, it is normally true that usage of such variables is erroneous, that a code path has not set a value for some reason. To avoid such errors, the compiler requires that all locals be assigned before being used.
1 Why does the compiler not allow the use of uninitialized variables?
Because preventing that promotes good programming.
2 Why does the compiler allow the use of uninitialized class members?
Because it's not possible to track this with any accuracy.
By taking your suggestion of initializing reference types to null, instead of the current behavior (buggy code causes a compile time error), you'll instead get a runtime error when you dereference an uninitialized variable. Is that really what you want?
Consider the following code:
void blah(IDictionary<int,int> dict)
{
for (int i=0; i<10; i++)
{
if ((i & 11) != 0)
{
int j;
dict.TryGetValue(i, out j);
System.Diagnostics.Debug.Print("{0}",j);
j++;
}
}
}
Suppose that the TryGetValue method of the passed-in implementation of IDictionary<int,int> never actually writes to j [not possible if it's written in C#, but possible if it's written in another language]. What should one expect the code to print?
Nothing in the C# standard requires that j be maintained when code leaves the if statement, but nothing requires that it be reset to zero between loop iterations. Mandating either course of action would impose additional costs in some cases. Rather than doing either, the Standard simply allows that when TryGetValue is called, j may arbitrarily hold zero or the last value it held when it was in scope. Such an approach avoids unnecessary costs, but would be awkward if code were allowed to see the value of j between the time it re-enters scope and the time it is written (the fact that passing an uninitialized variable as an out parameter to code written in another language will expose its value was likely unintentional).
Because what do you want in there? you want x to be zero by default and I want it to be 5...
if they assign 0 to int(s) and all the world starts assuming so, then they will change to -1 at some point and this will break so many applications around the globe.
in VB6 variables were assigned to something by default I think, and it was not as good as it seemed to be.
when you use C#, or C++, you assign the value with what you want, not the compiler for you.
Why must I initialize variables inside methods?
int test1; // Not initialized, but OK
public int Foo()
{
int test2; // Not initialized
int test3 = test1; // OK
int test4 = test2; // An error
}
Fields are automatically initialized to the logical zero for the type; this is implicit. Variables must obey "definite assignment", so must be assigned before they can be read.
ECMA 334v4
§17.4.4 Field initialization
The initial value of a field, whether
it be a static field or an instance
field, is the default value (§12.2) of
the field’s type. It is not possible
to observe the value of a field before
this default initialization has
occurred, and a field is thus never
"uninitialized".
and
§12. Variables
...
A variable shall be definitely assigned (§12.3) before its
value can be obtained.
...
Extending Mark's answer, local variable initialization is also related to the verification process.
The CLI requires that in any verifiable code (that is, modules that didn't explicitly asked to skip the verification process using the SkipVerfication property from the SecurityPermission attribute), all local variables must be initialized before their usage. Failing to do so will result in a VerficationException being thrown.
More interestingly, is that the compiler automatically adds the .locals init flag on every method that uses local variables. This flag causes the JIT compiler to generate code that initializes all of the local variables to their default values. Meaning, even though you have already initialized them in your own code, the JIT will comply with the .locals init flag and generate the proper initialization code. This "duplicate initialization" doesn't effect performance since in configurations that allow optimizations, the JIT compiler will detect the duplication and effectively treat it as "dead code" (the auto-generated initialization routine will not appear in the generated assembler instructions).
According to Microsoft (also, backed up by Eric Lippert in response to a question on his blog), on most occasions, when programmers don't initialize their local variable, they don't do so because they rely on the underlying environment to initialize their variable to their default values, but only because they "forgot", thus, causing sometimes-illusive logical bugs.
So in order to reduce the probability for bugs of this nature to appear in C# code, the compiler still insists you will initialize your local variables. Even though it's going to add the .locals init flag to the generated CIL code.
A more comprehensive explanation on this subject can be found here: Behind The .locals init Flag
It actually shouldn't. Your error should be on the second line, not the first, and should be because you used it before you initialized it.
The compiler is helping you here.
So don't initialize them as a habit. Instead let the compiler help you out!
The nice thing about this is that it will path check for you. If you have a switch statement with three cases where each sets the value, but you forget to set it in your "default", but use it afterwards, it will warn you that you missed a path.
If you initialize variables to = 0, you take that benefit away.
As Marc indicates, that's what the specification says. The reason this is a good thing is that there are some valid reasons to leave a member uninitialized rather than a local variable, whose lifetime is bounded by the method it is in. Mostly you'd only ever want this for performance reasons, if the variable is expensive to initialize, and should only be initialized under specific usage scenarios. For my part, I'd avoid uninitialized members until my back was truly against the wall, though!
For local variables, it is also much easier to detect whether all code paths are likely to lead to initialization, whereas there are no good heuristics to determine whether all code paths across the entire programme guarantee initialization before use. A completely correct answer is impossible in both cases, as all computer science students should know.
Just curious, I'm not trying to solve any problem.
Why only local variables should be assigned?
In the following example:
class Program
{
static int a;
static int b { get; set; }
static void Main(string[] args)
{
int c;
System.Console.WriteLine(a);
System.Console.WriteLine(b);
System.Console.WriteLine(c);
}
}
Why a and b gives me just a warning and c gives me an error?
Addionally, why I can't just use the default value of Value Type and write the following code?
bool MyCondition = true;
int c;
if (MyCondition)
c = 10;
Does it have anything to do with memory management?
Tim gives a good answer to your first question but I can add a few more details.
Your first question is basically "locals are required to be definitely assigned, so why not make the same restriction on fields?" Tim points out that Jon points out that it is actually quite difficult to do so. With locals it is almost always crystal clear when a local is first read and when it is first written. In the cases where it is not clear, the compiler can make reasonable conservative guesses. But with fields, to know when a first read and a first write happens, you have to know all kinds of things about which methods are called in what order.
In short, analyzing locals requires local analysis; the analysis doesn't have to go past the block that contains the declaration. Field analysis requires global analysis. That's a lot of work; it's easier to just say that fields are initialized to their default values.
(Now, that said, it is of course possible to do this global analysis; my new job will likely involve doing precisely this sort of analysis.)
Your second question is basically "Well then, if automatic assignment of default values is good enough for fields then why isn't it good enough for locals?" and the answer is "because failing to assign a local variable and accidentally getting the default value is a common source of bugs." C# was carefully designed to discourage programming practices that lead to irritating bugs, and this is one of them.
Because other variables are initialized with their default value.
Jon Skeet has already found some interesting words on this issue:
For local variables, the compiler has a good idea of the flow - it can
see a "read" of the variable and a "write" of the variable, and prove
(in most cases) that the first write will happen before the first
read.
This isn't the case with instance variables. Consider a simple
property - how do you know if someone will set it before they get it?
That makes it basically infeasible to enforce sensible rules - so
either you'd have to ensure that all fields were set in the
constructor, or allow them to have default values. The C# team chose
the latter strategy.
and here's the related C# language specification:
5.3 Definite assignment
At a given location in the executable code of a function member, a
variable is said to be definitely assigned if the compiler can prove,
by a particular static flow analysis (§5.3.3), that the variable has
been automatically initialized or has been the target of at least one
assignment.
5.3.1 Initially assigned variables
The following categories of variables are classified as initially
assigned:
Static variables.
Instance variables of class instances.
Instance variables of initially assigned struct variables.
Array elements.
Value parameters.
Reference parameters.
Variables declared in a catch clause or a foreach statement.
5.3.2 Initially unassigned variables
The following categories of variables are classified as initially
unassigned:
Instance variables of initially unassigned struct variables.
Output parameters, including the this variable of struct instance
constructors.
Local variables, except those declared in a catch clause or a
foreach statement.
The CLR provides a hard guarantee that local variables are initialized to their default value. But this guarantee does have limitations. What is missing is its ability to recognize scope blocks inside the method body. They disappear once the compiler translates the code to IL. Scope is a language construct that doesn't have a parallel in the CLI and cannot be expressed in IL.
You can see this going wrong in a language like VB.NET for example. This contrived example shows the behavior:
Module Module1
Sub Main()
For ix = 1 To 3
Dim s As String
If ix = 2 Then s = "foo"
If s Is Nothing Then Console.WriteLine("null") Else Console.WriteLine(s)
Next
Console.ReadLine()
End Sub
End Module
Output:
null
foo
foo
Or in other words, the local variable s was initialized only once and retains its value afterwards. This has a knack for creating bugs of course. The VB.NET compiler does generate a warning for it and has simple syntax to avoid it (As New). A managed language like C++/CLI has the same behavior but doesn't emit a diagnostic at all. But the C# language gives a stronger guarantee, it generates an error.
This rule is called "definite assignment". The exact rules are explained in detail in the C# Language Specification, chapter 5.3.3
Definite assignment checking has its limitations. It works well for local variables since their scope is very limited (private to the method body) and you can't get to them with Reflection. Much harder to do with fields of a class, it requires whole-program analysis that may need to reach beyond what the compiler can see. Like code in another assembly. Which is why the C# compiler can only warn about it but can't reject the code out-right.
It seems like there is no way to have unassigned local variables in your code or check for them, as the compiler spits out Use of unassigned local variable error.
Why is the compiler not using default(T) on these variables at compile time?
Even if it's harder to do for value types, references types could easily be initialized to null in this case, right?
Here is some test code:
public void Test ( )
{
int x;
string s;
if ( x == 5 )
Console.WriteLine ( 5 );
if ( s != null )
Console.WriteLine ( "s" );
}
which returns:
Use of unassigned local variable 'x'
Use of unassigned local variable 's'
UPDATE:
For people who claims this is not allowed for a good reason, why is it allowed on a class level?
public class C
{
public int X;
public string S;
public void Print ( )
{
Console.WriteLine ( X );
Console.WriteLine ( S );
}
}
This code compiles perfectly fine.
Why is it fine to have on a class level but not on a method level?
I see you've updated your question, so I'll update my answer. Your question has two parts, one relating to local variables and the other to instance variables on a class instance. First, however, this isn't really a compiler design decision, but rather a language design decision.
Spec Section 12.3.1/12.3.2
Local Variables
We know why you can define a variable without giving it a value. One reason, an example of something like this:
int x;
// do stuff
x = 5; // Wow, I can initialize it later!
Console.WriteLine(x);
The standard defines why this is valid code. Now, I'm not on the C# design team, but it makes good sense why they wouldn't automatically initialize the code for you (besides the performance hit when you actually didn't want it to be automatically initialized).
Say the code above was your intention, but you forgot to initialize x = 5;. If the compiler had automatically initialized the variable for you, the code would compile, but it would do nothing like you would expect.
Granted this is a trivial example, but this was a very good design decision from the language designers, as it would save many headaches trying to figure out why something wasn't working as expected.
As a side note, I can't think of a reason why you would want to define the code without assigned something to it, or use the default value (in every case), to me that would likely be a bug, which I'm sure is what the compiler designers may have determined.
Class Instance Variables
Class level members are defined by the standard to be initially assigned. In fact, to be fair, local variables outside those declared in a catch, foreach or using statement are initially unassigned. So really, this is a standards issue, not a compiler issue.
If I were to try and guess why this is the case with regards to instance variables of class instances, I would say it has to do with how the memory is allocated on the heap, since that is where the classes are allocated. When a class is allocated on the heap, all of its members have to be initialized and allocated on the heap with it. It's not just ok to do it in a class member than a local variable, it has to be done this way. They simply cannot be left unassigned.
C# is a "pit of success" language.
This is a design decision, as the language is completely capable of allowing you to use locals that have not been explicitly assigned. However, it is normally true that usage of such variables is erroneous, that a code path has not set a value for some reason. To avoid such errors, the compiler requires that all locals be assigned before being used.
1 Why does the compiler not allow the use of uninitialized variables?
Because preventing that promotes good programming.
2 Why does the compiler allow the use of uninitialized class members?
Because it's not possible to track this with any accuracy.
By taking your suggestion of initializing reference types to null, instead of the current behavior (buggy code causes a compile time error), you'll instead get a runtime error when you dereference an uninitialized variable. Is that really what you want?
Consider the following code:
void blah(IDictionary<int,int> dict)
{
for (int i=0; i<10; i++)
{
if ((i & 11) != 0)
{
int j;
dict.TryGetValue(i, out j);
System.Diagnostics.Debug.Print("{0}",j);
j++;
}
}
}
Suppose that the TryGetValue method of the passed-in implementation of IDictionary<int,int> never actually writes to j [not possible if it's written in C#, but possible if it's written in another language]. What should one expect the code to print?
Nothing in the C# standard requires that j be maintained when code leaves the if statement, but nothing requires that it be reset to zero between loop iterations. Mandating either course of action would impose additional costs in some cases. Rather than doing either, the Standard simply allows that when TryGetValue is called, j may arbitrarily hold zero or the last value it held when it was in scope. Such an approach avoids unnecessary costs, but would be awkward if code were allowed to see the value of j between the time it re-enters scope and the time it is written (the fact that passing an uninitialized variable as an out parameter to code written in another language will expose its value was likely unintentional).
Because what do you want in there? you want x to be zero by default and I want it to be 5...
if they assign 0 to int(s) and all the world starts assuming so, then they will change to -1 at some point and this will break so many applications around the globe.
in VB6 variables were assigned to something by default I think, and it was not as good as it seemed to be.
when you use C#, or C++, you assign the value with what you want, not the compiler for you.
if variable is not assigned, then it takes the default value at run time.
for example
int A1;
if i will check the value of A1 at runtime it will be 0;
then why at compile time it throws a error of unassigned value;
why CLR don't use to a lot the default value at runtime;
int A1;
int B1 = A1+10;
it should be 11 as the default value of A1 is 0;
there project property where i can check for "assign default values for unassigned variable";
Can anybody tell me where i can find it?
Your statement
if variable is not assigned,then it takes the default value at run time
is only true for member variables in a class.
For local variables inside a function, this is wrong. Local variables inside a function always require initialization.
it should be 11 as the default value of A1 is 0;
This is exactly the reason that the C# compiler won't let you get away with using uninitialized variables. The result would be 10, not 11. After a good 30 years of experience with C and C++, languages that allow you to use uninitialized variables, the C# team decided that this was a major source of bugs and to not allow this in a C# program.
There are lots of little tweaks like this. Another great example is not allowing fall through to another case in a switch statement. Forgetting to write break is such a classic bug. Outlawing these C-isms is rather an excellent idea and a big part of why C# is such a great language. Unless you dislike the idea of a compiler as a police officer.
Fwiw: using an uninitialized variable is permitted in VB.NET.
Default value is true for class members, but not for function locals. Whatever code you put directly into an as[pc]x file, the code generator will put it into a function.
Most of the time this happens is because the variable is an Object and before you can use it you need to Instantiate it.
When you assign a String = "" this is instantiated for you
You are talking about local variables or class level variables ? The rules are different for both. Check our Jon Skeet's reply at this:
Initialization of instance fields vs. local variables
The heap (reference classes) and the constructor for structs zero's the data.
Simple value-types like int, but also references (=pointers) to objects, do not get a default value on the stack. You should always set it. If this was not mandatory, specially with object-pointers, this could be a major security breach, because you are pointing to unknown locations.
Any default value (like 0) would probably be wrong 50% of the time.