Is CS0165 C# compiler error guaranteed for unassigned local variables?

Is CS0165 C# compiler error guaranteed for unassigned local variables? - c#

In code like this:
int val;
if (something())
val = 10;
val++; // Error CS0165 Use of unassigned local variable
I get CS0165 error message when a local variable is used when there's a chance it was not surely initialized before.
In C++ world this scenario is undefined behavior class situation, which means that anything is allowed. So maybe there's a compiler error, maybe there's a compiler warning, maybe there's a runtime error, maybe code just uses whatever was in memory at that moment and good luck noticing it.
Is CS0165 guaranteed for such code in C#?
Is there a case where this specific piece of code does not yield an error message?

Is CS0165 guaranteed for such code in C#?
Yes, the rules of definite assignment are designed so that a local variable can never be read before it's definitely been written.
It's pretty conservative, too - for example:
bool condition = false;
int x;
if (condition)
{
x = 0;
}
if (!condition)
{
x = 1;
}
Console.WriteLine(x); // Error
Here even though we know that exactly one of those if statement bodies will be entered, the compiler doesn't - so x isn't definitely assigned at the end.
The rules of definite assignment are in the C# 5 specification in section 5.3.
Note that various classes of variables (e.g. static fields, and instance fields of classes) are deemed "initially assigned" - but they have well-specified default values, so there's still no undefined behaviour there.

Related

Use of unassigned local int variable [duplicate]

It seems like there is no way to have unassigned local variables in your code or check for them, as the compiler spits out Use of unassigned local variable error.
Why is the compiler not using default(T) on these variables at compile time?
Even if it's harder to do for value types, references types could easily be initialized to null in this case, right?
Here is some test code:
public void Test ( )
{
int x;
string s;
if ( x == 5 )
Console.WriteLine ( 5 );
if ( s != null )
Console.WriteLine ( "s" );
}
which returns:
Use of unassigned local variable 'x'
Use of unassigned local variable 's'
UPDATE:
For people who claims this is not allowed for a good reason, why is it allowed on a class level?
public class C
{
public int X;
public string S;
public void Print ( )
{
Console.WriteLine ( X );
Console.WriteLine ( S );
}
}
This code compiles perfectly fine.
Why is it fine to have on a class level but not on a method level?

I see you've updated your question, so I'll update my answer. Your question has two parts, one relating to local variables and the other to instance variables on a class instance. First, however, this isn't really a compiler design decision, but rather a language design decision.
Spec Section 12.3.1/12.3.2
Local Variables
We know why you can define a variable without giving it a value. One reason, an example of something like this:
int x;
// do stuff
x = 5; // Wow, I can initialize it later!
Console.WriteLine(x);
The standard defines why this is valid code. Now, I'm not on the C# design team, but it makes good sense why they wouldn't automatically initialize the code for you (besides the performance hit when you actually didn't want it to be automatically initialized).
Say the code above was your intention, but you forgot to initialize x = 5;. If the compiler had automatically initialized the variable for you, the code would compile, but it would do nothing like you would expect.
Granted this is a trivial example, but this was a very good design decision from the language designers, as it would save many headaches trying to figure out why something wasn't working as expected.
As a side note, I can't think of a reason why you would want to define the code without assigned something to it, or use the default value (in every case), to me that would likely be a bug, which I'm sure is what the compiler designers may have determined.
Class Instance Variables
Class level members are defined by the standard to be initially assigned. In fact, to be fair, local variables outside those declared in a catch, foreach or using statement are initially unassigned. So really, this is a standards issue, not a compiler issue.
If I were to try and guess why this is the case with regards to instance variables of class instances, I would say it has to do with how the memory is allocated on the heap, since that is where the classes are allocated. When a class is allocated on the heap, all of its members have to be initialized and allocated on the heap with it. It's not just ok to do it in a class member than a local variable, it has to be done this way. They simply cannot be left unassigned.

C# is a "pit of success" language.
This is a design decision, as the language is completely capable of allowing you to use locals that have not been explicitly assigned. However, it is normally true that usage of such variables is erroneous, that a code path has not set a value for some reason. To avoid such errors, the compiler requires that all locals be assigned before being used.

1 Why does the compiler not allow the use of uninitialized variables?
Because preventing that promotes good programming.
2 Why does the compiler allow the use of uninitialized class members?
Because it's not possible to track this with any accuracy.

By taking your suggestion of initializing reference types to null, instead of the current behavior (buggy code causes a compile time error), you'll instead get a runtime error when you dereference an uninitialized variable. Is that really what you want?

Consider the following code:
void blah(IDictionary<int,int> dict)
{
for (int i=0; i<10; i++)
{
if ((i & 11) != 0)
{
int j;
dict.TryGetValue(i, out j);
System.Diagnostics.Debug.Print("{0}",j);
j++;
}
}
}
Suppose that the TryGetValue method of the passed-in implementation of IDictionary<int,int> never actually writes to j [not possible if it's written in C#, but possible if it's written in another language]. What should one expect the code to print?
Nothing in the C# standard requires that j be maintained when code leaves the if statement, but nothing requires that it be reset to zero between loop iterations. Mandating either course of action would impose additional costs in some cases. Rather than doing either, the Standard simply allows that when TryGetValue is called, j may arbitrarily hold zero or the last value it held when it was in scope. Such an approach avoids unnecessary costs, but would be awkward if code were allowed to see the value of j between the time it re-enters scope and the time it is written (the fact that passing an uninitialized variable as an out parameter to code written in another language will expose its value was likely unintentional).

Because what do you want in there? you want x to be zero by default and I want it to be 5...
if they assign 0 to int(s) and all the world starts assuming so, then they will change to -1 at some point and this will break so many applications around the globe.
in VB6 variables were assigned to something by default I think, and it was not as good as it seemed to be.
when you use C#, or C++, you assign the value with what you want, not the compiler for you.

About unassigned variables

Just curious, I'm not trying to solve any problem.
Why only local variables should be assigned?
In the following example:
class Program
{
static int a;
static int b { get; set; }
static void Main(string[] args)
{
int c;
System.Console.WriteLine(a);
System.Console.WriteLine(b);
System.Console.WriteLine(c);
}
}
Why a and b gives me just a warning and c gives me an error?
Addionally, why I can't just use the default value of Value Type and write the following code?
bool MyCondition = true;
int c;
if (MyCondition)
c = 10;
Does it have anything to do with memory management?

Tim gives a good answer to your first question but I can add a few more details.
Your first question is basically "locals are required to be definitely assigned, so why not make the same restriction on fields?" Tim points out that Jon points out that it is actually quite difficult to do so. With locals it is almost always crystal clear when a local is first read and when it is first written. In the cases where it is not clear, the compiler can make reasonable conservative guesses. But with fields, to know when a first read and a first write happens, you have to know all kinds of things about which methods are called in what order.
In short, analyzing locals requires local analysis; the analysis doesn't have to go past the block that contains the declaration. Field analysis requires global analysis. That's a lot of work; it's easier to just say that fields are initialized to their default values.
(Now, that said, it is of course possible to do this global analysis; my new job will likely involve doing precisely this sort of analysis.)
Your second question is basically "Well then, if automatic assignment of default values is good enough for fields then why isn't it good enough for locals?" and the answer is "because failing to assign a local variable and accidentally getting the default value is a common source of bugs." C# was carefully designed to discourage programming practices that lead to irritating bugs, and this is one of them.

Because other variables are initialized with their default value.
Jon Skeet has already found some interesting words on this issue:
For local variables, the compiler has a good idea of the flow - it can
see a "read" of the variable and a "write" of the variable, and prove
(in most cases) that the first write will happen before the first
read.
This isn't the case with instance variables. Consider a simple
property - how do you know if someone will set it before they get it?
That makes it basically infeasible to enforce sensible rules - so
either you'd have to ensure that all fields were set in the
constructor, or allow them to have default values. The C# team chose
the latter strategy.
and here's the related C# language specification:
5.3 Definite assignment
At a given location in the executable code of a function member, a
variable is said to be definitely assigned if the compiler can prove,
by a particular static flow analysis (§5.3.3), that the variable has
been automatically initialized or has been the target of at least one
assignment.
5.3.1 Initially assigned variables
The following categories of variables are classified as initially
assigned:
Static variables.
Instance variables of class instances.
Instance variables of initially assigned struct variables.
Array elements.
Value parameters.
Reference parameters.
Variables declared in a catch clause or a foreach statement.
5.3.2 Initially unassigned variables
The following categories of variables are classified as initially
unassigned:
Instance variables of initially unassigned struct variables.
Output parameters, including the this variable of struct instance
constructors.
Local variables, except those declared in a catch clause or a
foreach statement.

The CLR provides a hard guarantee that local variables are initialized to their default value. But this guarantee does have limitations. What is missing is its ability to recognize scope blocks inside the method body. They disappear once the compiler translates the code to IL. Scope is a language construct that doesn't have a parallel in the CLI and cannot be expressed in IL.
You can see this going wrong in a language like VB.NET for example. This contrived example shows the behavior:
Module Module1
Sub Main()
For ix = 1 To 3
Dim s As String
If ix = 2 Then s = "foo"
If s Is Nothing Then Console.WriteLine("null") Else Console.WriteLine(s)
Next
Console.ReadLine()
End Sub
End Module
Output:
null
foo
foo
Or in other words, the local variable s was initialized only once and retains its value afterwards. This has a knack for creating bugs of course. The VB.NET compiler does generate a warning for it and has simple syntax to avoid it (As New). A managed language like C++/CLI has the same behavior but doesn't emit a diagnostic at all. But the C# language gives a stronger guarantee, it generates an error.
This rule is called "definite assignment". The exact rules are explained in detail in the C# Language Specification, chapter 5.3.3
Definite assignment checking has its limitations. It works well for local variables since their scope is very limited (private to the method body) and you can't get to them with Reflection. Much harder to do with fields of a class, it requires whole-program analysis that may need to reach beyond what the compiler can see. Like code in another assembly. Which is why the C# compiler can only warn about it but can't reject the code out-right.

Why aren't unassigned local variables automatically initialized?

It seems like there is no way to have unassigned local variables in your code or check for them, as the compiler spits out Use of unassigned local variable error.
Why is the compiler not using default(T) on these variables at compile time?
Even if it's harder to do for value types, references types could easily be initialized to null in this case, right?
Here is some test code:
public void Test ( )
{
int x;
string s;
if ( x == 5 )
Console.WriteLine ( 5 );
if ( s != null )
Console.WriteLine ( "s" );
}
which returns:
Use of unassigned local variable 'x'
Use of unassigned local variable 's'
UPDATE:
For people who claims this is not allowed for a good reason, why is it allowed on a class level?
public class C
{
public int X;
public string S;
public void Print ( )
{
Console.WriteLine ( X );
Console.WriteLine ( S );
}
}
This code compiles perfectly fine.
Why is it fine to have on a class level but not on a method level?

I see you've updated your question, so I'll update my answer. Your question has two parts, one relating to local variables and the other to instance variables on a class instance. First, however, this isn't really a compiler design decision, but rather a language design decision.
Spec Section 12.3.1/12.3.2
Local Variables
We know why you can define a variable without giving it a value. One reason, an example of something like this:
int x;
// do stuff
x = 5; // Wow, I can initialize it later!
Console.WriteLine(x);
The standard defines why this is valid code. Now, I'm not on the C# design team, but it makes good sense why they wouldn't automatically initialize the code for you (besides the performance hit when you actually didn't want it to be automatically initialized).
Say the code above was your intention, but you forgot to initialize x = 5;. If the compiler had automatically initialized the variable for you, the code would compile, but it would do nothing like you would expect.
Granted this is a trivial example, but this was a very good design decision from the language designers, as it would save many headaches trying to figure out why something wasn't working as expected.
As a side note, I can't think of a reason why you would want to define the code without assigned something to it, or use the default value (in every case), to me that would likely be a bug, which I'm sure is what the compiler designers may have determined.
Class Instance Variables
Class level members are defined by the standard to be initially assigned. In fact, to be fair, local variables outside those declared in a catch, foreach or using statement are initially unassigned. So really, this is a standards issue, not a compiler issue.
If I were to try and guess why this is the case with regards to instance variables of class instances, I would say it has to do with how the memory is allocated on the heap, since that is where the classes are allocated. When a class is allocated on the heap, all of its members have to be initialized and allocated on the heap with it. It's not just ok to do it in a class member than a local variable, it has to be done this way. They simply cannot be left unassigned.

C# is a "pit of success" language.
This is a design decision, as the language is completely capable of allowing you to use locals that have not been explicitly assigned. However, it is normally true that usage of such variables is erroneous, that a code path has not set a value for some reason. To avoid such errors, the compiler requires that all locals be assigned before being used.

1 Why does the compiler not allow the use of uninitialized variables?
Because preventing that promotes good programming.
2 Why does the compiler allow the use of uninitialized class members?
Because it's not possible to track this with any accuracy.

By taking your suggestion of initializing reference types to null, instead of the current behavior (buggy code causes a compile time error), you'll instead get a runtime error when you dereference an uninitialized variable. Is that really what you want?

Consider the following code:
void blah(IDictionary<int,int> dict)
{
for (int i=0; i<10; i++)
{
if ((i & 11) != 0)
{
int j;
dict.TryGetValue(i, out j);
System.Diagnostics.Debug.Print("{0}",j);
j++;
}
}
}
Suppose that the TryGetValue method of the passed-in implementation of IDictionary<int,int> never actually writes to j [not possible if it's written in C#, but possible if it's written in another language]. What should one expect the code to print?
Nothing in the C# standard requires that j be maintained when code leaves the if statement, but nothing requires that it be reset to zero between loop iterations. Mandating either course of action would impose additional costs in some cases. Rather than doing either, the Standard simply allows that when TryGetValue is called, j may arbitrarily hold zero or the last value it held when it was in scope. Such an approach avoids unnecessary costs, but would be awkward if code were allowed to see the value of j between the time it re-enters scope and the time it is written (the fact that passing an uninitialized variable as an out parameter to code written in another language will expose its value was likely unintentional).

Because what do you want in there? you want x to be zero by default and I want it to be 5...
if they assign 0 to int(s) and all the world starts assuming so, then they will change to -1 at some point and this will break so many applications around the globe.
in VB6 variables were assigned to something by default I think, and it was not as good as it seemed to be.
when you use C#, or C++, you assign the value with what you want, not the compiler for you.

Question on C# Variable Scope vs. Other Languages

First of all, let me say that I've never used C# before, and I don't know about it much.
I was studying for my "Programming Languages" exam with Sebesta's "Concepts of Programming Languages 9th ed" book. After I read the following excerpt from "Scope declaration order (on 246th page)", I got a little bit puzzled:
"...For example, in C99, C++, Java the scope of all local variables is from their declarations to the ends of the blocks in which those declarations appear. However, in C# the scope of any variable declared in a block is the whole block, regardless of the position of the declaration in the block, as long as it is not in a nested block. The same is true for methods. Note that C# still requires that all variables be declared before they are used. Therefore, although the scope of a variable extends from the declaration to the top of the block or subprograms in which that declaration appears, the variable still cannot be used above its declaration"
Why did designers of C# make such decision? Is there any specific reason/advantage for such an unusual decision?

This prevents you from doing something such as
void Blah()
{
for (int i = 0; i < 10; i++)
{
// do something
}
int i = 42;
}
The reason is that it introduces the possibility for subtle bugs if you have to move code around, for instance. If you need i before your loop, now your loop is broken.

One example of a benefit of reduced confusion is that if you have a nested block above the variable declaration, the variable declaration will be in effect and prevent the nested block from declaring a variable with the same name.

From the C# Spec
class A
{
int i = 0;
void F() {
i = 1; // Error, use precedes declaration
int i;
i = 2;
}
void G() {
int j = (j = 1); // Valid
}
void H() {
int a = 1, b = ++a; // Valid
}
}
The scoping rules for local variables are designed to guarantee that the meaning of a name used in an expression context is always the same within a block. If the scope of a local variable were to extend only from its declaration to the end of the block, then in the example above, the first assignment would assign to the instance variable and the second assignment would assign to the local variable, possibly leading to compile-time errors if the statements of the block were later to be rearranged.

It's not that strange. As far as variables go, it enforces unique naming better than Java/C++.

Eric Lippert's answer on this related question might be of help.
As Anthony Pegram said earlier, C# enforces this rule because there are cases where rearranging the code can cause subtle bugs, leading to confusion.

C# variable scoping: 'x' cannot be declared in this scope because it would give a different meaning to 'x'

if(true)
{
string var = "VAR";
}
string var = "New VAR!";
This will result in:
Error 1 A local variable named 'var'
cannot be declared in this scope
because it would give a different
meaning to 'var', which is already
used in a 'child' scope to denote
something else.
Nothing earth shattering really, but isn't this just plain wrong? A fellow developer and I were wondering if the first declaration should be in a different scope, thus the second declaration cannot interfere with the first declaration.
Why is C# unable to differentiate between the two scopes? Should the first IF scope not be completely separate from the rest of the method?
I cannot call var from outside the if, so the error message is wrong, because the first var has no relevance in the second scope.

The issue here is largely one of good practice and preventing against inadvertent mistakes. Admittedly, the C# compiler could theoretically be designed such that there is no conflict between scopes here. This would however be much effort for little gain, as I see it.
Consider that if the declaration of var in the parent scope were before the if statement, there would be an unresolvable naming conflict. The compiler simply does not differentiate between the following two cases. Analysis is done purely based on scope, and not order of declaration/use, as you seem to be expecting.
The theoretically acceptable (but still invalid as far as C# is concerned):
if(true)
{
string var = "VAR";
}
string var = "New VAR!";
and the unacceptable (since it would be hiding the parent variable):
string var = "New VAR!";
if(true)
{
string var = "VAR";
}
are both treated precisely the same in terms of variables and scopes.
Now, is there any actual reason in this secenario why you can't just give one of the variables a different name? I assume (hope) your actual variables aren't called var, so I don't really see this being a problem. If you're still intent on reusing the same variable name, just put them in sibling scopes:
if(true)
{
string var = "VAR";
}
{
string var = "New VAR!";
}
This however, while valid to the compiler, can lead to some amount of confusion when reading the code, so I recommend against it in almost any case.

isn't this just plain wrong?
No, this is not wrong at all. This is a correct implementation of section 7.5.2.1 of the C# specification, "Simple names, invariant meanings in blocks".
The specification states:
For each occurrence of a given
identifier as a simple-name in an
expression or declarator, within the
local variable declaration space
of that occurrence, every
other occurrence of the same
identifier as a simple-name in an
expression or declarator must refer to the same
entity. This rule ensures that the
meaning of a name is always the same
within a given block, switch block,
for-, foreach- or using-statement, or
anonymous function.
Why is C# unable to differentiate between the two scopes?
The question is nonsensical; obviously the compiler is able to differentiate between the two scopes. If the compiler were unable to differentiate between the two scopes then how could the error be produced? The error message says that there are two different scopes, and therefore the scopes have been differentiated!
Should the first IF scope not be completeley seperate from the rest of the method?
No, it should not. The scope (and local variable declaration space) defined by the block statement in the consequence of the conditional statement is lexically a part of the outer block which defines the body of the method. Therefore, rules about the contents of the outer block apply to the contents of the inner block.
I cannot call var from outside the if,
so the error message is wrong, because
the first var has no relevance in the
second scope.
This is completely wrong. It is specious to conclude that just because the local variable is no longer in scope, that the outer block does not contain an error. The error message is correct.
The error here has nothing to do with whether the scope of any variable overlaps the scope of any other variable; the only thing that is relevant here is that you have a block -- the outer block -- in which the same simple name is used to refer to two completely different things. C# requires that a simple name have one meaning throughout the block which first uses it.
For example:
class C
{
int x;
void M()
{
int x = 123;
}
}
That is perfectly legal; the scope of the outer x overlaps the scope of the inner x, but that is not an error. What is an error is:
class C
{
int x;
void M()
{
Console.WriteLine(x);
if (whatever)
{
int x = 123;
}
}
}
because now the simple name "x" means two different things inside the body of M -- it means "this.x" and the local variable "x". It is confusing to developers and code maintainers when the same simple name means two completely different things in the same block, so that is illegal.
We do allow parallel blocks to contain the same simple name used in two different ways; this is legal:
class C
{
int x;
void M()
{
if (whatever)
{
Console.WriteLine(x);
}
if (somethingelse)
{
int x = 123;
}
}
}
because now the only block that contains two inconsistent usages of x is the outer block, and that block does not directly contain any usage of "x", only indirectly.

This is valid in C++, but a source for many bugs and sleepless nights. I think the C# guys decided that it's better to throw a warning/error since it's, in the vast majority of cases, a bug rather than something the coder actually want.
Here's an interesting discussion on what parts of the specification this error comes from.
EDIT (some examples) -----
In C++, the following is valid (and it doesn't really matter if the outer declaration is before or after the inner scope, it will just be more interesting and bug-prone if it's before).
void foo(int a)
{
int count = 0;
for(int i = 0; i < a; ++i)
{
int count *= i;
}
return count;
}
Now imagine the function being a few lines longer and it might be easy to not spot the error. The compiler never complains (not it the old days, not sure about newer versions of C++), and the function always returns 0.
The behaivour is clearly a bug, so it would be good if a c++-lint program or the compiler points this out. If it's not a bug it is easy to work around it by just renaming the inner variable.
To add insult to injury I remember that GCC and VS6 had different opinions on where the counter variable in for loops belonged. One said it belonged to the outer scope and the other said it didn't. A bit annoying to work on cross-platform code. Let me give you yet another example to keep my line count up.
for(int i = 0; i < 1000; ++i)
{
if(array[i] > 100)
break;
}
printf("The first very large value in the array exists at %d\n", i);
This code worked in VS6 IIRC and not in GCC. Anyway, C# has cleaned up a few things, which is good.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.