Why is the scope of if and delegates this way in C#?

Inspired by this question, I began wondering why the following examples are all illegal in C#:
VoidFunction t = delegate { int i = 0; };
int i = 1;
and
{
int i = 0;
}
int i = 1;
Does anyone know the exact reason the language was designed this way? Is it to discourage bad programming practice (and if so, why not just issue a warning), is it for performance reasons (at compile time or at run time), or is there some other reason?

This behavior is covered in section 3 of the C# language specification. Here is the quote from the spec:
Similarly, any expression that occurs as the body of an anonymous function in the form of a lambda-expression creates a declaration space which contains the parameters of the anonymous function. It is an error for two members of a local variable declaration space to have the same name. It is an error for the local variable declaration space of a block and a nested local variable declaration space to contain elements with the same name. Thus, within a nested declaration space it is not possible to declare a local variable or constant with the same name as a local variable or constant in an enclosing declaration space.
I think the easier way to read this is that, for the purpose of variable declaration (and many other block-related functions), a lambda or anonymous delegate block is treated no differently than a normal block.
As to why the language was designed this way, the spec does not explicitly say. My opinion, though, is simplicity. If the code is treated as just another block, the code analysis routines become easier: you can reuse all of your existing routines to analyze the block for semantic errors and name resolution. This is particularly important when you consider variable lifting. Lambdas will eventually be compiled into a separate function, but they still have access to all variables in scope at the declaration point.
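The "variable lifting" mentioned above can be illustrated with a minimal sketch. The names CaptureDemo and MakeCounter are made up for this example; the point is only that the lambda reads and writes a local of the enclosing block, even after the enclosing method has returned.

```csharp
using System;

public static class CaptureDemo
{
    // Returns a delegate that increments a local variable declared in this method.
    public static Func<int> MakeCounter()
    {
        int count = 0;                   // local of the enclosing block
        Func<int> next = () => ++count;  // the lambda reads and writes 'count'
        return next;                     // 'count' survives this call because it was captured
    }
}
```

Calling MakeCounter() twice yields two independent counters, each with its own captured count, which is why the compiler has to resolve names inside the lambda body against the enclosing block.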

I think it is done this way so that the inner scope can access variables declared in the outer scope. If you were allowed to redeclare variables that already exist in the outer scope, there might be confusion about what behavior was intended. So they may have decided to resolve the issue by preventing it from happening.

I think it's this way to prevent devs from shooting themselves in the foot.

Related

Variable in child scope preventing definition of variable in parent [duplicate]

if(true)
{
string var = "VAR";
}
string var = "New VAR!";
This will result in:
Error 1 A local variable named 'var' cannot be declared in this scope because it would give a different meaning to 'var', which is already used in a 'child' scope to denote something else.
Nothing earth shattering really, but isn't this just plain wrong? A fellow developer and I were wondering if the first declaration should be in a different scope, thus the second declaration cannot interfere with the first declaration.
Why is C# unable to differentiate between the two scopes? Should the first IF scope not be completely separate from the rest of the method?
I cannot call var from outside the if, so the error message is wrong, because the first var has no relevance in the second scope.
The issue here is largely one of good practice and guarding against inadvertent mistakes. Admittedly, the C# compiler could theoretically be designed so that there is no conflict between scopes here. As I see it, however, this would be much effort for little gain.
Consider that if the declaration of var in the parent scope were before the if statement, there would be an unresolvable naming conflict. The compiler simply does not differentiate between the following two cases. Analysis is done purely based on scope, and not order of declaration/use, as you seem to be expecting.
The theoretically acceptable (but still invalid as far as C# is concerned):
if(true)
{
string var = "VAR";
}
string var = "New VAR!";
and the unacceptable (since it would be hiding the parent variable):
string var = "New VAR!";
if(true)
{
string var = "VAR";
}
are both treated precisely the same in terms of variables and scopes.
Now, is there any actual reason in this scenario why you can't just give one of the variables a different name? I assume (hope) your actual variables aren't called var, so I don't really see this being a problem. If you're still intent on reusing the same variable name, just put them in sibling scopes:
if(true)
{
string var = "VAR";
}
{
string var = "New VAR!";
}
This however, while valid to the compiler, can lead to some amount of confusion when reading the code, so I recommend against it in almost any case.
isn't this just plain wrong?
No, this is not wrong at all. This is a correct implementation of section 7.5.2.1 of the C# specification, "Simple names, invariant meanings in blocks".
The specification states:
For each occurrence of a given identifier as a simple-name in an expression or declarator, within the local variable declaration space of that occurrence, every other occurrence of the same identifier as a simple-name in an expression or declarator must refer to the same entity. This rule ensures that the meaning of a name is always the same within a given block, switch block, for-, foreach- or using-statement, or anonymous function.
Why is C# unable to differentiate between the two scopes?
The question is nonsensical; obviously the compiler is able to differentiate between the two scopes. If the compiler were unable to differentiate between the two scopes then how could the error be produced? The error message says that there are two different scopes, and therefore the scopes have been differentiated!
Should the first IF scope not be completely separate from the rest of the method?
No, it should not. The scope (and local variable declaration space) defined by the block statement in the consequence of the conditional statement is lexically a part of the outer block which defines the body of the method. Therefore, rules about the contents of the outer block apply to the contents of the inner block.
I cannot call var from outside the if, so the error message is wrong, because the first var has no relevance in the second scope.
This is completely wrong. It is specious to conclude that just because the local variable is no longer in scope, that the outer block does not contain an error. The error message is correct.
The error here has nothing to do with whether the scope of any variable overlaps the scope of any other variable; the only thing that is relevant here is that you have a block -- the outer block -- in which the same simple name is used to refer to two completely different things. C# requires that a simple name have one meaning throughout the block which first uses it.
For example:
class C
{
    int x;
    void M()
    {
        int x = 123;
    }
}
That is perfectly legal; the scope of the outer x overlaps the scope of the inner x, but that is not an error. What is an error is:
class C
{
    int x;
    void M()
    {
        Console.WriteLine(x);
        if (whatever)
        {
            int x = 123;
        }
    }
}
because now the simple name "x" means two different things inside the body of M -- it means "this.x" and the local variable "x". It is confusing to developers and code maintainers when the same simple name means two completely different things in the same block, so that is illegal.
We do allow parallel blocks to contain the same simple name used in two different ways; this is legal:
class C
{
    int x;
    void M()
    {
        if (whatever)
        {
            Console.WriteLine(x);
        }
        if (somethingelse)
        {
            int x = 123;
        }
    }
}
because now the only block that contains two inconsistent usages of x is the outer block, and that block does not directly contain any usage of "x", only indirectly.
This is valid in C++, but it is a source of many bugs and sleepless nights. I think the C# designers decided that it's better to report an error since, in the vast majority of cases, it's a bug rather than something the coder actually wants.
Here's an interesting discussion on what parts of the specification this error comes from.
EDIT (some examples) -----
In C++, the following is valid (and it doesn't really matter if the outer declaration is before or after the inner scope, it will just be more interesting and bug-prone if it's before).
int foo(int a)
{
    int count = 0;
    for(int i = 0; i < a; ++i)
    {
        int count = i; // declares a new variable that shadows the outer count
    }
    return count; // this is the outer count, which is still 0
}
Now imagine the function being a few lines longer and it might be easy to not spot the error. The compiler never complains (not in the old days, not sure about newer versions of C++), and the function always returns 0.
The behavior is clearly a bug, so it would be good if a C++ lint program or the compiler pointed this out. If it's not a bug, it is easy to work around by just renaming the inner variable.
To add insult to injury I remember that GCC and VS6 had different opinions on where the counter variable in for loops belonged. One said it belonged to the outer scope and the other said it didn't. A bit annoying to work on cross-platform code. Let me give you yet another example to keep my line count up.
for(int i = 0; i < 1000; ++i)
{
if(array[i] > 100)
break;
}
printf("The first very large value in the array exists at %d\n", i);
This code worked in VS6 IIRC and not in GCC. Anyway, C# has cleaned up a few things, which is good.

About unassigned variables

Just curious, I'm not trying to solve any problem.
Why must only local variables be assigned before use?
In the following example:
class Program
{
    static int a;
    static int b { get; set; }
    static void Main(string[] args)
    {
        int c;
        System.Console.WriteLine(a);
        System.Console.WriteLine(b);
        System.Console.WriteLine(c);
    }
}
Why do a and b give me just a warning while c gives me an error?
Additionally, why can't I just use the default value of the value type and write the following code?
bool MyCondition = true;
int c;
if (MyCondition)
c = 10;
Does it have anything to do with memory management?
Tim gives a good answer to your first question but I can add a few more details.
Your first question is basically "locals are required to be definitely assigned, so why not make the same restriction on fields?" Tim points out that Jon points out that it is actually quite difficult to do so. With locals it is almost always crystal clear when a local is first read and when it is first written. In the cases where it is not clear, the compiler can make reasonable conservative guesses. But with fields, to know when a first read and a first write happens, you have to know all kinds of things about which methods are called in what order.
In short, analyzing locals requires local analysis; the analysis doesn't have to go past the block that contains the declaration. Field analysis requires global analysis. That's a lot of work; it's easier to just say that fields are initialized to their default values.
(Now, that said, it is of course possible to do this global analysis; my new job will likely involve doing precisely this sort of analysis.)
Your second question is basically "Well then, if automatic assignment of default values is good enough for fields then why isn't it good enough for locals?" and the answer is "because failing to assign a local variable and accidentally getting the default value is a common source of bugs." C# was carefully designed to discourage programming practices that lead to irritating bugs, and this is one of them.
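The contrast between fields and locals can be sketched as follows. The class and member names are hypothetical, and the line that would fail is left as a comment so that the sketch compiles. Note that the flow analysis does not track the run-time value of a non-constant condition such as MyCondition, so assigning on only one branch is not enough if the local is read afterwards.

```csharp
using System;

public static class DefiniteAssignmentDemo
{
    // A field is "initially assigned": it holds default(int), i.e. 0, until written.
    public static int Field;

    public static int ReadField()
    {
        return Field; // legal: a field always has a value
    }

    public static int AssignOnEveryPath(bool condition)
    {
        int c;
        // return c;   // would be error CS0165: use of unassigned local variable 'c'
        if (condition)
            c = 10;
        else
            c = -1;    // assigning on both branches makes 'c' definitely assigned
        return c;
    }
}
```

ReadField() returns 0 without any assignment, while the local c can only be read once every path leading to the read has written it.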
Because other variables are initialized with their default value.
Jon Skeet has already found some interesting words on this issue:
For local variables, the compiler has a good idea of the flow - it can
see a "read" of the variable and a "write" of the variable, and prove
(in most cases) that the first write will happen before the first
read.
This isn't the case with instance variables. Consider a simple
property - how do you know if someone will set it before they get it?
That makes it basically infeasible to enforce sensible rules - so
either you'd have to ensure that all fields were set in the
constructor, or allow them to have default values. The C# team chose
the latter strategy.
and here's the related C# language specification:
5.3 Definite assignment
At a given location in the executable code of a function member, a
variable is said to be definitely assigned if the compiler can prove,
by a particular static flow analysis (§5.3.3), that the variable has
been automatically initialized or has been the target of at least one
assignment.
5.3.1 Initially assigned variables
The following categories of variables are classified as initially
assigned:
Static variables.
Instance variables of class instances.
Instance variables of initially assigned struct variables.
Array elements.
Value parameters.
Reference parameters.
Variables declared in a catch clause or a foreach statement.
5.3.2 Initially unassigned variables
The following categories of variables are classified as initially
unassigned:
Instance variables of initially unassigned struct variables.
Output parameters, including the this variable of struct instance
constructors.
Local variables, except those declared in a catch clause or a
foreach statement.
The CLR provides a hard guarantee that local variables are initialized to their default value. But this guarantee does have limitations. What is missing is its ability to recognize scope blocks inside the method body. They disappear once the compiler translates the code to IL. Scope is a language construct that doesn't have a parallel in the CLI and cannot be expressed in IL.
You can see this going wrong in a language like VB.NET for example. This contrived example shows the behavior:
Module Module1
    Sub Main()
        For ix = 1 To 3
            Dim s As String
            If ix = 2 Then s = "foo"
            If s Is Nothing Then Console.WriteLine("null") Else Console.WriteLine(s)
        Next
        Console.ReadLine()
    End Sub
End Module
Output:
null
foo
foo
In other words, the local variable s was initialized only once and retains its value afterwards. This has a knack for creating bugs, of course. The VB.NET compiler does generate a warning for it and has simple syntax to avoid it (As New). A managed language like C++/CLI has the same behavior but doesn't emit a diagnostic at all. The C# language gives a stronger guarantee: it generates an error.
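For comparison, here is a rough C# counterpart of the VB.NET loop, as a sketch with made-up names (LoopLocalDemo, Run). Because C# refuses to read an unassigned local, the variable has to be given a value explicitly, and that initializer runs on every iteration.

```csharp
using System;
using System.Collections.Generic;

public static class LoopLocalDemo
{
    public static List<string> Run()
    {
        var output = new List<string>();
        for (int ix = 1; ix <= 3; ix++)
        {
            // Writing only 'string s;' would make the read below error CS0165.
            string s = null;          // executed again on every iteration
            if (ix == 2) s = "foo";
            output.Add(s ?? "null");
        }
        return output;                // "null", "foo", "null"
    }
}
```

Unlike the VB.NET output above, the third iteration yields "null" again, since the value from the second iteration is overwritten by the initializer.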
This rule is called "definite assignment". The exact rules are explained in detail in the C# Language Specification, chapter 5.3.3.
Definite assignment checking has its limitations. It works well for local variables, since their scope is very limited (private to the method body) and you can't get to them with Reflection. It is much harder to do for fields of a class: it requires whole-program analysis that may need to reach beyond what the compiler can see, such as code in another assembly. That is why the C# compiler can only warn about it but can't reject the code outright.

C#: why scoped objects can't have same names? [duplicate]

Based on this recent question, I don't understand the answer provided. It seems like you should be able to do something like this, since their scopes do not overlap:
static void Main()
{
    {
        int i;
    }
    int i;
}
This code fails to compile with the following error:
A local variable named 'i' cannot be declared in this scope because it would give a different meaning to 'i', which is already used in a 'child' scope to denote something else
I don't think any of the answers so far have quite got the crucial line from the spec.
From section 8.5.1:
The scope of a local variable declared in a local-variable-declaration is the block in which the declaration occurs. It is an error to refer to a local variable in a textual position that precedes the local-variable-declarator of the local variable. Within the scope of a local variable, it is a compile-time error to declare another local variable or constant with the same name.
(Emphasis mine.)
In other words, the scope for the "later" variable includes the part of the block before the declaration - i.e. it includes the "inner" block containing the "earlier" variable.
You can't refer to the later variable in a place earlier than its declaration - but it's still in scope.
"The scope of local or constant variable extends to the end of the current block. You cannot declare another local variable with the same name in the current block or in any nested blocks." C# 3.0 in a Nutshell, http://www.amazon.com/3-0-Nutshell-Desktop-Reference-OReilly/dp/0596527578/
"The local variable declaration space of a block includes any nested blocks. Thus, within a nested block it is not possible to declare a local variable with the same name as a local variable in an enclosing block." Variable Scopes, MSDN, http://msdn.microsoft.com/en-us/library/aa691107%28v=vs.71%29.aspx
On a side note, this is quite the opposite of the JavaScript and F# scoping rules.
From the C# language spec:
The local variable declaration space of a block includes any nested blocks. Thus, within a nested block it is not possible to declare a local variable with the same name as a local variable in an enclosing block.
Essentially, it's not allowed because, in C#, their scopes actually do overlap.
edit: Just to clarify, C#'s scope is resolved at the block level, not line-by-line. So while it's true that you cannot refer to a variable in code that comes before its declaration, it's also true that its scope extends all the way back to the beginning of the block.
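A short sketch of the block-level rule, with the offending lines left as comments so that the code compiles. BlockScopeDemo and SiblingBlocks are names invented for this example.

```csharp
public static class BlockScopeDemo
{
    public static int SiblingBlocks()
    {
        int total = 0;
        // total += j;   // would be error CS0841: 'j' is in scope here but used before its declaration
        // int j = 0;
        {
            int i = 1;   // the scope of this 'i' is the first nested block only
            total += i;
        }
        {
            int i = 2;   // legal: neither nested block contains the other
            total += i;
        }
        // int i = 3;    // would cause error CS0136: the scope of this 'i' would be the whole
        //               // method body, which includes both nested blocks above
        return total;
    }
}
```

SiblingBlocks() returns 3. Uncommenting the final declaration breaks the build even though it comes textually after both nested blocks.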
This has been a rule in C# from the first version.
Allowing overlapping scopes would only lead to confusion (of the programmers, not the compiler).
So it has been forbidden on purpose.
For C#, ISO 23270 (Information technology — Programming
languages — C#), §10.3 (Declarations) says:
Each block, switch-block, for-statement, foreach-statement, or using-statement creates a declaration space for local variables and local constants called the local variable declaration space. Names are introduced into this declaration space through local-variable-declarations and local-constant declarations.
If a block is the body of an instance constructor, method, or operator declaration, or a get or set accessor for an indexer declaration, the parameters declared in such a declaration are members of the block’s local variable declaration space.
If a block is the body of a generic method, the type parameters declared in such a declaration are members of the block’s local variable declaration space.
It is an error for two members of a local variable declaration space to have the same name. It is an error for a local variable declaration space and a nested local variable declaration space to contain elements with the same name.
[Note: Thus, within a nested block it is not possible to declare a local variable or constant with the same name as a local variable or constant in an enclosing block. It is possible for two nested blocks to contain elements with the same name as long as neither block contains the other. end note]
So
public void foobar()
{
    if (foo())
    {
        int i = 0;
        ...
    }
    if (bar())
    {
        int i = 0;
        ...
    }
    return;
}
is legal, but
public void foobar()
{
    int i = 0;
    if (foo())
    {
        int i = 0;
        ...
    }
    ...
    return;
}
is not legal. Personally, I find the restriction rather annoying. I can see issuing a compiler warning about scope overlap, but a compilation error? Too much belt-and-suspenders, IMHO. I could see the virtue of a compiler option and/or pragma, though (perhaps -pedantic/-practical, #pragma pedantic vs #pragma practical, B^)).
It's not a question of overlapping scopes. In C# a simple name cannot mean more than one thing within a block where it's declared. In your example, the name i means two different things within the same outer block.
In other words, you should be able to move a variable declaration around to any place within the block where it was declared without causing scopes to overlap. Since changing your example to:
static void Main()
{
    int i;
    {
        int i;
    }
}
would cause the scopes of the different i variables to overlap, your example is illegal.
I just compiled this in GCC both as C and as C++. I received no error message so it appears to be valid syntax.
Your question is tagged as .net and as c. Should this be tagged as c#? That language might have different rules than C.
In C (before C99) you need to put all variable declarations at the very beginning of a block: they must come directly after the opening { and before any other statements in the block.
So what you can do to make it compile is this:
static void Main()
{
    {
        int i;
    }
    {
        int i;
    }
}
Here's your answer from MSDN .NET Documentation:
...The local variable declaration space of a block includes any nested blocks. Thus, within a nested block it is not possible to declare a local variable with the same name as a local variable in an enclosing block.

How does the compiler know it is out of scope?

Main()
{
int i =0;
...
...
while(true)
{
int k =0;
...
...
}
// K is out of scope..
}
How does the compiler know K is out of scope?
How does the compiler know [that a local variable] is out of scope?
First off, let's carefully define the terms you're using. The scope of a named entity is the region of program text in which it is legal to use the name of the entity without additional qualification of the name.
The scope of a local variable is defined by the specification as the region of program text which is the entire block that immediately contains the declaration.
The compiler determines the scope of the local variable by keeping track of a local declaration space associated with each syntactic block. When we need to resolve a name, we figure out what block the name usage is inside, and consult the related declaration space. Of course, blocks nest and so do local variable declaration spaces, so we might have to consult more than one, in order from inside-to-outside.
The actual data structures we use are straightforward hash tables optimized for fast lookup and filtering on various aspects needed by the compiler. (For example, we sometimes need to look up names but only want to get types, or only methods, and so on.)
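The inside-to-outside lookup described above can be modeled with a chain of dictionaries. This is only an illustrative sketch under simplifying assumptions: DeclarationSpace, TryDeclare and Lookup are hypothetical names, not the real compiler's types, and each name is mapped to a plain type string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of one declaration space per block; not the real compiler's types.
public sealed class DeclarationSpace
{
    private readonly Dictionary<string, string> names = new Dictionary<string, string>();

    public DeclarationSpace Parent { get; }

    public DeclarationSpace(DeclarationSpace parent = null)
    {
        Parent = parent;
    }

    // Declares a name unless this space or an enclosing space already contains it.
    public bool TryDeclare(string name, string type)
    {
        if (Lookup(name) != null) return false;
        names[name] = type;
        return true;
    }

    // Resolves a name by consulting spaces from the innermost one outward.
    public string Lookup(string name)
    {
        for (DeclarationSpace space = this; space != null; space = space.Parent)
        {
            if (space.names.TryGetValue(name, out string type)) return type;
        }
        return null; // not found in any enclosing space
    }
}
```

With a method-level space containing i and a nested loop space containing k, a lookup of k from the method-level space returns null, which is the sense in which k is "out of scope" after the loop. The sketch only checks declarations in the order they are made; it does not model the rule that a later declaration in an outer block also conflicts with an earlier one in a nested block.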
Does that answer your question? It's a rather unclear question.
Because as the compiler processes the code it maintains information about each identifier it comes across and each scope it comes across and maintains boundaries for the latter. It knows that K was declared in the while scope and after the scope ends it probably marks the variable as 'no longer in scope' causing any use to be marked as an error.
k is out of scope because the block it was defined in is closed.
I would say it's a meaningless question. K is out of scope because you wrote the program that way: the compiler's entire function is to recognize and translate the programming language, including the lexical scope aspect of it.

Question on C# Variable Scope vs. Other Languages

First of all, let me say that I've never used C# before, and I don't know about it much.
I was studying for my "Programming Languages" exam with Sebesta's book "Concepts of Programming Languages", 9th ed. After I read the following excerpt from "Scope declaration order" (on page 246), I got a little bit puzzled:
"...For example, in C99, C++, Java the scope of all local variables is from their declarations to the ends of the blocks in which those declarations appear. However, in C# the scope of any variable declared in a block is the whole block, regardless of the position of the declaration in the block, as long as it is not in a nested block. The same is true for methods. Note that C# still requires that all variables be declared before they are used. Therefore, although the scope of a variable extends from the declaration to the top of the block or subprograms in which that declaration appears, the variable still cannot be used above its declaration"
Why did the designers of C# make such a decision? Is there any specific reason or advantage behind such an unusual choice?
This prevents you from doing something such as
void Blah()
{
for (int i = 0; i < 10; i++)
{
// do something
}
int i = 42;
}
The reason is that it introduces the possibility for subtle bugs if you have to move code around, for instance. If you need i before your loop, now your loop is broken.
One example of a benefit of reduced confusion is that if you have a nested block above the variable declaration, the variable declaration will be in effect and prevent the nested block from declaring a variable with the same name.
From the C# Spec
class A
{
    int i = 0;
    void F() {
        i = 1; // Error, use precedes declaration
        int i;
        i = 2;
    }
    void G() {
        int j = (j = 1); // Valid
    }
    void H() {
        int a = 1, b = ++a; // Valid
    }
}
The scoping rules for local variables are designed to guarantee that the meaning of a name used in an expression context is always the same within a block. If the scope of a local variable were to extend only from its declaration to the end of the block, then in the example above, the first assignment would assign to the instance variable and the second assignment would assign to the local variable, possibly leading to compile-time errors if the statements of the block were later to be rearranged.
It's not that strange. As far as variables go, it enforces unique naming better than Java/C++.
Eric Lippert's answer on this related question might be of help.
As Anthony Pegram said earlier, C# enforces this rule because there are cases where rearranging the code can cause subtle bugs, leading to confusion.
