Non-shortcircuting boolean operators and C# 7 Pattern matching - c#

Im currently writing an C# application, targeting .NET 4.7 (C# 7). I am confused after I tried using the new way of declaring a variable utilizing the "is" keyword:
if (variable is MyClass classInstance)
This way it works, but when doing:
if (true & variable is MyClass classInstance)
{
var a = classInstance;
}
Visual Studio (I'm using 2017) shows me the the Error Use of unassigned local variable 'classInstance'. Using the short-circuting version of & (&&) it works fine. Am I missing something about the & operator? (I know using the shortcircuting versions are much more commonly used, but at this point I'm just curious)

This one hurt my head, but I think I have got it figured out.
This confusion is caused by two quirks: the way the is operation leaves the variable undeclared (not null), and the way the compiler optimizes away Boolean, but not bitwise, operations.
Quirk 1. If the cast fails, the variable is unassigned (not null)
Per the documentation for the new is syntax:
If exp is true and is is used with an if statement, varname is assigned and has local scope within the if statement only.
If you read between the lines, this means that if the is cast fails, the variable is considered unassigned. This may be counterintuitive (some might expect it to be null instead). This means that any code within the if block that relies on the variable will not compile if there is any chance the overall if clause could evaluate to true without a type match present. So for example
This compiles:
if (instance is MyClass y)
{
var x = y;
}
And this compiles:
if (true && instance is MyClass y)
{
var x = y;
}
But this does not:
void Test(bool f)
{
if (f && instance is MyClass y)
{
var x = y; //Error: Use of unassigned local variable
}
}
Quirk 2. Boolean operations are optimized away, binary ones are not
When the compiler detects a predestined Boolean result, the compiler will not emit unreachable code, and skips certain validations as a result. For example:
This compiles:
void Test(bool f)
{
object neverAssigned;
if (false && f)
{
var x = neverAssigned; //OK (never executes)
}
}
But if you use & instead of &&, it does not compile:
void Test(bool f)
{
object neverAssigned;
if (false & f)
{
var x = neverAssigned; //Error: Use of unassigned local variable
}
}
When the compiler sees something like true && it just ignores it completely. Thus
if (true && instance is MyClass y)
Is exactly the same as
if (instance is MyClass y)
But this
if (true & instance is MyClass y)
Is NOT the same. The compiler still needs to emit code that performs the & operation and uses its output in a conditional statement. Or even if it doesn't, the current C# 7 compiler apparently performs the same validations as if it were. This may seem a little strange, but bear in mind that when you use & instead of &&, there is a guarantee that the & must execute, and though it seems unimportant in this example, the general case must allow for additional complexifying factors, such as operator overloading.
How the quirks combine
In the last example, the result of the if clause is determined at run time, not compile time. So the compiler can't be certain that y will end up being assigned before the contents of the if block are executed. Thus you get
if (true & instance is MyClass y)
{
var x = y; //Error: use of unassigned local variable
}
TLDR
In the situation of a compound logical operation, c# can't be sure that the overall if condition will resolve to true if and only if the cast is successful. Absent that certainty, it can't allow access to the variable, since it might be unassigned. An exception is made when the expression can be reduced to non-compound operation at compile time, for example by removing true &&.
Workaround
I think the way we are meant to use the new is syntax is as a single condition with an if clause. Adding true && at the beginning works because the compiler simply removes it. But anything else combined with the new syntax creates ambiguity about whether the new variable will be in an unassigned state when the code block runs, and the compiler can't allow that.
The workaround of course is to nest your conditions instead of combining them:
Won't work:
void Test(bool f)
{
if (f & instance is MyClass y)
{
var x = y; //Error: Use of unassigned local variable
}
}
Works fine:
void Test(bool f)
{
if (f)
{
if (instance is MyClass y)
{
var x = y; //Works
}
}
}

This is due to the rules on definite assignment, which have a special case for &&, but don't have a special case for &. I believe it's working as intended by the C# design team, but that the specification may have a little bit more work to do, at least in terms of clarity.
From the ECMA C# 5 standard, section 5.3.3.24:
For an expression expr of the form expr-first && expr-second:
...
The definite assignment state of v after expr is determined by:
If expr-first is a constant expression with the value false, then the definite assignment state of v after expr is the same as the definite assignment state of v after expr-first.
Otherwise, if the state of v after expr-first is definitely assigned, then the state of v after expr is definitely assigned.
Otherwise, if the state of v after expr-second is definitely assigned, and the state of v after expr-first is “definitely assigned after false expression”, then the state of v after expr is definitely assigned.
Otherwise, if the state of v after expr-second is definitely assigned or “definitely assigned after true expression”, then the state of v after expr is “definitely assigned after true expression”.
...
The relevant part for this case is the one I've highlighted in bold. classInstance is "definitely assigned after true expression" with the pattern (expr-second above), and none of the earlier cases apply, so the overall state at the end of the condition is "definitely assigned after true expression". That means within the if statement body, it's definitely assigned.
There's no equivalent clause for the & operator. While there potentially could be, it would be complicated by the types involved - it would have to only apply to the & operator when used with bool expressions, and I don't think most of the rest of definite assignment deals with types in that way.
Note that you don't need to use pattern matching to see this.
Consider the following program:
using System;
class Program
{
static void Main()
{
bool a = false;
bool x;
bool y = true;
if (true & (y && (x = a)))
{
Console.WriteLine(x);
}
}
}
The expression y && (x = a) is another one where x ends up being "definitely assigned after true expression". Again, the code above fails to compile due to x not being definitely assigned, whereas if you change the & to && it compiles. So at least this isn't an issue in pattern matching itself.
What confuses me somewhat is why x isn't still "definitely assigned after true expression" due to 5.3.3.21 ("5.3.3.21 General rules for expressions with embedded expressions"), which contains:
The definite assignment state of v at the end of expr is the same as the definite assignment state at the end of exprn.
I suspect this is meant to only include "definitely assigned" or "not definitely assigned", rather than including the "definitely assigned after true expression" part - although it's not as clear as it should be.

Am I missing something about the & operator?
No, I don't think so. Your expectations seem correct to me. Consider this alternative example, which does compile without error:
object variable = null;
MyClass classInstance;
if (true & ((classInstance = variable as MyClass) != null))
{
var a = classInstance;
}
var b = classInstance;
(To me, it's more interesting to consider the assignment outside the if body, since that's where the short-circuiting would affect behavior.)
With the explicit assignment, the compiler recognizes classInstance as definitely assigned, in the assignments of both a and b. It should be able to do the same thing with the new syntax.
With logical and, short-circuiting or not shouldn't matter. Your first value is true, so the second half should always need to be evaluated to get the whole expression. As you've noted, the compiler does treat the & and && differently though, which is unexpected.
A variation on this is this code:
static void M3()
{
object variable = null;
if (true | variable is MyClass classInstance)
{
var a = classInstance;
}
}
The compiler correctly identifies classInstance as not definitely assigned when || is used, but has the same apparent misbehavior with | (i.e. also saying that classInstance is not definitely assigned), even though with the non-short-circuiting operator, the assignment must happen regardless.
Again, the above works correctly with the assignment is explicit, rather than using the new syntax.
If this were just about the definite assignment rules not being addressed with the new syntax, then I would expect && to be as broken as &. But it's not. The compiler handles that correctly. And indeed, in the feature documentation (I hesitate to say "specification", because there's no ECMA-ratified C# 7 specification yet), it reads:
The type_pattern both tests that an expression is of a given type and casts it to that type if the test succeeds. This introduces a local variable of the given type named by the given identifier. That local variable is definitely assigned when the result of the pattern-matching operation is true. [emphasis mine]
Since short-circuiting produces correct behavior without pattern matching, and since pattern matching produces correct behavior without short-circuiting (and definite-assignment is explicitly addressed in the feature description), I would say this is straight-up a compiler bug. There's probably some overlooked interaction between non-short-circuiting Boolean operators and the way the pattern-matched expression is evaluated that causes the definite assignment to get lost in the shuffle.
You should consider reporting it to the authorities. I think these days, the Roslyn GitHub issue-tracking is where they track this sort of thing. It might help if you explain in your report how you found this and why that particular syntax is important in your scenario (since in the code you posted, the && operator works equivalently…the non-short-circuiting & doesn't seem to confer any advantage to the code).

Related

Null-coalescing out parameter gives unexpected warning

Using this construct:
var dict = new Dictionary<int, string>();
var result = (dict?.TryGetValue(1, out var value) ?? false) ? value : "Default";
I get an error saying CS0165 use of unassigned local variable 'value' which is not what I expect. How could value possibly be undefined? If the dictionary is null the inner statement will return false which will make the outer statement evaluate to false, returning Default.
What am I missing here? Is it just the compiler being unable to evaluate the statement fully? Or Have I messed it up somehow?
Your analysis is correct. It is not the analysis the compiler makes, because the compiler makes the analysis that is required by the C# specification. That analysis is as follows:
If the condition of a condition?consequence:alternative expression is a compile-time constant true then the alternative branch is not reachable; if false, then the consequence branch is not reachable; otherwise, both branches are reachable.
The condition in this case is not a constant, therefore the consequence and alternative are both reachable.
local variable value is only definitely assigned if dict is not null, and therefore value is not definitely assigned when the consequence is reached.
But the consequence requires that value be definitely assigned
So that's an error.
The compiler is not as smart as you, but it is an accurate implementation of the C# specification. (Note that I have not sketched out here the additional special rules for this situation, which include predicates like "definitely assigned after a true expression" and so on. See the C# spec for details.)
Incidentally, the C# 2.0 compiler was too smart. For example, if you had a condition like 0 * x == 0 for some int local x it would deduce "that condition is always true no matter what the value of x is" and mark the alternative branch as unreachable. That analysis was correct in the sense that it matched the real world, but it was incorrect in the sense that the C# specification clearly says that the deduction is only to be made for compile-time constants, and equally clearly says that expressions involving variables are not constant.
Remember, the purpose of this thing is to find bugs, and what is more likely? Someone wrote 0 * x == 0 ? foo : bar intending that it have the meaning "always foo", or that they've written a bug by accident? I fixed the bug in the compiler and since then it has strictly matched the specification.
In your case there is no bug, but the code is too complicated for the compiler to analyze, so it is probably also too complicated to expect humans to analyze. See if you can simplify it. What I might do is:
public static V GetValueOrDefault<K, V>(
this Dictionary<K, V> d,
K key,
V defaultValue)
{
if (d != null && d.TryGetValue(key, out var value))
return value;
return defaultValue;
}
…
var result = dict.GetValueOrDefault(1, "Default");
The goal should be to make the call site readable; I think my call site is considerably more readable than yours.
Is it just the compiler being unable to evaluate the statement fully?
Yes, more or less.
The compiler does not track unassigned, it tracks the opposite 'defintely assigned'. It has to stop somewhere, in this case it would need to incorporate knowledge about the library method TryGetValue(). It doesn't.

Short circuit with TryParse() and out variable reliable?

This subject has been discussed many times and is clear: Is relying on && short-circuiting safe in .NET?
What I cannot find is a clear answer if the short circuit is also reliable when out parameter is used in the right part of the if clause:
int foo;
....
if ( false == Int32.TryParse( bar, out foo ) || 0 == foo )
When I try this code then it works. But I am just a bit worried if the behavior is reliable, because I don't know how the compiler is translating the code and accessing foo.
My question:
Is the access to foo really done after the TryParse which would consider any changed value or the compiler can sometimes read the value before TryParse [because of some optimizations] and therefore I cannot rely on this code.
Is the access to foo really done after the TryParse which would
consider any changed value
Yes, this will be the case.
or the compiler can sometimes read the value before TryParse [because
of some optimizations] and therefore I cannot rely on this code.
No, it will not read foo before the call to Int32.TryParse; one being that the logical OR will always evaluate the left side and only evaluates the right side if it's necessary. two, it will not evaluate the right side first because the foo variable is not initialised at that point which would cause a compilation error (assuming it's a local variable).
You can verify this works by using the new language features in C#:
if (!int.TryParse("0", out int foo) || foo == 0)
{
//foo == 0 or the tryparse failed
//foo still exists here though, so if it fails, it will still == 0
}
You can see that the declaration of foo happens inside the int.TryParse call, which wouldn't happen if foo == 0 was evaluated first. This is legal syntax in C# 7 and up. Really this is all syntactic sugar for this:
int foo = 0
if (!int.TryParse("0", out foo) || foo == 0)
{
//foo == 0 or the tryparse failed
}

Need in IS operator in C# [duplicate]

This question already has answers here:
c# casting with is and as
(5 answers)
Closed 6 years ago.
In C# there is is operator for checking if object is compatible with some type. This operators tries to cast object to some type and if casting is successful it returns true (or false if casting fails).
From Jeffrey Richter CLR via C#:
The is operator checks whether an object is compatible with a given
type, and the result of the evaluation is a Boolean: true or false.
if (o is Employee)
{
Employee e = (Employee) o;
// Use e within the remainder of the 'if' statement.
}
In this code, the CLR is actually checking the object’s type twice:
The is operator first checks to see if o is compatible with the
Employee type. If it is, inside the if statement, the CLR again
verifies that o refers to an Employee when performing the cast. The
CLR’s type checking improves security, but it certainly comes at a
performance cost, because the CLR must determine the actual type of
the object referred to by the variable (o), and then the CLR must walk
the inheritance hierarchy, checking each base type against the
specified type (Employee).
Also, from the same book:
Employee e = o as Employee;
if (e != null)
{
// Use e within the 'if' statement.
}
In this code, the CLR checks if o is compatible with the Employee
type, and if it is, as returns a non-null reference to the same
object. If o is not compatible with the Employee type, the as operator
returns null. Notice that the as operator causes the CLR to verify an
object’s type just once. The if statement simply checks whether e is
null; this check can be performed faster than verifying an object’s
type.
So, my question is: why do we need is operator? Which are the cases when is operator is more preferable over as.
why do we need is operator?
We don't need it. It is redundant. If the is operator were not in the language you could emulate it by simply writing
(x as Blah) != null
for reference types and
(x as Blah?) != null
for value types.
In fact, that is all is is; if you look at the IL, both is and as compile down to the same IL instruction.
Your first question cannot be answered because it presumes a falsehood. Why do we need this operator? We don't need it, so there is no reason why we need it. So this is not a productive question to ask.
Which are the cases when is operator is more preferable over as.
I think you meant to ask
why would I write the "inefficient" code that does two type checks -- is followed by a cast -- when I could write the efficient code that does one type check using as and a null check?
First of all, the argument from efficiency is weak. Type checks are cheap, and even if they are expensive, they're probably not the most expensive thing you do. Don't change code that looks perfectly sensible just to save those few nanoseconds. If you think the code looks nicer or is easier to read using is rather than as, then use is rather than as. There is no product in the marketplace today whose success or failure was predicated on using as vs is.
Or, look at it the other way. Both is and as are evidence that your program doesn't even know what the type of a value is, and programs where the compiler cannot work out the types tend to be (1) buggy, and (2) slow. If you care so much about speed, don't write programs that do one type test instead of two; write programs that do zero type tests instead of one! Write programs where typing can be determined statically.
Second, there are situations in C# where you need an expression, not a statement, and C# unfortunately does not have "let" expressions outside of queries. You can write
... e is Manager ? ((Manager)e).Reports : 0 ...
as an expression but pre C# 7 there was no way to write
Manager m = e as Manager;
in an expression context. In a query you could write either
from e in Employees
select e is Manager ? ((Manager)e).Reports : 0
or
from e in Employees
let m = e as Manager
select m == null ? 0 : m.Reports
but there is no "let" in an expression context outside of queries. It would be nice to be able to write
... let m = e as Manager in m == null ? 0 : m.Reports ...
in an arbitrary expression. But we can get some of the way there. In C# 7 you'll (probably) be able to write
e is Manager m ? m.Reports : 0 ...
which is a nice sugar and eliminates the inefficient double-check. The is-with-new-variable syntax nicely combines everything together: you get a Boolean type test and a named, typed reference.
Now, what I just said is a slight lie; as of C# 6 you can write the code above as
(e as Manager)?.Reports ?? 0
which does the type check once. But pre C# 6.0 you were out of luck; you pretty much always had to do the type check twice if you were in an expression context.
With C# 7 operator is can be less wordy then as
Compare this
Employee e = o as Employee;
if (e != null)
{
// Use e within the 'if' statement.
}
and this
if (o is Employee e)
{
// Use e within the 'if' statement.
}
Information from here. Section Pattern Matching with Is Expressions
There are times when you might want to just check the type not actually go through the effort of casting it.
As such you can just use the is operator to confirm your object is compatible, and do whatever logic you want. Whereas in other scenarios you may just want to cast (Safely or otherwise) and utilise the returned value.
Ultimately because is just returns a boolean, you can use it for checking.
as and the (T)MyType type casting are used to safely casting to null, or throwing an Exception respectively
How to: Safely Cast by Using as and is Operators (C# Programming Guide)
At least one use-case I can think of is when comparing if a certain variable is a value type (as cannot be used in that case).
For instance,
var x = ...;
if(x is bool)
{
// do something
}
It can also be useful when you don't necessarily need to use the cast, but are simply interested whether or not something is of a certain underlying type.

Does the ?? operator guarantee only to run the left-hand argument once?

For example, given this code:
int? ReallyComplexFunction()
{
return 2; // Imagine this did something that took a while
}
void Something()
{
int i = ReallyCleverFunction() ?? 42;
}
... is it guaranteed that the function will only be called once? In my test it's only called once, but I can't see any documentation stating I can rely on that always being the case.
Edit
I can guess how it is implemented, but c'mon: we're developers. We shouldn't be muddling through on guesses and assumptions. For example, will all future implementations be the same? Will another platform's implementation of the language be the same? That depends on the specifications of the language and what guarantees it offers. For example, a future or different platform implementation (not a good one, but it's possible) may do this in the ?? implementation:
return ReallyComplexFunction() == null ? 42 : ReallyComplexFunction();
That, would call the ReallyComplexFunction twice if it didn't return null. (although this looks a ridiculous implementation, if you replace the function with a nullable variable it looks quite reasonable: return a == null ? 42 : a)
As stated above, I know in my test it's only called once, but my question is does the C# Specification guarantee/specify that the left-hand side will only be called once? If so, where? I can't see any such mention in the C# Language Specification for ?? operator (where I originally looked for the answer to my query).
The ?? operator will evaluate the left side once, and the right side zero or once, depending on the result of the left side.
Your code
int i = ReallyCleverFunction() ?? 42;
Is equivalent to (and this is actually very close to what the compiler actually generates):
int? temp = ReallyCleverFunction();
int i = temp.HasValue ? temp.Value : 42;
Or more simply:
int i = ReallyCleverFunction().GetValueOrDefault(42);
Either way you look at it, ReallyCleverFunction is only called once.
The ?? operator has nothing to do with the left hand. The left hand is run first and then the ?? operator evaluates its response.
In your code, ReallyCleverFunction will only run once.
It will be only called once. If the value of ReallyCleverFunction is null, the value 42 is used, otherwise the returned value of ReallyCleverFunction is used.
It will evaluate the function, then evaluate the left value (which is the result of the function) of the ?? operator. There is no reason it would call the function twice.
It is called once and following the documentation I believe this should be sufficient to assume it is only called once:
It returns the left-hand operand if the operand is not null; otherwise
it returns the right operand.
?? operator
It will only be run once because the ?? operator is just a shortcut.
Your line
int i = ReallyCleverFunction() ?? 42;
is the same as
int? temp = ReallyCleverFunction();
int i;
if (temp != null)
{
i = temp.Value;
} else {
i = 42;
}
The compiler does the hard work.
Firstly, the answer is yes it does guarantee it will only evaluate it once, by inference in section 7.13 of the official C# language specification.
Section 7.13 always treats a as an object, therefore it must only take the return of a function and use that in the processing. It says the following about the ?? operator (emphasis added):
• If b is a dynamic expression, the result type is dynamic. At run-time, a is first evaluated. If a is not null, a is converted to dynamic, and this becomes the result. Otherwise, b is evaluated, and this becomes the result.
• Otherwise, if A exists and is a nullable type and an implicit conversion exists from b to A0, the result type is A0. At run-time, a is first evaluated. If a is not null, a is unwrapped to type A0, and this becomes the result. Otherwise, b is evaluated and converted to type A0, and this becomes the result.
• Otherwise, if A exists and an implicit conversion exists from b to A, the result type is A. At run-time, a is first evaluated. If a is not null, a becomes the result. Otherwise, b is evaluated and converted to type A, and this becomes the result.
• Otherwise, if b has a type B and an implicit conversion exists from a to B, the result type is B. At run-time, a is first evaluated. If a is not null, a is unwrapped to type A0 (if A exists and is nullable) and converted to type B, and this becomes the result. Otherwise, b is evaluated and becomes the result.
As a side-note the link given at the end of the question is not the C# language specification despite it's first ranking in a Google Search for the "C# language specification".

Terminology question: && operator

Well I have been reading quite a lot sources but they differ in definiton:
&& is logical operator of logical
conjunction //I guess this is correct
&& is operator of logical AND //I
think this is not precise from
technical point of view
&& is
conditional operator performing
logical AND //I think this is right
as well
While all are correct in terms of understanding, I would say the first one is most precise. Or am I mistaken?
In C#, Java, C++, C, and probably several other languages, && is a programming language implementation of the boolean AND operator. It is designed to account for the following facts (that do not apply in pure propositional logic):
Evaluating an operand may be computationally expensive
Evaluating an operand may fail with an exception
Evaluating an operand may enter an infinite loop
So a boolean expression in a programming language really has four possible outcomes: true, false, "exception", and "infinite loop". In some situations, one has two boolean expressions where the possible success of the second expression can be determined by looking at the first expression. For instance, with the expressions foo != null and foo.bar == 42, we can be certain that if the first expression is false, then the second expression will fail. Hence, the && operator is designed to be "short-cirquited": If the left operand evaluates to false, the right operand is not evaluated. In all cases where both operands would evaluate successfully to true or false, this rule produces the same result as if one actually had evaluated both operands, but it allows for increased performance (because the right operand might not need to be evaluated at all) and increased compactness without sacrificing safety (if one takes care to structure the expression such that the left operand "guards" the right one). Similarly, || will not evaluate the right operand if the left operand evaluates to true.
A shorter answer is that although && is strongly inspired by AND, it is designed to take certain programming peculiarities into account, and a && b should perhaps rather be phrased as "an expression that returns false if a is false, and the value of b if a is true".
The easy answer.
It performs an AND operation on 2 logical expressions returnting a Boolean. It does not evaluate the second if the first is false.
To evaluate both regardless use &.
The complicated answer.
http://www.ecma-international.org/publications/standards/Ecma-334.htm heading 12.3.3.23 seems most relavent.
I find myself unqualified to criticise the definition within.
12.3.3.23 && expressions
For an expression expr of the form expr-first && expr-second:
• The definite assignment state of v before expr-first is the same as the definite assignment state of v before expr.
• The definite assignment state of v before expr-second is definitely assigned if the state of v after exprfirst is either definitely assigned or “definitely assigned after true expression”. Otherwise, it is notdefinitely assigned.
• The definite assignment state of v after expr is determined by:
o If the state of v after expr-first is definitely assigned, then the state of v after expr is definitely assigned.
o Otherwise, if the state of v after expr-second is definitely assigned, and the state of v after expr-first is “definitely assigned after false expression”, then the state of v after expr is definitely assigned.
o Otherwise, if the state of v after expr-second is definitely assigned or “definitely assigned after true expression”, then the state of v after expr is “definitely assigned after true expression”.
o Otherwise, if the state of v after expr-first is “definitely assigned after false expression”, and the state of v after expr-second is “definitely assigned after false expression”, then the state of v after expr is “definitely assigned after false expression”.
o Otherwise, the state of v after expr is not definitely assigned.
[Example: In the following code
class A
{
static void F(int x, int y) {
int i;
if (x >= 0 && (i = y) >= 0) {
// i definitely assigned
}
else {
// i not definitely assigned
}
// i not definitely assigned
}
}
the variable i is considered definitely assigned in one of the embedded statements of an if statement but not in the other. In the if statement in method F, the variable i is definitely assigned in the first embedded statement because execution of the expression (i = y) always precedes execution of this embedded statement. In contrast, the variable i is not definitely assigned in the second embedded statement, since x >= 0 might have tested false, resulting in the variable i’s being unassigned. end example]
I hope that clears things up.
Conjunction is another word for "and". So they're all correct in terms of their output.
In terms of how they actually give the output, && usually doesn't evaluate the second argument if the first one is false.

Categories

Resources