Dictionary.TryGetValue and possible 'null' warning - c#

I can't seem to wrap my head around the compiler's warning in this case:
using System;
using System.Collections.Generic;
#nullable enable
public class Program
{
public static void Main()
{
Guid guid = Guid.NewGuid();
Dictionary<Guid, string> d = new();
bool found = d.TryGetValue(guid, out string? str);
if (found is false)
{
return;
}
string s = str; // WARNING: possible null value
}
}
After all, I'm doing the found check and return if there is no value (e.g. when the out str value is null). Plus, the out parameter of the .TryGetValue method is annotated with [MaybeNullWhen(false)].
Would appreciate your help figuring out where I'm wrong in my expectations and fixing the code, thanks.

Basically the compiler (or language specification) isn't "smart" enough to carry the conditional processing of the return value from TryGetValue over when you use a local variable.
If you inline the TryGetValue call into the if condition, it's fine:
if (!d.TryGetValue(guid, out string? str))
{
return;
}
string s = str; // No warning
It's possible that over time this will evolve to be more sophisticated, but it's relatively difficult to specify this sort of thing in a bullet-proof way.
This isn't limited to nullable reference types - there are other cases where logically code is fine from a human perspective, but the compiler will reject it. For example:
string text;
bool condition = DateTime.UtcNow.Hour == 5;
if (condition)
{
text = "hello";
}
if (condition)
{
Console.WriteLine(text); // Error: use of unassigned local variable
}
We know that if we get into the second if statement body, we'll also have gone into the first one so text will have been assigned a value, but the rules of the compiler don't attempt to be smart enough to spot that.
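If the found flag has to stay in a local variable, one possible workaround is to add an inlined null test next to it, since the flow analysis does understand a null check written directly in the condition. The following is a minimal sketch, assuming a hypothetical helper named GetOrEmpty that is not part of the question's code:

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class LookupSketch
{
    // Hypothetical helper for illustration only.
    public static string GetOrEmpty(Dictionary<Guid, string> d, Guid key)
    {
        bool found = d.TryGetValue(key, out string? str);
        // "found" alone tells the flow analysis nothing about str,
        // but the inlined "str is null" test does.
        if (!found || str is null)
        {
            return "";
        }
        return str; // str is "not null" on this path, so no warning is expected
    }
}
```

The extra str is null test is redundant at run time for a dictionary that never stores null, but it states the intent in a form the compiler's rules can follow.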

Related

Why does this code give a "Possible null reference return" compiler warning?

Consider the following code:
using System;
#nullable enable
namespace Demo
{
public sealed class TestClass
{
public string Test()
{
bool isNull = _test == null;
if (isNull)
return "";
else
return _test; // !!!
}
readonly string _test = "";
}
}
When I build this, the line marked with !!! gives a compiler warning: warning CS8603: Possible null reference return.
I find this a little confusing, given that _test is readonly and initialised to non-null.
If I change the code to the following, the warning goes away:
public string Test()
{
// bool isNull = _test == null;
if (_test == null)
return "";
else
return _test;
}
Can anyone explain this behaviour?
I can make a reasonable guess as to what's going on here, but it's all a bit complicated :) It involves the null state and null tracking described in the draft spec. Fundamentally, at the point where we want to return, the compiler will warn if the state of the expression is "maybe null" instead of "not null".
This answer is in somewhat narrative form rather than just "here's the conclusions"... I hope it's more useful that way.
I'm going to simplify the example slightly by getting rid of the fields, and consider a method with one of these two signatures:
public static string M(string? text)
public static string M(string text)
In the implementations below I've given each method a different number so I can refer to specific examples unambiguously. It also allows all of the implementations to be present in the same program.
In each of the cases described below, we'll do various things but end up trying to return text - so it's the null state of text that's important.
Unconditional return
First, let's just try to return it directly:
public static string M1(string? text) => text; // Warning
public static string M2(string text) => text; // No warning
So far, so simple. The nullable state of the parameter at the start of the method is "maybe null" if it's of type string? and "not null" if it's of type string.
Simple conditional return
Now let's check for null within the if statement condition itself. (I would use the conditional operator, which I believe will have the same effect, but I wanted to stay truer to the question.)
public static string M3(string? text)
{
if (text is null)
{
return "";
}
else
{
return text; // No warning
}
}
public static string M4(string text)
{
if (text is null)
{
return "";
}
else
{
return text; // No warning
}
}
Great, so it looks like within an if statement where the condition itself checks for nullity, the state of the variable within each branch of the if statement can be different: within the else block, the state is "not null" in both pieces of code. So in particular, in M3 the state changes from "maybe null" to "not null".
Conditional return with a local variable
Now let's try to hoist that condition to a local variable:
public static string M5(string? text)
{
bool isNull = text is null;
if (isNull)
{
return "";
}
else
{
return text; // Warning
}
}
public static string M6(string text)
{
bool isNull = text is null;
if (isNull)
{
return "";
}
else
{
return text; // Warning
}
}
Both M5 and M6 issue warnings. So not only do we not get the positive effect of the state change from "maybe null" to "not null" in M5 (as we did in M3)... we get the opposite effect in M6, where the state goes from "not null" to "maybe null". That really surprised me.
So it looks like we've learned that:
Logic around "how a local variable was computed" isn't used to propagate state information. More on that later.
Introducing a null comparison can warn the compiler that something it previously thought wasn't null might be null after all.
Unconditional return after an ignored comparison
Let's look at the second of those bullet points, by introducing a comparison before an unconditional return. (So we're completely ignoring the result of the comparison.):
public static string M7(string? text)
{
bool ignored = text is null;
return text; // Warning
}
public static string M8(string text)
{
bool ignored = text is null;
return text; // Warning
}
Note how M8 feels like it should be equivalent to M2 - both have a not-null parameter which they return unconditionally - but the introduction of a comparison with null changes the state from "not null" to "maybe null". We can get further evidence of this by trying to dereference text before the condition:
public static string M9(string text)
{
int length1 = text.Length; // No warning
bool ignored = text is null;
int length2 = text.Length; // Warning
return text; // No warning
}
Note how the return statement doesn't have a warning now: the state after executing text.Length is "not null" (because if we execute that expression successfully, it couldn't be null). So the text parameter starts as "not null" due to its type, becomes "maybe null" due to the null comparison, then becomes "not null" again after text.Length.
What comparisons affect state?
So that's a comparison of text is null... what effect do similar comparisons have? Here are four more methods, all starting with a non-nullable string parameter:
public static string M10(string text)
{
bool ignored = text == null;
return text; // Warning
}
public static string M11(string text)
{
bool ignored = text is object;
return text; // No warning
}
public static string M12(string text)
{
bool ignored = text is { };
return text; // No warning
}
public static string M13(string text)
{
bool ignored = text != null;
return text; // Warning
}
So even though x is object is now a recommended alternative to x != null, they don't have the same effect: only a comparison with null (with any of is, == or !=) changes the state from "not null" to "maybe null".
Why does hoisting the condition have an effect?
Going back to our first bullet point earlier, why don't M5 and M6 take account of the condition which led to the local variable? This doesn't surprise me as much as it appears to surprise others. Building that sort of logic into the compiler and specification is a lot of work, and for relatively little benefit. Here's another example with nothing to do with nullability where inlining something has an effect:
public static int X1()
{
if (true)
{
return 1;
}
}
public static int X2()
{
bool alwaysTrue = true;
if (alwaysTrue)
{
return 1;
}
// Error: not all code paths return a value
}
Even though we know that alwaysTrue will always be true, it doesn't satisfy the requirements in the specification that make the code after the if statement unreachable, which is what we need.
Here's another example, around definite assignment:
public static void X3()
{
string x;
bool condition = DateTime.UtcNow.Year == 2020;
if (condition)
{
x = "It's 2020.";
}
if (!condition)
{
x = "It's not 2020.";
}
// Error: x is not definitely assigned
Console.WriteLine(x);
}
Even though we know that the code will enter exactly one of those if statement bodies, there's nothing in the spec to work that out. Static analysis tools may well be able to do so, but trying to put that into the language specification would be a bad idea, IMO - it's fine for static analysis tools to have all kinds of heuristics which can evolve over time, but not so much for a language specification.
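For comparison, here is a minimal sketch of the same logic written so that the definite assignment rules can follow it. The class and method names are made up for illustration, and the year is passed in as a parameter rather than read from DateTime.UtcNow so that the method is easy to call:

```csharp
using System;

public static class AssignmentSketch
{
    // Hypothetical rewrite of X3 using if/else instead of two separate ifs.
    public static string Describe(int year)
    {
        string x;
        if (year == 2020)
        {
            x = "It's 2020.";
        }
        else
        {
            x = "It's not 2020.";
        }
        // Every path through the if/else assigns x, so it is definitely assigned here.
        return x;
    }
}
```

The compiler does not need to know anything about the condition's value here: it only needs to see that both branches of a single if/else assign x.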
The nullable flow analysis tracks the null state of variables, but it does not track other state, such as the value of a bool variable (as isNull above), and it does not track the relationship between the state of separate variables (e.g. isNull and _test).
An actual static analysis engine would probably do those things, but would also be "heuristic" or "arbitrary" to some degree: you couldn't necessarily tell the rules it was following, and those rules might even change over time.
That's not something we can do directly in the C# compiler. The rules for nullable warnings are quite sophisticated (as Jon's analysis shows!), but they are rules, and can be reasoned about.
As we roll out the feature it feels like we mostly struck the right balance, but there are a few places that do come up as awkward, and we'll be revisiting those for C# 9.0.
You have discovered evidence that the program-flow algorithm that produces this warning is relatively unsophisticated when it comes to tracking the meanings encoded in local variables.
I have no specific knowledge of the flow checker's implementation, but having worked on implementations of similar code in the past, I can make some educated guesses. The flow checker is likely deducing two things in the false positive case: (1) _test could be null, because if it could not, you would not have the comparison in the first place, and (2) isNull could be true or false -- because if it could not, you would not have it in an if. But the connection that the return _test; only runs if _test is not null, that connection is not being made.
This is a surprisingly tricky problem, and you should expect that it will take a while for the compiler to attain the sophistication of tools that have had multiple years of work by experts. The Coverity flow checker, for example, would have no problem at all in deducing that neither of your two variations had a null return, but the Coverity flow checker costs serious money for corporate customers.
Also, the Coverity checkers are designed to run on large codebases overnight; the C# compiler's analysis must run between keystrokes in the editor, which significantly changes the sorts of in-depth analyses you can reasonably perform.
All the other answers are pretty much exactly correct.
In case anyone's curious, I tried to spell out the compiler's logic as explicitly as possible in https://github.com/dotnet/roslyn/issues/36927#issuecomment-508595947
The one piece that's not mentioned is how we decide whether a null check should be considered "pure", in the sense that if you do it, we should seriously consider whether null is a possibility. There are a lot of "incidental" null checks in C#, where you test for null as a part of doing something else, so we decided that we wanted to narrow down the set of checks to ones that we were sure people were doing deliberately. The heuristic we came up with was "contains the word null", so that's why x != null and x is object produce different results.

Local variable might not be initialized before accessing - after calling .TryParse()

I have this piece of code:
var kv = new Dictionary<int, string>() { ... };
var kv30Valid = kv.ContainsKey(30) && int.TryParse(kv[30], out var kv30Value);
myObject.nullableInt = kv30Valid ? (int?)kv30Value : null;
Note: myObject is a POCO class representing a table row, that's why nullable int.
I cannot compile my code because I get compiler error on the last line:
Local variable 'kv30Value' might not be initialized before accessing
In which case can it be uninitialized, and how do I properly handle that case so the code compiles?
I need to populate the myObject properties with values from the kv (if they are present) parsed to their respective values.
Solution:
Moving the condition into TryParse() method solved the problem.
var kv30Valid = int.TryParse(kv.ContainsKey(30) ? kv[30] : null, out var kv30Value);
The definite assignment analyzer has limitations (as it must, yada yada yada halting problem, etc). Although we can look at this code and conclude that it'll only access kv30Value if ContainsKey returned true and thus TryParse was called, it's too "separate" for the analyzer to be able to see this.
If this was inside an if block using kv30Valid it might be able to see it but even then I'm not sure.
In which case can it be unintialized
When kv.ContainsKey(30) returns false, int.TryParse() isn't called, and kv30Value won't be assigned.
To simplify the issue, you have an unassigned variable:
bool test; // declared, but not assigned
if (test)
{
Console.WriteLine("Test is true");
}
This won't compile, because test is not assigned.
Now a method with an out parameter will definitely assign the variable:
bool test;
Assign(out test);
if (test)
{
Console.WriteLine("Test is true");
}
private static void Assign(out bool foo)
{
foo = true;
}
This will print "Test is true".
Now if you make the assignment conditional:
bool test;
bool condition = DateTime.Now > DateTime.Now;
if (condition)
{
Assign(out test);
}
if (test) { ... }
You'll be back at the compiler error:
CS0165: Use of unassigned local variable test
That's because the compiler can't guarantee that test has been assigned, so it forbids further use of that variable.
Even if the usage of the variable uses the same condition:
bool test;
bool condition = DateTime.Now > DateTime.Now;
if (condition)
{
Assign(out test);
}
if (condition && test)
{
Console.WriteLine("Test is true");
}
The compiler still refuses to let you use test.
That's exactly the same as with your && int.TryParse(..., out) code. The right side of the && is conditionally executed, thus the compiler will refuse to let you use the variable that's potentially unassigned.
Regarding the discussion below and the downvote on my answer, if you want to know the why behind all this, see the C# language specification chapter 5.3 Definite assignment. Basically put, you get this error because the compiler does a best effort attempt at statically analyzing whether the variable is assigned.
how to properly handle the case to allow valid code?
Declare it above, and assign it a sensible default value:
int kv30Value = 0;
var kv30Valid = kv.ContainsKey(30) && int.TryParse(kv[30], out kv30Value);
Or simplify the code by moving it into an if, where it'll be definitely assigned:
if (kv.ContainsKey(30) && int.TryParse(kv[30], out var kv30Value))
{
myObject.nullableInt = kv30Value;
}
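Another option, sketched below under the assumption that a small helper is acceptable, is to keep the whole check inside one conditional expression. A variable assigned on the right of && is definitely assigned when the whole && expression is true, and the first branch of ?: is only evaluated in that case. ParseAt is a hypothetical name, and TryGetValue is used here in place of the ContainsKey plus indexer pair from the question:

```csharp
using System.Collections.Generic;

public static class ParseSketch
{
    // Hypothetical helper: returns the parsed value, or null when the key is
    // missing or the text is not a valid integer.
    public static int? ParseAt(Dictionary<int, string> kv, int key)
    {
        // "value" is only read in the branch taken when both calls returned true,
        // which is where the compiler treats it as definitely assigned.
        return kv.TryGetValue(key, out var text) && int.TryParse(text, out var value)
            ? value
            : (int?)null;
    }
}
```

With such a helper the assignment from the question could become myObject.nullableInt = ParseSketch.ParseAt(kv, 30);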

declare variables in argument list

It's possible in c# 7 to declare variables for out variables in argument list:
if (int.TryParse(input, out int result))
WriteLine(result);
Is it possible to declare ("non out") variable in argument list? Like this:
if (!string.IsNullOrEmpty(string result=FuncGetStr()))
WriteLine(result);
You can't do it in the argument list, no.
You could use pattern matching for this, but I wouldn't advise it:
if (FuncGetStr() is string result && !string.IsNullOrEmpty(result))
That keeps the declaration within the source code of the if, but the scope of result is still the enclosing block, so I think it would be much simpler just to separate out:
// Mostly equivalent, and easier to read
string result = FuncGetStr();
if (!string.IsNullOrEmpty(result))
{
...
}
There are two differences I can think of:
result isn't definitely assigned after the if statement in the first version
string.IsNullOrEmpty isn't even called in the first version if FuncGetStr() returns null, as the is pattern won't match. You could therefore write it as:
if (FuncGetStr() is string result && result != "")
To be utterly horrible, you could do it, with a helper method to let you use out parameters. Here's a complete example. Please note that I am not suggesting this as something to do.
// EVIL CODE: DO NOT USE
using System;
public class Test
{
static void Main(string[] args)
{
if (!string.IsNullOrEmpty(Call(FuncGetStr, out string result)))
{
Console.WriteLine(result);
}
}
static string FuncGetStr() => "foo";
static T Call<T>(Func<T> func, out T x) => x = func();
}
You can assign variables in statements, but the declaration of the variables should be done outside of them. You can't combine them (outside out and pattern matching, as you already indicated in your question).
bool b;
string a;
if (b = string.IsNullOrEmpty(a = "a")){ }
On the why this behavior is different than with out, etc, Damien_The_Unbeliever's comment might be interesting:
The ability to declare out variables inline arises from the awkwardness that it a) has to be a variable rather than a value and b) there's often nothing too useful to do with the value if you declare it before the function is called. I don't see the same motivations for other such uses.

Why does pattern matching on a nullable result in syntax errors?

I like to use pattern-matching on a nullable int i.e. int?:
int t = 42;
object tobj = t;
if (tobj is int? i)
{
System.Console.WriteLine($"It is a nullable int of value {i}");
}
However, this results in the following syntax errors:
CS1003: Syntax error, ';',
CS1525: Invalid expression term ')',
CS0103: The name 'i' does not exist in the current context.
'i)' is marked with a red squiggly line.
The expression compiles when using the old operator is:
int t = 42;
object tobj = t;
if (tobj is int?)
{
System.Console.WriteLine($"It is a nullable int");
}
string t = "forty two";
object tobj = t;
if (tobj is string s)
{
System.Console.WriteLine($@"It is a string of value ""{s}"".");
}
Also works as expected.
(I'm using c#-7.2 and tested with both .net-4.7.1 and .net-4.6.1)
I thought it had something to with operator precedence. Therefore, I have tried using parenthesis at several places but this didn't help.
Why does it give these syntax errors and how can I avoid them?
The type pattern in its various forms: x is T y, case T y etc, always fails to match when x is null. This is because null doesn't have a type, so asking "is this null of this type?" is a meaningless question.
Therefore t is int? i or t is Nullable<int> i makes no sense as a pattern: either t is an int, in which case t is int i will match anyway, or it's null, in which case no type pattern can result in a match.
And that is the reason why t is int? i or t is Nullable<int> i are not, and probably never will be, supported by the compiler.
The reason why you get additional errors from the compiler when using t is int? i is due to the fact that, e.g. t is int? "it's an int" : "no int here" is valid syntax, thus the compiler gets confused over your attempts to use ? for a nullable type in this context.
As to how can you avoid them, the obvious (though probably not very helpful) answer is: don't use nullable types as the type in type patterns. A more useful answer would require you to explain why you are trying to do this.
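The point that a boxed int? is always either a boxed int or null can be illustrated with a small sketch. The class and method names are made up for this example:

```csharp
using System;

public static class BoxingSketch
{
    // Hypothetical demo: shows what a nullable int looks like once boxed.
    public static string Classify(int? maybe)
    {
        // Boxing an int? produces a boxed int when it has a value, and null otherwise.
        object boxed = maybe;
        if (boxed is int i)
        {
            return "int " + i;
        }
        return boxed == null ? "null" : "something else";
    }
}
```

Calling Classify(42) takes the is int i branch, and Classify(null) falls through to the null check, so a pattern on int plus a separate null test covers both cases of an int?.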
Change your code into:
int t = 42;
object tobj = t;
if (tobj is Nullable<int> i)
{
Console.WriteLine($"It is a nullable int of value {i}");
}
This produces the more helpful:
CS8116: It is not legal to use nullable type 'int?' in a pattern; use the underlying type 'int' instead (Could not find documentation about CS8116 to reference)
Others (user @Blue0500 on GitHub) have tagged this behaviour as a bug in Roslyn issue #20156. Reacting to that issue, Julien Couvreur from Microsoft has said he thinks it is by design.
Neal Gafter from Microsoft, who works on Roslyn, has also said better diagnostics are wanted for the use of a nullable type in a switch pattern.
So, the error message can be avoided by using:
int t = 42;
object tobj = t;
if (tobj == null)
{
Console.WriteLine($"It is null");
}
else if (tobj is int i)
{
Console.WriteLine($"It is an int of value {i}");
}
Except for issues when parsing tobj is int? i, this still leaves the question why is tobj is int? i or tobj is Nullable<int> i not allowed.
For anyone wondering how to actually use pattern matching with nullables, you can do so with a generic helper function, like so:
public static bool TryConvert<T>(object input, out T output)
{
if (input is T result)
{
output = result;
return true;
}
output = default(T);
// Check if input is null and T is a nullable type.
return input == null && System.Nullable.GetUnderlyingType(typeof(T)) != null;
}
This will return true if T is a nullable or non-nullable of the same type that input contains or if input is null and T is nullable. Basically works the same as normal, but also handles nullables.
Side note: Interestingly, from my testing, I found System.Nullable.GetUnderlyingType(typeof(T)) allocates 40 bytes of garbage every time it's called if T is nullable. Not sure why, seems like a bug to me, but that's potentially a hefty price to pay rather than just null-checking like normal.
Knowing that, here's a better function:
public static bool TryConvert<T>(object input, out T? output) where T : struct
{
if (input is T result)
{
output = result;
return true;
}
output = default(T?);
return input == null;
}

C# - checking if a variable is initialized

I want to check if a variable is initialized at run time, programmatically. To make the reasons for this less mysterious, please see the following incomplete code:
string s;
if (someCondition) s = someValue;
if (someOtherCondition) s = someOtherValue;
bool sIsUninitialized = /* assign value correctly */;
if (!sIsUninitialized) Console.WriteLine(s); else throw new Exception("Please initialize s.");
And complete the relevant bit.
One hacky solution is to initialize s with a default value:
string s = "zanzibar";
And then check if it changed:
bool sIsUninitialized = s == "zanzibar";
However, what if someValue or someOtherValue happen to be "zanzibar" as well? Then I have a bug. Any better way?
Code won't even compile if the compiler knows a variable hasn't been initialized.
string s;
if (condition) s = "test";
// compiler error here: use of unassigned local variable 's'
if (s == null) Console.WriteLine("uninitialized");
In other cases you could use the default keyword if a variable may not have been initialized. For example, in the following case:
class X
{
private string s;
public void Y()
{
Console.WriteLine(s == default(string)); // this evaluates to true
}
}
The documentation states that default(T) will give null for reference types, and 0 for value types. So as pointed out in the comments, this is really just the same as checking for null.
This all obscures the fact that you should really initialize variables, to null or whatever, when they are first declared.
Since C# 2.0, nullable value types (int? and so on) let you give a value type an initial value of null, allowing for such things as:
int? x = null;
if (x.HasValue)
{
Console.WriteLine("Value for x: " + x.Value);
}
Here nothing is printed, because x.HasValue is false while x is null.
Just assign it null by default, not a string value
Here's one way:
string s;
if (someCondition) { s = someValue; }
else if (someOtherCondition) { s = someOtherValue; }
else { throw new Exception("Please initialize s."); }
Console.WriteLine(s);
This might be preferable to checking if the string is null, because maybe someValue is a method that can sometimes return null. In other words, maybe null is a legitimate value to initialize the string to.
Personally I like this better than an isInitialized flag. Why introduce an extra flag variable unless you have to? I don't think it is more readable.
You can keep a separate flag that indicates that the string has been initialized:
string s = null;
bool init = false;
if (conditionOne) {
s = someValueOne;
init = true;
}
if (conditionTwo) {
s = someValueTwo;
init = true;
}
if (!init) {
...
}
This will take care of situations when s is assigned, including the cases when it is assigned null, empty string, or "zanzibar".
Another solution is to make a static string to denote "uninitialized" value, and use Object.ReferenceEquals instead of == to check if it has changed. However, the bool variable approach expresses your intent a lot more explicitly.
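A minimal sketch of the sentinel idea follows. The names Unset and Pick are hypothetical, and the sketch assumes the sentinel is created with new string(...) rather than a literal, because equal string literals are interned and would share one reference:

```csharp
using System;

public static class SentinelSketch
{
    // A distinct string instance used only as a marker. It is built with
    // new string(...) so that it is not the same object as the literal "?".
    private static readonly string Unset = new string('?', 1);

    public static string Pick(bool first, bool second)
    {
        string s = Unset;
        if (first) s = "?";          // same contents as the sentinel, different object
        if (second) s = "second";
        // Reference comparison: true only if s was never reassigned.
        if (object.ReferenceEquals(s, Unset))
        {
            return "uninitialized";
        }
        return s;
    }
}
```

Because the check compares references rather than contents, assigning a value that happens to equal the sentinel's text is still detected as an assignment, which is the problem the "zanzibar" default could not solve.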
I would agree with Vytalyi that a default value of null should be used when possible, however, not all types (like int) are nullable. You could allocate the variable as a nullable type as explained by David W, but this could break a lot of code in a large codebase due to having to refine the nullable type to its primitive type before access.
This generic method extension should help for those who deal with large codebases where major design decisions were already made by a predecessor:
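// Uses EqualityComparer so value types are compared by value; casting both
// sides to object would box them and compare references instead.
public static bool IsDefault<T>(this T value)
=> System.Collections.Generic.EqualityComparer<T>.Default.Equals(value, default(T));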
If you are starting from scratch, just take advantage of nullable types and initialize it as null; that C# feature was implemented for a reason.
I pick initialization values that can never be used; typical values include String.Empty, null, -1, and the output of a 256-character random string generator.
In general, assign the default to be null or String.Empty. For situations where you cannot use those "empty" values, define a constant to represent your application-specific uninitialized value:
const string UninitializedString = "zanzibar";
Then reference that value whenever you want to initialize or test for initialization:
string foo = UninitializedString;
if (foo == UninitializedString) {
// Do something
}
Remember that == on strings compares their contents rather than their references, which is why the comparison works regardless of how many instances of that text exist.
