Why is the code below
private static List<WorkflowVariableDataSet> MergeDatasetsListBranch(out List<WorkflowVariableDataSet> datasetsList)
{
    if (datasetsList == null)
        datasetsList = new List<WorkflowVariableDataSet>();
    datasetsList = new List<WorkflowVariableDataSet>();
    return datasetsList;
}
generating an error at the first if statement:
Out parameter 'datasetsList' might not be initialized before accessing.
I know it should be uninitialized at this point, but the word "might" suggests that the error is about possibly accessing an uninitialized object (when the object isn't even accessed; it is only the reference that is checked). Of course this doesn't happen with the ref keyword, but I'm curious how checking the reference violates the out-parameter policy.
EDIT
I've edited the question and the example: the out object will be initialized inside the method anyway. The question is: WHY can't an uninitialized object be compared to null? How is that different from:
object o;
if(o==null)
...
Compiler Error CS0269
Use of unassigned out parameter 'parameter' The compiler could not
verify that the out parameter was assigned a value before it was used;
its value may be undefined when assigned. Be sure to assign a value to
out parameters in the called method before accessing the value. If you
need to use the value of the variable passed in, use a ref parameter
instead.
So treat an out-parameter as unassigned. You are the one who is responsible.
So just remove the if:
datasetsList = new List<WorkflowVariableDataSet>();
If you want to process a list that is passed to this method, use ref instead (as suggested above).
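A minimal sketch of what the ref variant could look like. The empty WorkflowVariableDataSet class and the RefSketch wrapper are hypothetical stand-ins, since the question does not show the real definitions:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's type; its real definition is not shown.
class WorkflowVariableDataSet { }

static class RefSketch
{
    // With ref, the caller must assign the argument before the call,
    // so reading it inside the method is allowed.
    public static List<WorkflowVariableDataSet> MergeDatasetsListBranch(
        ref List<WorkflowVariableDataSet> datasetsList)
    {
        if (datasetsList == null)
            datasetsList = new List<WorkflowVariableDataSet>();
        return datasetsList;
    }

    static void Main()
    {
        // The caller has to assign the variable, even if only to null.
        List<WorkflowVariableDataSet> list = null;
        var result = MergeDatasetsListBranch(ref list);
        Console.WriteLine(ReferenceEquals(result, list)); // True
        Console.WriteLine(list.Count);                    // 0
    }
}
```

The caller now carries the burden of assigning the variable first, which is exactly what the out version does not require.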
Because whether you have buggy code that never initialises the parameter or buggy code that only sometimes fails to initialise it, it's still the same bug.
There's no point in having a separate error message for the same bug depending on whether it hits on all code paths or just one; if there is a single code path where the parameter is used before being initialised, then it has that error, and if there is no such code path, then it doesn't.
So if we consider:
private static List<WorkflowVariableDataSet> MergeDatasetsListBranch(out List<WorkflowVariableDataSet> datasetsList)
{
    if (_someBooleanField)
        datasetsList = null;
    if (datasetsList == null)
        datasetsList = new List<WorkflowVariableDataSet>();
    return datasetsList;
}
Here the use of an uninitialised out parameter might or might not happen, but that suffices to mean it has the same error.
As far as the error goes, there really isn't any significant difference between these two cases.
And therefore the error message uses might, even in cases where it will always apply.
Out arguments need not be initialized prior to being passed; the called method is required to assign a value before it returns.
Modified your code
private static List<WorkflowVariableDataSet> MergeDatasetsListBranch(out List<WorkflowVariableDataSet> datasetsList)
{
    return datasetsList = new List<WorkflowVariableDataSet>();
}
Related
Below are the MSDN reference links for the Out keyword and the Ref keyword.
Out Keyword
Ref Keyword
Both the ref and out keywords pass by reference, so why must the argument be initialized for one but not for the other? Is it a design convention, or is there some other reason behind it? Any help is appreciated.
A reference is basically the Address part of:
[Address] => points to => [Object]
The ref keyword passes the address of an existing object. The method can use the object at that address or instantiate an entirely new one (but at the same address). The address does need to be initialized, even if the value it holds is null. (Below is a little test driver that shows this.)
The out keyword says that the method must instantiate (or set to null) an object (not an address) that it returns.
[TestMethod]
public void TestMethod2()
{
    MyClass myClass;
    myClass = new MyClass(1);
    // Initializing the ADDRESS as an existing object.
    ByRefDemo(ref myClass);
    Console.WriteLine($"Returned value is: {myClass}");
    // Initializing the ADDRESS as a null object.
    myClass = null;
    ByRefDemo(ref myClass);
    Console.WriteLine($"Returned value is: {myClass}");
}
class MyClass
{
    public MyClass(int value)
    {
        Value = value;
    }
    public int Value { get; }
    public override string ToString() => $"{Value}";
}
void ByRefDemo(ref MyClass addressOf)
{
    var value = addressOf == null ? "NULL" : $"{addressOf}";
    Console.WriteLine($"Incoming value is: {value}");
    addressOf = new MyClass(2);
}
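For comparison, here is a rough out counterpart to the ref driver above. TryParsePositive and OutSketch are hypothetical names made up for this sketch; the point is only that the caller's variable needs no initial value and the method must assign it on every path:

```csharp
using System;

static class OutSketch
{
    // The method may not read 'result' before assigning it,
    // and must assign it on every path that returns normally.
    public static bool TryParsePositive(string text, out int result)
    {
        if (int.TryParse(text, out int parsed) && parsed > 0)
        {
            result = parsed;
            return true;
        }
        result = 0;
        return false;
    }

    static void Main()
    {
        int value; // no initializer needed for an out argument
        bool ok = TryParsePositive("42", out value);
        Console.WriteLine($"{ok} {value}"); // True 42
        ok = TryParsePositive("-7", out value);
        Console.WriteLine($"{ok} {value}"); // False 0
    }
}
```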
You may be looking at this from the wrong perspective.
ref and out are both modifiers that are used in a function signature declaration, but they aren't really similar otherwise. They shouldn't both do the same thing or have the same requirements.
ref tells the compiler that you want a reference to something that would ordinarily be passed by value. You are passing a live object. Therefore, you have to already have a live object that you want to pass.
out tells the compiler that you want an output variable. It acts precisely as another function return value. Just like the function return value, anything assigned to it before the function is called will be overwritten by the value set in the function. So your output variable may be declared and even initialized or set but its value will be overwritten anyway.
You may use ref to do something similar to out, which may be dangerous and may violate best practices or other standards that you're using (it does commonly). Using out instead of ref won't work though.
There will be occasions when you're programming, that you aren't programming for yourself; you're programming for the benefit of someone else. Imagine you're writing some awesome parsing library or something..
There aren't any surprises for someone calling your method if they're passing you some reference type by value semantics; C# makes sure your method gets a copy of their reference. You can modify the contents of the instance at end of the reference, sure, but you can't surprise the caller by swapping out the reference they gave you for something else
It's different with things passed by original reference; they could give you some carefully crafted object that took hours to make and you could trash it by setting the reference to null. It would be nice for them to know this ahead of time, so they could keep their own copy of the reference
As such, with your arguments that are passed by original reference rather than by copy, you have 3 choices with which to decorate them to help indicate your intentions towards their data:
out - "don't bother spending hours crafting the perfect object; I will overwrite the reference you give"
ref - "give me some data, but don't be surprised if I replace your object with another. Keep your own reference if you're precious about losing it"
in - "the pass will be done by original reference but I promise I won't swap it out for something else"
The compiler helps you make the first and last promise by insisting that you do set an out/don't set an in; and this is essentially the answer to your question: in/out/ref behave the way they do by design to help you make the promises you make when you use one of them on an argument
out and ref perhaps don't seem to have much of a point if you're looking at things from "I'm going to write this method here and use it there", but they do help describe to someone else (who cannot see the inner workings of your method) what you will do with the thing they provide, and that's quite important but easy to overlook if you don't have that "external caller" perspective in mind
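The three promises can be sketched side by side. The method names (Reset, Replace, Measure) and the ModifierSketch class are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Text;

static class ModifierSketch
{
    // out: whatever the caller had is overwritten; the method must assign it.
    public static void Reset(out StringBuilder sb)
    {
        sb = new StringBuilder("fresh");
    }

    // ref: the method may read the caller's object and may also swap it out.
    public static void Replace(ref StringBuilder sb)
    {
        if (sb.Length == 0)
            sb = new StringBuilder("replaced");
    }

    // in: passed by reference, but the method cannot reassign it
    // (writing "sb = null;" here would not compile).
    public static int Measure(in StringBuilder sb)
    {
        return sb.Length;
    }

    static void Main()
    {
        StringBuilder a;                 // unassigned is fine for out
        Reset(out a);
        Console.WriteLine(a);            // fresh

        StringBuilder b = new StringBuilder();
        StringBuilder keep = b;          // the caller keeps its own reference
        Replace(ref b);
        Console.WriteLine(ReferenceEquals(keep, b)); // False

        Console.WriteLine(Measure(in b)); // 8
    }
}
```

Note that in only protects the reference: the method can still mutate the contents of the object it points to.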
If you look at the next example:
public void TestLocalValuesAssignment()
{
    int valueVariable; // = default(int) suits fine
    string refType; // null suits fine as well
    try
    {
        valueVariable = 5;
        refType = "test";
    }
    catch (Exception){}
    Console.WriteLine("int value is {0}", valueVariable);
    Console.WriteLine("String is {0}", refType);
}
you can easily see that the variables valueVariable and refType could be unassigned when they are used in Console.WriteLine(). The compiler reports this with errors:
Error 1 Use of unassigned local variable 'valueVariable'
Error 2 Use of unassigned local variable 'refType'
This is a common case and there are loads of answers on how to fix it (possible fixes are shown in the comments).
What I can't understand is why this behavior exists. How are local variables different from class fields, which get a default value if not assigned (null for reference types and the corresponding default value for value types)? Maybe there's an example or a corner case that explains why this compiler behavior was chosen?
Basically, this is what Microsoft decided.
If you want more you can read here and check Eric Lippert’s Blog
The reason this is illegal in C# is because using an unassigned local has high likelihood of being a bug.
It's described in the C# spec:
5.1.7 Local variables
A local variable introduced by a local-variable-declaration is not
automatically initialized and thus has no default value. For the
purpose of definite assignment checking, a local variable introduced
by a local-variable-declaration is considered initially unassigned. A
local-variable-declaration may include a local-variable-initializer,
in which case the variable is considered definitely assigned only
after the initializing expression (§5.3.3.4).
Within the scope of a local variable introduced by a
local-variable-declaration, it is a compile-time error to refer to
that local variable in a textual position that precedes its
local-variable-declarator. If the local variable declaration is
implicit (§8.5.1), it is also an error to refer to the variable within
its local-variable-declarator.
When you do something that appears stupid, like reading from a variable you've never assigned, there are basically two things the compiler can do:
1. Give you a diagnostic calling your attention to what likely is a mistake.
2. Do something arbitrary.
Since option #1 helps you find mistakes, it is preferred, especially when the workaround to tell the compiler "No, I mean to use the original default value" is as simple as adding = 0, = null or = default(T).
As for why class members don't work the same way, it's because this can't be checked at compile time (because of the myriad different orders that the different methods could be called). There would be runtime cost of flags whether each member had been assigned, and testing of those flags.
Note that the compiler does enforce the restriction on struct members in a way that's easy to check at compile-time. Namely, each constructor is required to assign every member.
In reality, your code should be fine, but by strict interpretation, there is a code path which can leave your variables unassigned before use. The try block introduces the potential for code within the block to not be executed (if an exception is thrown), but still execute the code beyond the catch (because there is nothing in the catch such as return or throw to prevent the rest of your method from executing if an exception is thrown in the try).
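A small sketch of the usual fix: give both locals an explicit initial value so that they are definitely assigned on the path through the catch block as well. The Describe method, its input parameter and the use of int.Parse are assumptions added so the exception path can actually be exercised:

```csharp
using System;

static class DefiniteAssignmentSketch
{
    public static string Describe(string input)
    {
        // Explicit initial values: definitely assigned on every path.
        int valueVariable = default(int);
        string refType = null;
        try
        {
            valueVariable = int.Parse(input); // throws FormatException for "x"
            refType = "parsed";
        }
        catch (FormatException)
        {
            // Both variables keep their explicit initial values.
        }
        string shown = refType ?? "(null)";
        return $"int value is {valueVariable}, string is {shown}";
    }

    static void Main()
    {
        Console.WriteLine(Describe("5")); // int value is 5, string is parsed
        Console.WriteLine(Describe("x")); // int value is 0, string is (null)
    }
}
```

Removing either initializer brings back the "Use of unassigned local variable" error, because the compiler assumes any statement inside the try block may throw.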
If you are referring to the difference between initializing "struct" fields and initializing class fields, e.g.:
public class A
{
}
void MyMethod()
{
    int myInt; // Initialized to zero, yes, but not yet assigned.
               // An error to use this before assigning it.
    A myA; // defaults to null, which may be a valid initial state, but still unassigned.
           // Also an error to use this before assigning it.
    A oneMoreA = null; // Same value as default, but at least intention is clear.
    A anotherA = new A(); // What is or is not happening in the constructor is a separate issue.
                          // At least anotherA refers to an actual instance of the class.
}
I've always thought that it's impossible for this to be null inside instance method body. Following simple program demonstrates that it is possible. Is this some documented behaviour?
class Foo
{
    public void Bar()
    {
        Debug.Assert(this == null);
    }
}
public static void Test()
{
    var action = (Action)Delegate.CreateDelegate(typeof (Action), null, typeof(Foo).GetMethod("Bar"));
    action();
}
UPDATE
I agree with the answers saying that it's how this method is documented. However, I don't really understand this behaviour. Especially because it's not how C# is designed.
We had gotten a report from somebody (likely one of the .NET groups
using C# (thought it wasn't yet named C# at that time)) who had
written code that called a method on a null pointer, but they didn’t
get an exception because the method didn’t access any fields (ie
“this” was null, but nothing in the method used it). That method then
called another method which did use the this point and threw an
exception, and a bit of head-scratching ensued. After they figured it
out, they sent us a note about it.
We thought that being able to call a method on a null instance was a
bit weird. Peter Golde did some testing to see what the perf impact
was of always using callvirt, and it was small enough that we decided
to make the change.
http://blogs.msdn.com/b/ericgu/archive/2008/07/02/why-does-c-always-use-callvirt.aspx
Because you're passing null into the firstArgument of Delegate.CreateDelegate
So you're calling an instance method on a null object.
http://msdn.microsoft.com/en-us/library/74x8f551.aspx
If firstArgument is a null reference and method is an instance method,
the result depends on the signatures of the delegate type type and of
method:
If the signature of type explicitly includes the hidden first
parameter of method, the delegate is said to represent an open
instance method. When the delegate is invoked, the first argument in
the argument list is passed to the hidden instance parameter of
method.
If the signatures of method and type match (that is, all parameter
types are compatible), then the delegate is said to be closed over a
null reference. Invoking the delegate is like calling an instance
method on a null instance, which is not a particularly useful thing to
do.
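The first case in the quoted documentation, an open instance delegate, can be sketched as follows. The choice of string.ToUpper and the names OpenDelegateSketch and MakeOpenToUpper are assumptions made for illustration; only Delegate.CreateDelegate and the reflection calls come from the framework:

```csharp
using System;
using System.Reflection;

static class OpenDelegateSketch
{
    // Func<string, string> explicitly includes the hidden "this" parameter of
    // string.ToUpper(), so passing null as firstArgument yields an OPEN
    // instance delegate rather than one closed over a null reference.
    public static Func<string, string> MakeOpenToUpper()
    {
        MethodInfo toUpper = typeof(string).GetMethod("ToUpper", Type.EmptyTypes);
        return (Func<string, string>)Delegate.CreateDelegate(
            typeof(Func<string, string>), null, toUpper);
    }

    static void Main()
    {
        Func<string, string> upper = MakeOpenToUpper();
        // The instance is supplied at invocation time as the first argument.
        Console.WriteLine(upper("abc")); // ABC
    }
}
```

The question's code falls into the second case instead: Action has no parameter to receive the instance, so the delegate is closed over null.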
Sure, you can call into a method on a null instance if you use the call IL instruction or the delegate approach. The booby trap only goes off when you try to access instance fields, which gives you the NullReferenceException you were looking for.
Try:
class Foo
{
    int x;

    public void Bar()
    {
        x = 1; // NullReferenceException: writing a field dereferences the null this
        Debug.Assert(this == null);
    }
}
The BCL even contains explicit this == null checks to aid debugging for languages that, unlike C#, do not always use callvirt. See this question for further information.
The String class for example has such checks. There is nothing mysterious about them except that you will not see the need for them in languages like C#.
// Determines whether two strings match.
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public override bool Equals(Object obj)
{
    //this is necessary to guard against reverse-pinvokes and
    //other callers who do not use the callvirt instruction
    if (this == null)
        throw new NullReferenceException();
    String str = obj as String;
    if (str == null)
        return false;
    if (Object.ReferenceEquals(this, obj))
        return true;
    return EqualsHelper(this, str);
}
Try the documentation for Delegate.CreateDelegate() at msdn.
You're "manually" calling everything, and thus instead of passing an instance in for the this pointer, you're passing null. So it can happen, but you have to try really really hard.
this is a reference, so there is no problem with its being null from the perspective of the type system.
You may ask why NullReferenceException was not thrown. The full list of circumstances when CLR throws that exception is documented. Your case is not listed. Yes, it is a callvirt, but to Delegate.Invoke (see here) rather than to Bar, and so the this reference is actually your non-null delegate!
The behavior you see has an interesting implementational consequence for CLR. A delegate has a Target property (corresponds to your this reference) that is quite frequently null, namely when the delegate is static (imagine Bar be static). Now there is, naturally, a private backing field for the property, called _target. Does _target contain a null for a static delegate? No it doesn't. It contains a reference to the delegate itself. Why not null? Because a null is a legitimate target of a delegate as your example shows and CLR does not have two flavors of a null pointer to distinguish the static delegate somehow.
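The public side of this can be observed with a short sketch; the private _target field itself is an implementation detail and is not inspected here. TargetSketch, Holder and the two methods are hypothetical names:

```csharp
using System;

static class TargetSketch
{
    public static void StaticMethod() { }

    public class Holder
    {
        public void InstanceMethod() { }
    }

    static void Main()
    {
        // A delegate over a static method reports a null Target.
        Action staticDelegate = StaticMethod;
        Console.WriteLine(staticDelegate.Target == null); // True

        // A delegate over an instance method reports the bound instance.
        var holder = new Holder();
        Action instanceDelegate = holder.InstanceMethod;
        Console.WriteLine(ReferenceEquals(instanceDelegate.Target, holder)); // True
    }
}
```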
This bit of trivium demonstrates that with delegates, null targets of instance methods are no afterthought. You may still be asking the ultimate question: but why they had to be supported?
The early CLR had an ambitious plan of becoming, among other things, the platform of choice even for sworn C++ developers, a goal that was approached first with Managed C++ and then with C++/CLI. Some overly challenging language features were omitted, but there was nothing really challenging about supporting instance methods executing without an instance, which is perfectly normal in C++. That includes delegate support.
The ultimate answer therefore is: because C# and CLR are two different worlds.
More good reading and even better reading to show the design allowing null instances shows its traces even in very natural C# syntactic contexts.
this is a readonly reference in C# classes. Accordingly and as expected this can be used like any other references (in read only mode) ...
this == null // readonly - possible
this = new this() // write - not possible
I'm familiar with the C# specification, section 5.3 which says that a variable has to be assigned before use.
In C and unmanaged C++ this makes sense as the stack isn't cleared and the memory location used for a pointer could be anywhere (leading to a hard-to-track-down bug).
But I am under the impression that there are not truly "unassigned" values allowed by the runtime. In particular that a reference type that is not initialized will always have a null value, never the value left over from a previous invocation of the method or random value.
Is this correct, or have I been mistakenly assuming that a check for null is sufficient all these years? Can you have truly uninitialized variables in C#, or does the CLR take care of this and there's always some value set?
I am under the impression that there are not truly "unassigned" values allowed by the runtime. In particular that a reference type that is not initialized will always have a null value, never the value left over from a previous invocation of the method or random value. Is this correct?
I note that no one has actually answered your question yet.
The answer to the question you actually asked is "sorta".
As others have noted, some variables (array elements, fields, and so on) are classified as being automatically "initially assigned" to their default value (which is null for reference types, zero for numeric types, false for bools, and the natural recursion for user-defined structs).
Some variables are not classified as initially assigned; local variables in particular are not initially assigned. They must be classified by the compiler as "definitely assigned" at all points where their values are used.
Your question then is actually "is a local variable that is classified as not definitely assigned actually initially assigned the same way that a field would be?" And the answer to that question is yes, in practice, the runtime initially assigns all locals.
This has several nice properties. First, you can observe them in the debugger to be in their default state before their first assignment. Second, there is no chance that the garbage collector will be tricked into dereferencing a bad pointer just because there was garbage left on the stack that is now being treated as a managed reference. And so on.
The runtime is permitted to leave the initial state of locals as whatever garbage happened to be there if it can do so safely. But as an implementation detail, it does not ever choose to do so. It zeros out the memory for a local variable aggressively.
The reason then for the rule that locals must be definitely assigned before they are used is not to prevent you from observing the garbage uninitialized state of the local. That is already unobservable because the CLR aggressively clears locals to their default values, the same as it does for fields and array elements. The reason this is illegal in C# is because using an unassigned local has high likelihood of being a bug. We simply make it illegal, and then the compiler prevents you from ever having such a bug.
As far as I'm aware, every type has a designated default value.
As per this document, fields of classes are assigned the default value.
http://msdn.microsoft.com/en-us/library/aa645756(v=vs.71).aspx
This document says that the following always have default values assigned automatically.
Static variables.
Instance variables of class instances.
Instance variables of initially assigned struct variables.
Array elements.
Value parameters.
Reference parameters.
Variables declared in a catch clause or a foreach statement.
http://msdn.microsoft.com/en-us/library/aa691173(v=vs.71).aspx
More information on the actual default values here:
Default values of C# types (C# reference)
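A quick sketch that reads a few of the automatically initialized variable kinds from the list above (a static field, instance fields and an array element). The DefaultsSketch class and its members are hypothetical names:

```csharp
using System;

class DefaultsSketch
{
    // None of these are assigned explicitly; each starts at its type's default.
    // (The compiler warns that they are never assigned, which is the point.)
    public static int StaticCounter;
    public string InstanceText;
    public bool InstanceFlag;

    static void Main()
    {
        var d = new DefaultsSketch();
        var numbers = new double[2];

        Console.WriteLine(StaticCounter);          // 0
        Console.WriteLine(d.InstanceText == null); // True
        Console.WriteLine(d.InstanceFlag);         // False
        Console.WriteLine(numbers[1]);             // 0
    }
}
```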
It depends on where the variable is declared. Variables declared within a class are automatically initialized using the default value.
object o;
void Method()
{
    if (o == null)
    {
        // This will execute
    }
}
Variables declared within a method are not initialized, but when the variable is first used the compiler checks to make sure that it was initialized, so the code will not compile.
void Method()
{
    object o;
    if (o == null) // Compile error on this line
    {
    }
}
In particular that a reference type that is not initialized will always have a null value
I think you are mixing up local variables and member variables. Section 5.3 talks specifically about local variables. Unlike member variables that do get defaulted, local variables never default to the null value or anything else: they simply must be assigned before they are first read. Section 5.3 explains the rules that the compiler uses to determine if a local variable has been assigned or not.
There are 3 ways that a variable can be assigned an initial value:
By default -- this happens (for example) if you declare a class variable without assigning an initial value, so the initial value gets default(type) where type is whatever type you declare the variable to be.
With an initializer -- this happens when you declare a variable with an initial value, as in int i = 12;
Any point before its value is retrieved -- this happens (for example) if you have a local variable with no initial value. The compiler ensures that you have no reachable code paths that will read the value of the variable before it is assigned.
At no point will the compiler allow you to read the value of a variable that hasn't been initialized, so you never have to worry about what would happen if you tried.
All primitive data types have default values, so there isn't any need to worry about them.
All reference types are initialized to null values, so if you leave your reference types uninitialized and then call some method or property on that null ref type, you would get a runtime exception which would need to be handled gracefully.
Again, all Nullable types need to be checked for null or default value if they are not initialized as follows:
int? num = null;
if (num.HasValue)
{
    System.Console.WriteLine("num = " + num.Value);
}
else
{
    System.Console.WriteLine("num = Null");
}
// y is set to zero
int y = num.GetValueOrDefault();
// num.Value throws an InvalidOperationException if num.HasValue is false
try
{
    y = num.Value;
}
catch (System.InvalidOperationException e)
{
    System.Console.WriteLine(e.Message);
}
But you will not get a compile error for fields that you leave uninitialized; there it's only the run-time you need to worry about. Local variables are different: reading one before it has been assigned is a compile error.
I have the following line of code:
var dmrReceived = new DownloadMessagesReport();
StyleCop and ReSharper are suggesting I remove the redundant initializer. However if I replace it with
DownloadMessagesReport dmrReceived;
surely this will generate an object reference not set to an instance of an object? I am using .NET 3.5. Do you no longer manually have to instantiate objects?
Next line that follows is:
dmrReceived = dc.DownloadNewMessages(param, param2, param3);
It's worth noting that dc is a class generated from a WCF service. So DownloadNewMessages is a WCF web service method.
If it's a field, it will be automatically initialised to its default value - null for a reference type. Given the var however, I'm guessing it's not, and that you're actually instantiating it further down in your code anyway, thereby discarding the value you have instantiated here. You don't need to initialise a variable where it's declared. If you want to use var you do, but then I'd recommend you declare it where you actually first use it.
So your code is
var dmrReceived = new DownloadMessagesReport();
dmrReceived = dc.DownloadNewMessages(param, param2, param3);
The second line does not fill the object you created in the first line but it completely replaces that object. So the first assignment is not needed (as the first object is never used), which is what R# is warning about.
That will only generate an object reference error if dmrReceived is accessed before it is assigned. Often, the reason ReSharper says that an initializer is redundant is that the variable will always be assigned another value on every possible execution path.
i.e.
DownloadMessagesReport dmrReceived;
...
if (condition) {
    dmrReceived = new DownloadMessagesReport();
} else {
    throw new Exception("oh no");
}
return dmrReceived.SomeProperty;
Accessing SomeProperty is the first place in the code where dmrReceived actually needs to have a value. As follows from the rest of the code, there's no way to get to that line of code without assigning it a value, therefore, the initial value that might have been assigned, would not be used in any execution path, and would thus be redundant.
"Do you no longer manually have to
instantiate objects?"
Of course you need to "manually" instantiate objects, how would the compiler know when or where to instantiate it otherwise?
A simple scenario is this:
MyType x;
if (EverythingWorkedOut)
    x = new MyType(args);
else
    x = null;
If the compiler instantiated it the first time, it would be redundant and more overhead in all code.
Don't trust ReSharper or any other Computer-Intelligent-Stuff over your own Instincts! They're not always right you know.
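Just a side note: you only get to skip x = null; when x is a field, since null is the default value of an uninstantiated field. For a local variable the else branch is still needed, because the compiler requires x to be definitely assigned before it is read.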
Supposing this is your code:
var dmrReceived = new DownloadMessagesReport();
dmrReceived = dc.DownloadNewMessages(param, param2, param3);
You are creating an instance of DownloadMessagesReport in the first line. Then you throw this object away by assigning the dmrReceived variable another value, the one returned from the DownloadNewMessages method. The first new DownloadMessagesReport() object is redundant. You are effectively creating garbage that the garbage collector will have to clean up at some point.
That's why ReSharper and StyleCop are showing you the warning.
If you can initialize a variable with its actual value on the same line where the variable is declared, then do it.
Surely this is enough?
DownloadMessagesReport dmrReceived = dc.DownloadNewMessages(param, param2, param3);