I understand that "out" are just like "ref" types, except that out variables do not have to be initialised. Are there any other uses of "out" parameters? Sometimes I see their use in callback methods but I never understood how they actually work or why we need them instead of global level ref variables?
out parameters enforce the contract between the caller and the callee (the function being called) by explicitly specifying that the callee will initialize them. On the other hand when using ref parameters all we know is that the callee could modify them but it is the caller's responsibility to initialize them.
One of the biggest examples is the TryParse methods: you want to be able to check whether something can be converted, and if it can, you usually want the converted value as well. Otherwise it's just another way to pass objects back to the calling method.
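As a rough illustration of the Try pattern described above, here is a minimal sketch. PortParser and TryParsePort are hypothetical names made up for this example; only int.TryParse is a real framework method.

```csharp
using System;

public static class PortParser
{
    // Hypothetical helper following the Try pattern: the bool reports success,
    // and the out parameter carries the converted value back to the caller.
    public static bool TryParsePort(string text, out int port)
    {
        // int.TryParse assigns 'port' on every path (0 on failure), so the rule
        // that an out parameter must be assigned before returning is satisfied.
        if (int.TryParse(text, out port) && port >= 1 && port <= 65535)
        {
            return true;
        }

        port = 0; // reset so an out-of-range number does not leak to the caller
        return false;
    }
}
```

The caller does not need to initialise anything first: `int port; if (PortParser.TryParsePort(input, out port)) { ... }`.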
Why would you want to have to initialize something in the calling method, with no guarantee that the called method itself would overwrite the variable if the method completes normally? Those are the benefits that out parameters give you.
Basically I think of out parameters as "oops, I need to return more than one value" indicators. I'd prefer to use tuples myself, but of course they only made it into .NET 4... and without explicit language support they're slightly more awkward to use than would be ideal, too.
There are two main differences:
Unlike ref, out doesn't expect the variable to be initialized.
When using out, the called function is responsible for assigning the value, not the caller.
I stumbled upon a method today. I'm talking about: Array.Initialize().
According to the documentation:
This method is designed to help compilers support value-type arrays; most users do not need this method.
How is this method responsible for making the compiler support value types? As far as I'm concerned, this method just:
Initializes every element of the value-type Array by calling the default constructor of the value type.
Also, why is it public? I don't see myself with the need of calling this method, compilers already initialize arrays when created, so manually calling this method will be redundant and useless.
Even if my intention were to reset the values of an array, I would still not call it; I would create a new one: array = new int[array.Length].
So, it seems that this method exists just for the sake of the compiler. Why is this? Can anyone give me some more details?
It's worth noting that the rules of .NET are different to the rules of C#.
There are things we can do in .NET that we can't do in C#, generally either because the code is not verifiable (ref return types for example) or because they could introduce some confusion.
In C# structs cannot have a defined parameterless constructor, and calling new SomeValueType() works by creating a zero-filled portion of memory (all fields therefore being 0 for numeric types, null for reference types, and the result of this same rule again for other value-types).
In .NET you can have a parameterless constructor on a value type.
It's probably a bad idea to do so. For one thing the rules about just when it is called and just when the memory of the value is zero-filled, and what happens upon assignment in different cases aren't entirely simple (e.g. new SomeValueType() will call it but new T() in a generic method where T is SomeValueType will not!). Life is simpler if the result of new SomeValueType() will always be zero-filling. That no doubt influenced the design of C# not allowing this even though .NET does.
For this reason, Array.Initialize() will never make sense on new arrays of any type that was written in C#, because calling the constructor and zero-filling is the same thing.
But by the same token, it's possible for a type to be written in another .NET language (at the very least, you can do it in CIL) that does have a parameterless constructor that actually has an effect. And for that reason it's possible that a compiler for such a language would want its equivalent of new SomeValueType[3] to call that constructor on all the elements in the array. It is therefore sensible to have a method in the framework that allows such a fill to be done, so that a compiler for such a language can make use of it.
Also, why is it public?
So it can be called by code produced by such a hypothetical compiler, even in a context where security restrictions prevent it from calling private methods of another assembly.
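To make the scenario concrete, here is a small sketch. It assumes C# 10 or later, where structs may declare an explicit parameterless constructor (older C# compilers reject this code); Slot and InitializeDemo are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical value type whose parameterless constructor has a visible effect.
// Declaring this constructor requires C# 10 or later.
public struct Slot
{
    public int Value;

    public Slot()
    {
        Value = 42;
    }
}

public static class InitializeDemo
{
    public static Slot[] CreateZeroed()
    {
        // Array creation only zero-fills the elements; the constructor is not run.
        return new Slot[3];
    }

    public static Slot[] CreateConstructed()
    {
        Slot[] slots = new Slot[3];
        // Asks the runtime to call Slot's parameterless constructor on each element.
        slots.Initialize();
        return slots;
    }
}
```

For a struct without such a constructor, the call to Initialize() has nothing to run, which is why it looks redundant from ordinary C# code.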
To me, it looks like the Initialize() method runs through the array and recreates the value types within. So with a new array you get a new empty array, and you get the same with Array.Clear(), but with Array.Initialize() you get an array full of freshly created value types (type and length based on the old array).
And that should be all of the difference.
Based on the CLR source, the method traverses each index of the array and initializes the value type on that index by calling the default constructor, similar to initobj IL instruction (I wonder what happens when the constructor throws an exception, though). The method is public because calling a private method directly from IL would make it a bit unverifiable.
Today's C# compilers do not initialize each element of the array when creating it; they simply "set" each index to the default value of the type. User-defined parameterless constructors for value types (which the CLR already supported) were proposed for C# 6 and eventually shipped in C# 10, so this method is needed for languages with different array creation semantics.
You can see the expected use in the test code:
https://github.com/dotnet/coreclr/blob/3015ff7afb4936a1c5c5856daa4e3482e6b390a9/tests/src/CoreMangLib/cti/system/array/arrayinitialize.cs
Basically, it sets an array of non-intrinsic value-types back to their default(T) state.
It does not seem like an amazingly useful tool, but I can see how it could be useful for zero'ing out arrays of non-intrinsic value data.
I just want to check my understanding of C#'s ways of handling things, before I delve too deeply into designing my classes. My current understanding is that:
Struct is a value type, meaning it actually contains the data members defined within.
Class is a reference type, meaning it contains references to the data members defined within.
A method signature passes parameters by value, which means a copy of the value is passed to the inside of the method, making it expensive for large arrays and data structures.
A method signature that defines a parameter with the ref or out keywords will instead pass a parameter by reference, which means a pointer to the object is provided instead.
What I don't understand is what happens when I invoke a method, what actually happens. Does new() get invoked? Does it just automagically copy the data? Or does it actually just point to the original object? And how does using ref and out affect this?
The short answer:
The empty constructor will not be called automatically, and it actually just points to the original object.
using ref and out does not affect this.
The long answer:
I think it would be easier to understand how C# handles passing arguments to a function.
Actually everything is being passed by value
Really?! Everything by value?
Yes! Everything!
Of course there must be some kind of a difference between passing classes and simple typed objects, such as an Integer, otherwise, it would be a huge step back performance wise.
Well, the thing is that behind the scenes, when you pass an instance of a class to a function, what is really being passed is a pointer to that instance. The pointer, of course, can be passed by value without causing performance issues.
Actually, everything is being passed by value; it's just that when
you're "passing an object", you're actually passing a reference to that
object (and you're passing that reference by value).
Once we are in the function, given the pointer argument, we can refer to the object that was passed.
You don't actually need to do anything for this; you can refer directly to the instance passed as the argument (as said before, this whole process is being done behind the scenes).
After understanding this, you probably understand that the empty constructor will not be called automatically, and it actually just points to the original object.
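A small sketch of that behaviour follows. Box and PassingDemo are hypothetical names; the static counter exists only to make constructor calls observable.

```csharp
using System;

public class Box
{
    public int Value;
    public static int ConstructorCalls; // counts how often the constructor runs

    public Box()
    {
        ConstructorCalls++;
    }
}

public static class PassingDemo
{
    // 'box' is a copy of the caller's reference; both refer to the same object.
    public static void SetValue(Box box, int value)
    {
        box.Value = value; // mutates the caller's object, no copy of the data is made
    }
}
```

After `var b = new Box(); PassingDemo.SetValue(b, 7);` the caller sees `b.Value == 7`, and Box.ConstructorCalls has only increased once, for the explicit `new Box()`.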
EDITED:
As for out and ref, they allow a function to change the value of an argument and have that change persist outside the scope of the function.
In a nutshell, using the ref keyword for value types will act as follows:
int i = 42;
foo(ref i);
would translate in C++ to:
int i = 42;
int* ptrI = &i;
foo(ptrI);
while omitting the ref would simply translate to:
int i = 42;
foo(i);
Using those keywords with reference-type objects allows you to assign a new object to the passed argument and have that reassignment persist outside the scope of the function (for more details, please refer to the MSDN page).
Side note:
The difference between ref and out is that out requires the called function to assign a value to the argument before it returns, while ref has no such requirement. With ref, the caller must instead assign a value before the call, so ref implies that the initial value of the argument matters to the function and might affect its behaviour.
Passing a value-type variable to a method means passing a copy of the variable to the method. Any changes to the parameter that take place inside the method have no effect on the original data stored in the variable.
If you want the called method to change the value of the parameter, you have to pass it by reference, using the ref or out keyword.
When you pass a reference-type parameter by value, it is possible to change the data pointed to by the reference, such as the value of a class member. However, you cannot change the value of the reference itself; that is, you cannot use the same reference to allocate memory for a new class and have it persist outside the block. To do that, pass the parameter using the ref (or out) keyword.
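The rules above can be sketched as follows. Node, Point and RefDemo are hypothetical names chosen for this example.

```csharp
using System;

public class Node
{
    public string Name = "original";
}

public struct Point
{
    public int X;
}

public static class RefDemo
{
    public static void ReplaceByValue(Node node)
    {
        // Only the method's local copy of the reference is redirected.
        node = new Node { Name = "replaced" };
    }

    public static void ReplaceByRef(ref Node node)
    {
        // The caller's variable itself now refers to the new object.
        node = new Node { Name = "replaced" };
    }

    public static void MoveByValue(Point p)
    {
        p.X = 10; // changes a copy of the struct; the caller's value is untouched
    }

    public static void MoveByRef(ref Point p)
    {
        p.X = 10; // changes the caller's struct in place
    }
}
```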
Reference: Passing Parameters(C#)
Tragically, there is no way to pass an object by value in C# or VB.NET. I suggest instead you pass, for example, New Class1(Object1) where Object1 is an instance of Class1. You will have to write your own New method to do this but at least you then have an easy pass-by-value capability for Class1.
I have a code like the following:
struct A
{
    void SomeMethod()
    {
        var items = Enumerable.Range(0, 10).Where(i => i == _field);
    }
    int _field;
}
... and then I get the following compiler error:
Anonymous methods inside structs can not access instance members of 'this'.
Can anybody explain what's going on here?
Variables are captured by reference (even if they were actually value-types; boxing is done then).
However, this in a ValueType (struct) cannot be boxed, and hence you cannot capture it.
Eric Lippert has a nice article on the surprises of capturing value types: The Truth About Value Types.
Note in response to the comment by Chris Sinclair:
As a quick fix, you can store the struct in a local variable: A thisA = this; var items = Enumerable.Range(0, 10).Where(i => i == thisA._field); – Chris Sinclair
Beware of the fact that this creates surprising situations: the identity of thisA is not the same as this. More explicitly, if you choose to keep the lambda around longer, it will have the boxed copy thisA captured by reference, and not the actual instance that SomeMethod was called on.
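The surprise can be reproduced with a short sketch. Counter and MakeFilter are hypothetical names; the local copy follows the workaround quoted above.

```csharp
using System;

public struct Counter
{
    public int Field;

    public Func<int, bool> MakeFilter()
    {
        Counter copy = this;         // snapshot of the struct at this moment
        return i => i == copy.Field; // the lambda captures 'copy', not 'this'
    }
}
```

If the caller writes `var c = new Counter { Field = 3 }; var filter = c.MakeFilter(); c.Field = 5;`, the filter still compares against 3, because it holds the copy rather than the instance the method was called on.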
When you have an anonymous method, it will be compiled into a new class, and that class will have one method (the one you define). It will also have a reference to each variable that you used that was outside of the scope of the anonymous method. It's important to emphasize that it is a reference to that variable, not a copy of it. "Lambdas close over variables, not values", as the saying goes. This means that if you close over a variable outside of the scope of a lambda, and then change that variable after defining the anonymous method (but before invoking it), then you will see the changed value when you do invoke it.
So, what's the point of all of that. Well, if you were to close over this for a struct, which is a value type, it's possible for the lambda to outlive the struct. The anonymous method will be in a class, not a struct, so it will go on the heap, live as long as it needs to, and you are free to pass a reference to that class (directly or indirectly) wherever you want.
Now imagine that we have a local variable holding a struct of the type you've defined here. We use this named method to generate a lambda, and let's assume for a moment that the query items is returned (instead of the method being void). We could then store that query in another instance (instead of local) variable, and iterate over that query some time later in another method. What would happen here? In essence, we would have held onto a reference to a value type that was on the stack after it is no longer in scope.
What does that mean? The answer is, we have no idea. (Please look over the link; it's kinda the crux of my argument.) The data could just happen to be the same, it could have been zeroed out, it could have been filled by entirely different objects, there is no way of knowing. C# goes to great lengths, as a language, to prevent you from doing things like this. Languages such as C or C++ don't try so hard to stop you from shooting your own foot.
Now, in this particular case, it's possible that you aren't going to use the lambda outside of the scope of what this refers to, but the compiler doesn't know that, and if it lets you create the lambda it has no way of determining whether or not you expose it in a way that could result in it outliving this, so the only way to prevent this problem is to disallow some cases that aren't actually problematic.
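The saying quoted in this answer, that lambdas close over variables and not values, can be checked with an ordinary local variable. This is only a minimal sketch; ClosureDemo is a hypothetical name.

```csharp
using System;

public static class ClosureDemo
{
    public static int ReadAfterChange()
    {
        int x = 1;
        Func<int> read = () => x; // captures the variable x itself, not the value 1
        x = 2;                    // changed after the lambda was created
        return read();            // observes the current value of x
    }
}
```

ReadAfterChange() returns 2, which is only possible because the compiler moves x into a heap-allocated closure object shared by the method and the lambda.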
Pretty straight forward. MSDN states that you can use ref, but not out for partial methods. I'm just curious as to the why? It was my understanding that when code is compiled, the partials are merged, so what is up with the restriction? Is there more to partial than just making code files cleaner and organized (i.e. eyecandy)?
Reference: MSDN Article - "Partial methods can have ref but not out parameters."
You have to consider what happens if the partial method isn't implemented.
What happens then is that all calls to the method are simply stripped out as though they never happened.
So for a method using out, it would look like this:
Stream s;
GetStream(out s);
s.Write(...);
and be compiled as though it said this:
Stream s;
s.Write(...);
This code is not allowed because s has not been initialized. The guarantee that the variable would be initialized by the time you try to call the Write method on it was tied up with the call to GetStream.
It is the same with methods returning data. Since the entire method call is just not compiled if you haven't implemented the partial method, you need to consider what you can and cannot do and still leave the code that calls it valid. In terms of out and return values, it has the potential of leaving the calling code invalid or incomplete, so it is not allowed.
As for ref, that is valid since the initialization has been taken care of by the calling code:
Stream s = null;
GetStream(ref s); // may be stripped out
if (s != null)
    s.Write(...);
Because unlike ref parameters, out parameters MUST be initialized before the method returns. If the partial method is not implemented (which is a valid scenario), how can it be initialized?
My guess would be because out parameters don't need to be initialized whereas ref parameters do.
If you used an out parameter on a partial method, how could C# verify that the parameter was initialized or not?
An out parameter suggests that you want a value out of the method. If the method doesn't exist, it can't provide that value.
The alternative would be to set the variable's value explicitly to its default value (0, null etc) instead of executing the method call. That way the variable would still be definitely initialized - although the default value may not be a terribly useful one. I believe the C# team have considered this - it may even make it into a future version, who knows? Personally I doubt that it would be particularly useful, but the possibility is there.
For the moment, you could always use a ref parameter instead, and just initialize the variable manually before the call to whatever the default value should be.
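A minimal sketch of that workaround, assuming a partial method that is declared but never implemented. Loader and Customize are hypothetical names.

```csharp
using System;

public partial class Loader
{
    // Declared but deliberately not implemented anywhere in this sketch,
    // so the compiler removes every call to it.
    partial void Customize(ref string name);

    public string Load()
    {
        string name = "default"; // the caller initialises the variable itself
        Customize(ref name);     // stripped at compile time when no implementation exists
        return name;             // still definitely assigned either way
    }
}
```

If another part of the partial class later supplies a body for Customize, it can overwrite name; until then Load() simply returns "default".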
I would assume the reason is because a partial method with only a signature (i.e. no implementation) is still valid. If you had an out parameter an implementation-less method would always cause an error (as there's nothing assigning the out value)
A partial method is split across partial classes. A method is required to assign a value to an out parameter. Partial methods may or may not be implemented. It would mean multiple code chunks are trying to assign a value to the out parameter.
As everyone else has stated, out params must be assigned. To add to that: failing to do so generates compiler error CS0177. ref, on the other hand, must be assigned prior to making the call.
As far as I can tell, the only use for out parameters is that a caller can obtain multiple return values from a single method invocation. But we can also obtain multiple result values using ref parameters instead!
So are there other situations where out parameters could prove useful and where we couldn't use ref parameters instead?
Thank you.
Yes - the difference between ref and out is in terms of definite assignment:
An out parameter doesn't have to be definitely assigned by the caller before the method call. It does have to be definitely assigned in the method before it returns normally (i.e. without an exception). The variable is then definitely assigned in the caller after the call.
A ref parameter does have to be definitely assigned by the caller before the method call. It doesn't have to be assigned a different value in the method.
So suppose we wanted to change int.TryParse(string, out int) to use ref instead. Usually the calling code looks like this:
int value;
if (int.TryParse(text, out value))
{
    // Use value
}
else
{
    // Do something else
}
Now if we used ref, we'd have to give value a value before the call, e.g.:
int value = 0;
if (int.TryParse(text, ref value))
{
    // Use value
}
else
{
    // Do something else
}
Obviously it's not a huge difference - but it gives the wrong impression. We're assigning a value that we have no intention of ever using, and that's not a good thing for readability. An out parameter indicates that a value will come out of the method (assuming there's no exception) and that you don't need to have a value to start with.
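For what it's worth, C# 7.0 and later let the variable be declared right at the call site, which removes even the separate declaration line. A minimal sketch, with InlineOutDemo and its methods being hypothetical names:

```csharp
using System;

public static class InlineOutDemo
{
    public static string Describe(string text)
    {
        // 'out int value' declares the variable at the call site (C# 7.0 and later).
        if (int.TryParse(text, out int value))
        {
            return "number " + value;
        }

        return "not a number";
    }

    public static bool IsNumber(string text)
    {
        // The discard '_' says the parsed value is intentionally ignored.
        return int.TryParse(text, out _);
    }
}
```

Neither form would be possible with ref, since a ref argument must already hold a value before the call.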
One of the suggestions I've made for C# 5 (I've no idea if it'll be taken up or not) is that a method with an out parameter should be able to be regarded as a method returning a tuple of values. Combined with better support for tuples, that would mean we could do something like this:
var (ok, value) = int.TryParse(text);
In this case ok and value would be implicitly typed to bool and int respectively. That way it's clear what's going into the method (text) and what's coming out (two pieces of information: ok and value).
That would simply not be available if int.TryParse used a ref parameter instead - as the compiler can't know whether it's going to actually care about the initial value of the ref parameter.
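The language does not treat out parameters as tuple results, but with the value tuples added in C# 7.0 a similar call shape can be approximated by hand. The wrapper below is a hypothetical sketch, not a framework API; TupleParsing and TryParseInt are made-up names.

```csharp
using System;

public static class TupleParsing
{
    // Hypothetical wrapper: returns the success flag and the parsed value
    // together as a value tuple instead of using an out parameter.
    public static (bool Ok, int Value) TryParseInt(string text)
    {
        bool ok = int.TryParse(text, out int value);
        return (ok, value);
    }
}
```

A caller can then deconstruct the result: `var (ok, value) = TupleParsing.TryParseInt(text);`, with ok typed as bool and value as int.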
You can look at parameters in this way:
normal parameters are in parameters: A value can go into the function through such a parameter; therefore it must be initialized.
ref parameters are in-out parameters: A value can go into and out of a function through such a parameter. Because of the former, it must also be initialized.
out parameters are out parameters: A value is only supposed to come back from a function through such a parameter; therefore, it doesn't need to be initialized.
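The three directions listed above can be put side by side in a short sketch. ParameterKinds and its methods are hypothetical names chosen for illustration.

```csharp
using System;

public static class ParameterKinds
{
    // Plain parameter: a value only goes in.
    public static int Square(int x)
    {
        return x * x;
    }

    // ref parameter: a value goes in and a (possibly changed) value comes back out.
    public static void DoubleInPlace(ref int x)
    {
        x *= 2;
    }

    // out parameters: values only come out, and each must be assigned before returning.
    public static void Divide(int dividend, int divisor, out int quotient, out int remainder)
    {
        quotient = dividend / divisor;
        remainder = dividend % divisor;
    }
}
```

Usage: `int n = 5; ParameterKinds.DoubleInPlace(ref n);` requires n to be initialised first, whereas `int q, r; ParameterKinds.Divide(17, 5, out q, out r);` does not.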
I came up with this way of looking at ref/out parameters by studying Microsoft's COM technology. IDL (interface description language) is used to describe COM component interfaces, and with IDL, parameters are augmented with in, out, and inout declarators. I suspect .NET and C# have partly inherited these declarators from COM, albeit with slightly different names (ref instead of inout).
With COM, out parameters are frequently used to retrieve an interface method's actual return value, since the "real" return value is often already used for returning an HRESULT success/error code.
With .NET, I think out parameters have far less importance, even in cases where you want to return several values from a method (you could return complex objects or Tuples in these situations).
One important difference is this:
A variable passed as an out argument
need not be initialized. However, the
out parameter must be assigned a value
before the method returns.
(A ref parameter does not require this)
Source: http://msdn.microsoft.com/en-us/library/t3c3bfhx(VS.71).aspx
An out parameter is useful when you want multiple result values from a method. Technically, you could use a ref parameter to achieve the same goal, but an out parameter does a significantly better job at conveying intent. When you use ref, it is not clear why you are doing so instead of using out or instead of using the function result. Presumably, you intend to change the value passed, but why you are changing it isn't clear simply from the function signature.
I think a fine example is int.TryParse()
http://msdn.microsoft.com/en-us/library/f02979c7.aspx
The primary reason that out is better than ref is that you don't need to assign a dummy value to the return var before calling (even implicitly).
So out tells you, and the compiler: "This var will be assigned within the method. And the var's initial value, if any, will not even be looked at."
The major difference between the two is that with ref we have to initialize the variable before the call, and assigning a value to it inside the method is optional.
With out parameters, however, we do not have to initialize the variable beforehand, but the method must assign a value to it; otherwise the compiler generates an error.