I just want to check my understanding of C#'s ways of handling things, before I delve too deeply into designing my classes. My current understanding is that:
Struct is a value type, meaning it actually contains the data members defined within.
Class is a reference type, meaning it contains references to the data members defined within.
A method signature passes parameters by value, which means a copy of the value is passed to the inside of the method, making it expensive for large arrays and data structures.
A method signature that defines a parameter with the ref or out keywords will instead pass a parameter by reference, which means a pointer to the object is provided instead.
What I don't understand is what actually happens when I invoke a method. Does new() get invoked? Does it just automagically copy the data? Or does it actually just point to the original object? And how does using ref and out affect this?
The short answer:
The empty constructor will not be called automatically, and it actually just points to the original object.
Using ref and out does not affect this.
The long answer:
I think it is easier to first understand how C# handles passing arguments to a function.
Actually, everything is passed by value.
Really?! Everything by value?
Yes! Everything!
Of course, there must be some difference between passing class instances and simple value types such as an int; otherwise it would be a huge step back performance-wise.
Well, the thing is that behind the scenes, when you pass an instance of a class to a function, what is really passed is a pointer to the instance. The pointer, of course, can be passed by value without causing performance issues.
Actually, everything is being passed by value; it's just that when
you're "passing an object", you're actually passing a reference to that
object (and you're passing that reference by value).
Once we are in the function, we can use the pointer argument to reach the original object.
You don't actually need to do anything for this; you can work directly with the instance passed as the argument (as said before, this whole process happens behind the scenes).
After understanding this, you probably understand that the empty constructor will not be called automatically, and it actually just points to the original object.
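To make the point above concrete, here is a minimal sketch. The Person class and the Rename and Replace methods are hypothetical names invented for illustration. It shows that mutating the object through the parameter is visible to the caller, while reassigning the parameter is not.

```csharp
using System;

class Person
{
    public string Name = "original";
}

static class PassReferenceByValueDemo
{
    // p is a copy of the caller's reference; both refer to the same object.
    static void Rename(Person p)
    {
        p.Name = "changed";
    }

    // Reassigning p only changes the local copy of the reference.
    static void Replace(Person p)
    {
        p = new Person { Name = "replacement" };
    }

    public static string Run()
    {
        var person = new Person();   // the only constructor call made by the caller
        Rename(person);
        string afterRename = person.Name;
        Replace(person);
        return afterRename + "," + person.Name;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints "changed,changed"
    }
}
```

No constructor runs as part of the call itself; the only new Person created during the calls is the one Replace builds explicitly, and the caller never sees it.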
EDITED:
As for out and ref, they allow a function to change the value of an argument and have that change persist outside the scope of the function.
In a nutshell, using the ref keyword for value types will act as follows:
int i = 42;
foo(ref i);
is roughly equivalent to this C++:
int i = 42;
int* ptrI = &i;
foo(ptrI);
while omitting the ref simply translates to:
int i = 42;
foo(i);
Using those keywords with reference types lets you assign a new object to the passed argument and have that assignment persist outside the scope of the function (for more details, please refer to the MSDN page).
Side note:
The difference between ref and out is that out requires the called function to assign a value to the argument before it returns, while ref has no such restriction; instead, the caller must assign a value before the call. Thus ref implies that the initial value of the argument is important to the function and might affect its behaviour.
Passing a value-type variable to a method means passing a copy of the variable to the method. Any changes to the parameter that take place inside the method have no effect on the original data stored in the variable.
If you want the called method to change the value of the parameter, you have to pass it by reference, using the ref or out keyword.
When you pass a reference-type parameter by value, it is possible to change the data pointed to by the reference, such as the value of a class member. However, you cannot change the value of the reference itself; that is, you cannot use the same reference to allocate memory for a new class and have it persist outside the block. To do that, pass the parameter using the ref (or out) keyword.
Reference: Passing Parameters (C#)
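The value-type case described in the quote can be sketched as follows. Point, MoveCopy and MoveOriginal are hypothetical names used only for this illustration.

```csharp
using System;

struct Point
{
    public int X;
}

static class ValueTypeParameterDemo
{
    // Receives a copy of the struct; the caller's variable is untouched.
    static void MoveCopy(Point p)
    {
        p.X = 10;
    }

    // Receives an alias for the caller's variable; the change persists.
    static void MoveOriginal(ref Point p)
    {
        p.X = 10;
    }

    public static string Run()
    {
        var point = new Point { X = 1 };
        MoveCopy(point);
        int afterCopy = point.X;
        MoveOriginal(ref point);
        return afterCopy + "," + point.X;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints "1,10"
    }
}
```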
Tragically, there is no way to pass an object by value in C# or VB.NET. I suggest instead you pass, for example, New Class1(Object1) where Object1 is an instance of Class1. You will have to write your own New method to do this but at least you then have an easy pass-by-value capability for Class1.
I have an object that holds the in-memory state of my program, and some worker functions that I pass the object to in order to modify the state. I have been passing it by ref to the worker functions. However, I came across the following function.
byte[] received_s = new byte[2048];
IPEndPoint tmpIpEndPoint = new IPEndPoint(IPAddress.Any, UdpPort_msg);
EndPoint remoteEP = (tmpIpEndPoint);
int sz = soUdp_msg.ReceiveFrom(received_s, ref remoteEP);
It confuses me because both received_s and remoteEP are returning stuff from the function. Why does remoteEP need a ref and received_s does not?
I am also a C programmer, so I am having trouble getting pointers out of my head.
Edit:
It looks like objects in C# are pointers to the object under the hood. So when you pass an object to a function, you can modify the object's contents through the pointer, and the only thing passed to the function is the pointer, so the object itself is not copied. You use ref or out if you want to be able to switch out or create a new object in the function, which is like a double pointer.
Short answer: read my article on argument passing.
Long answer: when a reference type parameter is passed by value, only the reference is passed, not a copy of the object. This is like passing a pointer (by value) in C or C++. Changes to the value of the parameter itself won't be seen by the caller, but changes in the object which the reference points to will be seen.
When a parameter (of any kind) is passed by reference, that means that any changes to the parameter are seen by the caller - changes to the parameter are changes to the variable.
The article explains all of this in more detail, of course :)
Useful answer: you almost never need to use ref/out. It's basically a way of getting another return value, and should usually be avoided precisely because it means the method's probably trying to do too much. That's not always the case (TryParse etc are the canonical examples of reasonable use of out) but using ref/out should be a relative rarity.
Think of a non-ref parameter as being a pointer, and a ref parameter as a double pointer. This helped me the most.
You should almost never pass values by ref. I suspect that if it weren't for interop concerns, the .NET team would never have included it in the original specification. The OO way of dealing with most problems that ref parameters solve is to:
For multiple return values:
    Create structs that represent the multiple return values.
For primitives that change in a method as the result of the method call (the method has side effects on primitive parameters):
    Implement the method in an object as an instance method and manipulate the object's state (not the parameters) as part of the method call.
    Use the multiple return value solution and merge the return values into your state.
    Create an object that contains state that can be manipulated by a method and pass that object as the parameter, and not the primitives themselves.
You could probably write an entire C# app and never pass any objects/structs by ref.
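As a rough sketch of the first suggestion (a struct representing multiple return values), assuming a hypothetical DivisionResult type and Divide method that are not part of any library:

```csharp
using System;

// Hypothetical result type bundling two values that would otherwise need out/ref.
struct DivisionResult
{
    public int Quotient;
    public int Remainder;
}

static class MultipleReturnDemo
{
    public static DivisionResult Divide(int dividend, int divisor)
    {
        return new DivisionResult
        {
            Quotient = dividend / divisor,
            Remainder = dividend % divisor
        };
    }

    static void Main()
    {
        DivisionResult r = Divide(17, 5);
        Console.WriteLine(r.Quotient + " remainder " + r.Remainder); // prints "3 remainder 2"
    }
}
```

The caller gets both values from an ordinary return statement, so nothing at the call site needs a ref or out modifier.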
I had a professor who told me this:
The only place you'd use refs is where you either:
Want to pass a large object (i.e., the object/struct has
objects/structs inside it to multiple levels) and copying it would
be expensive, or
You are calling a Framework, Windows API or other API that requires
it.
Don't do it just because you can. You can get bit in the ass by some
nasty bugs if you start changing the values in a param and aren't
paying attention.
I agree with his advice, and in my five plus years since school, I've never had a need for it outside of calling the Framework or Windows API.
Since received_s is an array, you're passing a pointer to that array. The function manipulates that existing data in place, not changing the underlying location or pointer. The ref keyword signifies that you're passing the actual pointer to the location and updating that pointer in the outside function, so the value in the outside function will change.
E.g. the byte array is a pointer to the same memory before and after, the memory has just been updated.
The Endpoint reference is actually updating the pointer to the Endpoint in the outside function to a new instance generated inside the function.
Think of a ref as meaning you are passing a pointer by reference. Not using a ref means you are passing a pointer by value.
Better yet, ignore what I just said (it's probably misleading, especially with value types) and read This MSDN page.
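The difference between the two arguments can be sketched without a real socket. FakeReceiveFrom and Address below are hypothetical stand-ins, not the actual Socket.ReceiveFrom implementation; the sketch only mirrors the shape of its signature.

```csharp
using System;

class Address
{
    public string Host;
}

static class FakeSocketDemo
{
    // Hypothetical stand-in for a ReceiveFrom-style method.
    static int FakeReceiveFrom(byte[] buffer, ref Address remote)
    {
        buffer[0] = 42;                            // writes into the caller's existing array
        remote = new Address { Host = "sender" };  // rebinds the caller's variable to a new object
        return 1;                                  // number of bytes "received"
    }

    public static string Run()
    {
        byte[] buffer = new byte[4];
        Address placeholder = new Address { Host = "any" };
        Address remote = placeholder;

        int count = FakeReceiveFrom(buffer, ref remote);
        bool replaced = !ReferenceEquals(placeholder, remote);

        return count + "," + buffer[0] + "," + remote.Host + "," + replaced;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints "1,42,sender,True"
    }
}
```

The array needs no ref because the method only fills the existing array; the address needs ref because the method hands back a different object.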
While I agree with Jon Skeet's answer overall and some of the other answers, there is a use case for using ref, and that is for tightening up performance optimizations. It has been observed during performance profiling that setting the return value of a method has slight performance implications, whereas using ref as an argument whereby the return value is populated into that parameter results in this slight bottleneck being removed.
This is really only useful where optimization efforts are taken to extreme levels, sacrificing readability and perhaps testability and maintainability to save milliseconds or fractions of a millisecond.
My understanding is that all objects derived from the Object class are passed as pointers, whereas ordinary types (int, struct) are not passed as pointers and require ref. I am not sure about string (is it ultimately derived from the Object class?)
Ground rule first: in terms of the types involved, value types (primitives and structs) hold their data directly, while reference types (classes) hold a reference to an object on the heap.
In terms of parameters, arguments are passed by value by default.
A good post that explains things in detail:
http://yoda.arachsys.com/csharp/parameters.html
Student myStudent = new Student { Name = "A", RollNo = 1 };
ChangeName(myStudent);
static void ChangeName(Student s1)
{
    s1.Name = "Z"; // myStudent.Name also changes from "A" to "Z",
    // because s1 and myStudent both refer to the same object on the heap
    // (Student is a reference type)
}
ChangeNameVersion2(ref myStudent);
static void ChangeNameVersion2(ref Student s1)
{
    s1.Name = "Z"; // no difference here: same effect as ChangeName
}
static void ChangeNameVersion3(ref Student s1)
{
    s1 = new Student { Name = "Champ" };
    // the caller's variable (myStudent) now refers to this new object;
    // the previous object becomes eligible for garbage collection once nothing references it
}
We can say (with a warning) that variables of reference types are nothing but pointers,
and that when we pass them by ref we are passing a double pointer.
I know this question is old, but I would like to give a brief answer for any curious new developers.
Try to avoid ref/out as much as you can; it works against clean code and tends to indicate poor-quality code.
Passing types by reference (using out or ref) requires experience with pointers, understanding how value types and reference types differ and handling methods that have multiple return values. Also, the difference between out and ref parameters is not widely understood.
Please see:
https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1045
I understand that ref means the reference submitted may point to an entirely different object when the method returns.
However what I like about the ref modifier is that the developer immediately knows what he put in may be somehow different by the time the method returns, as the ref modifier is also required caller side.
Taking a simple method, from a hypothetical ORM:
public Boolean AddItem(Entity someEntity)
{
    try
    {
        // Add item to database
        // Get Id of entity back from database
        someEntity.Id = returnedId; // placeholder: the Id returned by the database
        return true;
    }
    catch (DBException ex)
    {
        return false;
    }
}
A caller of the method may not know that their entity was updated by the call. Making someEntity a ref parameter, however, signals to the developer that the submitted argument may be different afterwards, so they know to dig into the documentation or code to find out how it is changed. Without the modifier they might never have thought to do so.
I know this is slightly abusing the ref modifier, as it is not actually required in the example above, but will using it this way actually cause me any problems?
I think it is abuse. Assuming Entity is a reference type, the parameter declaration
bool AddItem(ref Entity someEntity)
indicates that the method might move the reference someEntity to "point to" an entirely different object instance.
If all you do is mutate the existing instance someEntity, do not write ref. Instead, use names (method name and parameter name) and documentation that make it clear you will "mutate" (change) the object.
Example name (you can choose better names since you know the actual code): AddItemAndUpdateEntityID
Consequences of using ref:
The caller must use a variable. They cannot pass the return value of a property, method call or expression evaluation.
The caller must use the exact type. They cannot pass a SpecificEntity, say, where SpecificEntity derives from Entity.
The logic of your method must be prepared for other threads (or other methods you call yourself) changing the identity of the ref parameter. For example, if you check someEntity == null at the top of your method, that might no longer hold later in the method, because someone else might have moved the reference to point elsewhere.
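The last consequence is perhaps the least obvious, so here is a single-threaded sketch of it. Entity is reduced to a bare class, and IsNullAfterCallback and Clear are hypothetical names; the callback stands in for "other methods you call yourself".

```csharp
using System;

class Entity
{
    public int Id;
}

static class RefAliasingDemo
{
    static Entity shared = new Entity { Id = 1 };

    // Stands in for any code that can reach the caller's variable.
    static void Clear()
    {
        shared = null;
    }

    // e is an alias for the caller's variable, so a null check made at the
    // top of the method can be invalidated by code the method itself calls.
    public static bool IsNullAfterCallback(ref Entity e, Action callback)
    {
        if (e == null)
        {
            return true;
        }
        callback();
        return e == null; // re-check: the variable may now hold something else
    }

    public static bool Run()
    {
        return IsNullAfterCallback(ref shared, Clear);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints "True"
    }
}
```

With a plain (non-ref) parameter the method would hold its own copy of the reference, and the second check could not be affected by Clear.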
However what I like about the ref modifier is that the developer immediately knows what he put in may be somehow different by the time the method returns
No they don't.
What they know about the ref modifier is that the parameter may actually refer to something else when the method returns.
Changing a method that accepts a reference type to use ref solely so that you can give a false impression is not useful in any way.
Of course there's also the flip side; as well as abusing ref to indicate something it doesn't mean, you've lost the ability of ref to indicate what it does mean; one would have to examine the code to see if the method was actually using ref and wouldn't otherwise know from one method call to another whether you were still dealing with the same object.
Pretty straightforward. MSDN states that you can use ref, but not out, for partial methods. I'm just curious as to why. It was my understanding that when code is compiled, the partials are merged, so what is up with the restriction? Is there more to partial than just making code files cleaner and more organized (i.e. eye candy)?
Reference: MSDN Article - "Partial methods can have ref but not out parameters."
You have to consider what happens if the partial method isn't implemented.
What happens then is that all calls to the method are simply stripped out as though they never happened.
So for a method using out, it would look like this:
Stream s;
GetStream(out s);
s.Write(...);
and be compiled as though it said this:
Stream s;
s.Write(...);
This code is not allowed because s has not been initialized. The guarantee that the variable would be initialized by the time you try to call the Write method on it was tied up with the call to GetStream.
It is the same with methods returning data. Since the entire method call is just not compiled if you haven't implemented the partial method, you need to consider what you can and cannot do and still leave the code that calls it valid. In terms of out and return values, it has the potential of leaving the calling code invalid or incomplete, so it is not allowed.
As for ref, that is valid since the initialization has been taken care of by the calling code:
Stream s = null;
GetStream(ref s); // may be stripped out
if (s != null)
    s.Write(...);
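A runnable variant of the ref case, as a sketch: StreamOwner and ChooseName are hypothetical names, and a string is used instead of a Stream to keep the example short. The partial method is declared but deliberately never implemented.

```csharp
using System;

partial class StreamOwner
{
    // Declared but never implemented anywhere, so the compiler
    // removes every call to it.
    partial void ChooseName(ref string name);

    public string Run()
    {
        string name = "default";  // the caller initializes the variable itself
        ChooseName(ref name);     // this call is stripped out at compile time
        return name;
    }
}

static class PartialMethodDemo
{
    static void Main()
    {
        Console.WriteLine(new StreamOwner().Run()); // prints "default"
    }
}
```

Because the variable already has a value before the call, the code stays valid whether or not an implementation exists.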
Because unlike ref parameters, out parameters MUST be assigned before the method returns. If the partial method is not implemented (which is a valid scenario), how could that assignment happen?
My guess would be because out parameters don't need to be initialized whereas ref parameters do.
If you used an out parameter on a partial method, how could C# verify that the parameter was initialized or not?
An out parameter suggests that you want a value out of the method. If the method doesn't exist, it can't provide that value.
The alternative would be to set the variable's value explicitly to its default value (0, null etc) instead of executing the method call. That way the variable would still be definitely initialized - although the default value may not be a terribly useful one. I believe the C# team have considered this - it may even make it into a future version, who knows? Personally I doubt that it would be particularly useful, but the possibility is there.
For the moment, you could always use a ref parameter instead, and just initialize the variable manually before the call to whatever the default value should be.
I would assume the reason is because a partial method with only a signature (i.e. no implementation) is still valid. If you had an out parameter an implementation-less method would always cause an error (as there's nothing assigning the out value)
A partial method is split across partial classes. A method is required to assign a value to an out parameter. Partial methods may or may not be implemented. It would mean multiple code chunks are trying to assign a value to the out parameter.
As everyone else has stated, out parameters must be assigned. To add to that: failing to assign one generates compiler error CS0177. A ref argument, on the other hand, must be assigned prior to making the call.
As far as I can tell, the only use for out parameters is that a caller can obtain multiple return values from a single method invocation. But we can also obtain multiple result values using ref parameters instead!
So are there other situations where out parameters could prove useful and where we couldn't use ref parameters instead?
Thank you.
Yes - the difference between ref and out is in terms of definite assignment:
An out parameter doesn't have to be definitely assigned by the caller before the method call. It does have to be definitely assigned in the method before it returns normally (i.e. without an exception). The variable is then definitely assigned in the caller after the call.
A ref parameter does have to be definitely assigned by the caller before the method call. It doesn't have to be assigned a different value in the method.
So suppose we wanted to change int.TryParse(string, out int) to use ref instead. Usually the calling code looks like this:
int value;
if (int.TryParse(text, out value))
{
    // Use value
}
else
{
    // Do something else
}
Now if we used ref, we'd have to give value a value before the call, e.g.:
int value = 0;
if (int.TryParse(text, ref value))
{
    // Use value
}
else
{
    // Do something else
}
Obviously it's not a huge difference - but it gives the wrong impression. We're assigning a value that we have no intention of ever using, and that's not a good thing for readability. An out parameter indicates that a value will come out of the method (assuming there's no exception) and that you don't need to have a value to start with.
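The comparison can be made runnable with a small sketch. int.TryParse is the real framework method; TryParseRef is a hypothetical ref-based variant written only for this comparison.

```csharp
using System;

static class OutVersusRefDemo
{
    // Hypothetical ref-based variant of int.TryParse, for comparison only.
    static bool TryParseRef(string text, ref int value)
    {
        int parsed;
        if (!int.TryParse(text, out parsed))
        {
            return false; // value keeps whatever the caller put in
        }
        value = parsed;
        return true;
    }

    public static string Run()
    {
        int viaOut;                                     // no initial value needed
        bool okOut = int.TryParse("oops", out viaOut);  // TryParse sets viaOut to 0 on failure

        int viaRef = -1;                                // must be assigned before the call
        bool okRef = TryParseRef("oops", ref viaRef);   // viaRef is left at -1

        return okOut + "," + viaOut + "," + okRef + "," + viaRef;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints "False,0,False,-1"
    }
}
```

With ref, the dummy value survives a failed parse, which is exactly the kind of never-intended value the answer warns about.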
One of the suggestions I've made for C# 5 (I've no idea if it'll be taken up or not) is that a method with an out parameter should be able to be regarded as a method returning a tuple of values. Combined with better support for tuples, that would mean we could do something like this:
var (ok, value) = int.TryParse(text);
In this case ok and value would be implicitly typed to bool and int respectively. That way it's clear what's going into the method (text) and what's coming out (two pieces of information: ok and value).
That would simply not be available if int.TryParse used a ref parameter instead - as the compiler can't know whether it's going to actually care about the initial value of the ref parameter.
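The language did not adopt that exact proposal, but something close can be written by hand, assuming C# 7.0 or later (which added value tuples, deconstruction and inline out variables). Parse below is a hypothetical wrapper, not a framework method.

```csharp
using System;

static class TupleParseDemo
{
    // Hypothetical wrapper that turns the out parameter into a tuple element.
    public static (bool ok, int value) Parse(string text)
    {
        bool ok = int.TryParse(text, out int value);
        return (ok, value);
    }

    static void Main()
    {
        var (ok, value) = Parse("123");
        Console.WriteLine(ok + " " + value); // prints "True 123"
    }
}
```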
You can look at parameters in this way:
normal parameters are in parameters: A value can go into the function through such a parameter; therefore it must be initialized.
ref parameters are in-out parameters: A value can go into and out of a function through such a parameter. Because of the former, it must also be initialized.
out parameters are out parameters: A value is only supposed to come back from a function through such a parameter; therefore, it doesn't need to be initialized.
I came up with this way of looking at ref/out parameters by studying Microsoft's COM technology. IDL (interface description language) is used to describe COM component interfaces, and with IDL, parameters are augmented with in, out, and inout declarators. I suspect .NET and C# have partly inherited these declarators from COM, albeit with slightly different names (ref instead of inout).
With COM, out parameters are frequently used to retrieve an interface method's actual return value, since the "real" return value is often already used for returning an HRESULT success/error code.
With .NET, I think out parameters have far less importance, even in cases where you want to return several values from a method (you could return complex objects or Tuples in these situations).
One important difference is this:
A variable passed as an out argument
need not be initialized. However, the
out parameter must be assigned a value
before the method returns.
(A ref parameter does not require this)
Source: http://msdn.microsoft.com/en-us/library/t3c3bfhx(VS.71).aspx
An out parameter is useful when you want multiple result values from a method. Technically, you could use a ref parameter to achieve the same goal, but an out parameter does a significantly better job of conveying intent. When you use ref, it is not clear why you are doing so instead of using out or the function result. Presumably, you intend to change the value passed, but why you are changing it isn't clear from the function signature alone.
I think a fine example is int.TryParse()
http://msdn.microsoft.com/en-us/library/f02979c7.aspx
The primary reason that out is better than ref is that you don't need to assign a dummy value to the return var before calling (even implicitly).
So out tells you, and the compiler: "This var will be assigned within the method. And the var's initial value, if any, will not even be looked at."
The major difference between the two is that with ref we have to initialize the variable before the call, and assigning a value to the ref parameter inside the method is optional.
With out parameters, however, we do not have to initialize the variable explicitly, but the method has to assign a value to it; otherwise the compiler generates a compile-time error.
Common question, but I could use an "English" explanation.
Is it like Java where
Cat myCat
actually is a pointer to Cat?
Should I really create copy constructors in C#?
I understand we are passing by value, but now my question is are we passing by pointer value or full copy of the object?
If it's the latter, isn't that too expensive performance/memory wise? Is that when you have to use the ref keyword?
As @rstevens answered, if it is a class, myCat is a reference. But if you pass myCat to a method call, then the reference itself is passed by value - i.e. the parameter will reference the same object, but it's a completely new reference, so if you set it to null or assign a new object to it, the old myCat reference will still point to the original object.
SomeMethod(myCat);
void SomeMethod(Cat cat)
{
    cat.Miau(); // makes the original myCat object miau
    cat = null; // only cat is set to null; myCat still points to the original object
}
Jon Skeet has a good article about it.
Remember that a pointer is not exactly the same as a reference, but you can just about think of it that way if you want.
I swear I saw another SO question on this not 10 minutes ago, but I can't find the link now. In the other question I saw, they were talking about passing arguments by ref vs by value, and it came down to this:
By default in .NET, you don't pass objects by reference. You pass references to objects by value.
The difference is subtle but important, especially if, for example, you want to assign to your passed object in the method.
If you declared Cat as
class Cat {...}
then yes, myCat is a reference to a Cat.
If you declared Cat as
struct Cat {...}
then your variable "is" the structure itself.
This is the difference between reference types and value types in .NET.
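A small sketch of that difference, using two hypothetical types (CatClass and CatStruct) so both cases fit in one program:

```csharp
using System;

class CatClass
{
    public string Name;
}

struct CatStruct
{
    public string Name;
}

static class CatCopyDemo
{
    public static string Run()
    {
        var a = new CatClass { Name = "Tom" };
        var b = a;             // copies the reference: a and b are the same cat
        b.Name = "Felix";

        var c = new CatStruct { Name = "Tom" };
        var d = c;             // copies the whole struct: d is a separate cat
        d.Name = "Felix";

        return a.Name + "," + c.Name;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints "Felix,Tom"
    }
}
```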
Yes, it's about pointers, but not really... The thing that messed me up originally is that it isn't really about protecting your variable from changes within the method. If you change the object within the method, those changes are visible to the caller regardless of whether it is passed with ref or not.
The difference (as I understand it) is whether the variable you send in has its reference updated coming back out if you change the object that variable references. So given this method
public void DoSomething(ref CoolShades glasses)
{
    glasses.Vendor = "Ray Ban";
    glasses = new CoolShades();
}
the variable you passed in as a parameter now contains a reference to the new CoolShades rather than whatever object it referenced before. The original parameter object's Vendor property will be changed to "Ray Ban" regardless of whether you passed the parameter ref or not.
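Both halves of that statement can be checked with a short sketch. CoolShades is given a minimal hypothetical definition here, and DoSomethingNoRef is an added non-ref twin of the method above.

```csharp
using System;

class CoolShades
{
    public string Vendor = "unknown";
}

static class CoolShadesDemo
{
    static void DoSomething(ref CoolShades glasses)
    {
        glasses.Vendor = "Ray Ban";   // mutates the original object
        glasses = new CoolShades();   // rebinds the caller's variable
    }

    static void DoSomethingNoRef(CoolShades glasses)
    {
        glasses.Vendor = "Ray Ban";   // mutates the original object
        glasses = new CoolShades();   // rebinds only the local parameter
    }

    public static string Run()
    {
        var original = new CoolShades();
        var viaRef = original;
        DoSomething(ref viaRef);

        var original2 = new CoolShades();
        var noRef = original2;
        DoSomethingNoRef(noRef);

        return original.Vendor + ","
            + ReferenceEquals(original, viaRef) + ","
            + ReferenceEquals(original2, noRef);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints "Ray Ban,False,True"
    }
}
```

In both cases the original object's Vendor ends up as "Ray Ban"; only the ref version leaves the caller's variable pointing at a different object.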