I understand that ref means the reference submitted may point to an entirely different object when the method returns.
However what I like about the ref modifier is that the developer immediately knows what he put in may be somehow different by the time the method returns, as the ref modifier is also required caller side.
Taking a simple method, from a hypothetical ORM:
public Boolean AddItem(Entity someEntity)
{
try
{
// Add item to database
// Get Id of entity back from database
someEntity.Id = *returnedId*;
return true;
}
catch (DBException ex)
{
return false;
}
}
Any caller of the method may not know that their entity was updated by calling the method. However making someEntity a ref parameter, it indicates to the developer that their submitted parameter will be different, they then know to dig into the documentation/code to find out how it is changed, without the modifier they may have never thought to do so.
I know this is slightly abusing the ref modifier, as it is not actually required in the example above, but will using it this way actually cause me any problems?
I think it is abuse. Assuming Entity is a reference type, the parameter declaration
bool AddItem(ref Entity someEntity)
indicates that the method might move the reference someEntity to "point to" an entirely different object instance.
If all you do is mutate the existing instance someEntity, do not write ref. Instead, use names (method name and parameter name) and documentation that make it clear you will "mutate" (change) the object.
Example name (you can choose better names since you know the actual code): AddItemAndUpdateEntityID
Consequences of using ref:
the caller must use a variable. He cannot use the return value from a property, method call or expression evaluation
the caller must use the exact type, he cannot pass a SpecificEntity, say, where SpecificEntity derives from Entity
the logic of your method must be prepared that other threads (or other methods you call yourself) may change the identity of the ref parameter. For example, if you check if someEntity == null in the top of your method, at a later point in your method that might have changed because someone else might have moved the reference to point elsewhere.
However what I like about the ref modifier is that the developer immediately knows what he put in may be somehow different by the time the method returns
No they don't.
What they know about the ref modifier is that the parameter may actually refer to something else when the method returns.
Changing a method that accepts a reference type to use ref solely so that you can give a false impression is not useful in any way.
Of course there's also the flip side; as well as abusing ref to indicate something it doesn't mean, you've lost the ability of ref to indicate what it does mean; one would have to examine the code to see if the method was actually using ref and wouldn't otherwise know from one method call to another whether you were still dealing with the same object.
Related
I just want to check my understanding of C#'s ways of handling things, before I delve too deeply into designing my classes. My current understanding is that:
Struct is a value type, meaning it actually contains the data members defined within.
Class is a reference type, meaning it contains references to the data members defined within.
A method signature passes parameters by value, which means a copy of the value is passed to the inside of the method, making it expensive for large arrays and data structures.
A method signature that defines a parameter with the ref or out keywords will instead pass a parameter by reference, which means a pointer to the object is provided instead.
What I don't understand is what happens when I invoke a method, what actually happens. Does new() get invoked? Does it just automagically copy the data? Or does it actually just point to the original object? And how does using ref and out affect this?
What I don't understand is what happens when I invoke a method, what actually happens. Does new() get invoked? Does it just automagically copy the data? Or does it actually just point to the original object? And how does using ref and out affect this?
The short answer:
The empty constructor will not be called automatically, and it actually just points to the original object.
using ref and out does not affect this.
The long answer:
I think it would be easier to understand how C# handles passing arguments to a function.
Actually everything is being passed by value
Really?! Everything by value?
Yes! Everything!
Of course there must be some kind of a difference between passing classes and simple typed objects, such as an Integer, otherwise, it would be a huge step back performance wise.
Well the thing is, that behind the scenes when you pass a class instance of an object to a function, what is really being passed to the function is the pointer to the class. the pointer, of course, can be passed by value without causing performance issues.
Actually, everything is being passed by value; it's just that when
you're "passing an object", you're actually passing a reference to that
object (and you're passing that reference by value).
once we are in the function, given the argument pointer, we can relate to the object passed by reference.
You don't actually need to do anything for this, you can relate directly to the instance passed as the argument (as said before, this whole process is being done behind the scenes).
After understanding this, you probably understand that the empty constructor will not be called automatically, and it actually just points to the original object.
EDITED:
As to the out and ref, they allow functions to change the value of an arguments and have that change persist outside of the scope of the function.
In a nutshell, using the ref keyword for value types will act as follows:
int i = 42;
foo(ref i);
will translate in c++ to:
int i = 42;
int* ptrI = &i;
foo(ptrI)
while omitting the ref will simply translate to:
int i = 42;
foo(i)
using those keywords for reference type objects, will allow you to reallocate memory to the passed argument, and make the reallocation persist outside of the scope of the function (for more details please refer to the MSDN page)
Side note:
The difference between ref and out is that out makes sure that the called function must assign a value to the out argument, while ref does not have this restriction, and then you should handle it by assigning some default value yourself, thus, ref Implies the the initial value of the argument is important to the function and might affect it's behaviour.
Passing a value-type variable to a method means passing a copy of the variable to the method. Any changes to the parameter that take place inside the method have no affect on the original data stored in the variable.
If you want the called method to change the value of the parameter, you have to pass it by reference, using the ref or out keyword.
When you pass a reference-type parameter by value, it is possible to change the data pointed to by the reference, such as the value of a class member. However, you cannot change the value of the reference itself; that is, you cannot use the same reference to allocate memory for a new class and have it persist outside the block. To do that, pass the parameter using the ref (or out) keyword.
Reference: Passing Parameters(C#)
Tragically, there is no way to pass an object by value in C# or VB.NET. I suggest instead you pass, for example, New Class1(Object1) where Object1 is an instance of Class1. You will have to write your own New method to do this but at least you then have an easy pass-by-value capability for Class1.
Pretty straight forward. MSDN states that you can use ref, but not out for partial methods. I'm just curious as to the why? It was my understanding that when code is compiled, the partials are merged, so what is up with the restriction? Is there more to partial than just making code files cleaner and organized (i.e. eyecandy)?
Reference: MSDN Article - "Partial methods can have ref but not out parameters."
You got to consider what happens if the partial method isn't implemented.
What happens then is that all calls to the method is just stripped out as though they never happened.
So for a method using out, it would look like this:
stream s;
GetStream(out s);
s.Write(...);
and be compiled as though it said this:
stream s;
s.Write(...);
This code is not allowed because s has not been initialized. The guarantee that the variable would be initialized by the time you try to call the Write method on it was tied up with the call to GetStream.
It is the same with methods returning data. Since the entire method call is just not compiled if you haven't implemented the partial method, you need to consider what you can and cannot do and still leave the code that calls it valid. In terms of out and return values, it has the potential of leaving the calling code invalid or incomplete, so it is not allowed.
As for ref, that is valid since the initialization has been taken care of by the calling code:
stream s = null;
GetStream(ref s); // may be stripped out
if (s != null)
s.Write(...);
Because unlike ref parameters, out parameters MUST be initialized before the method returns. If the partial method is not implemented (which is a valid scenario,) how can it be initialized?
My guess would be because out parameters don't need to be initialized whereas ref parameters do.
If you used an out parameter on a partial method, how could C# verify that the parameter was initialized or not?
An out parameter suggests that you want a value out of the method. If the method doesn't exist, it can't provide that value.
The alternative would be to set the variable's value explicitly to its default value (0, null etc) instead of executing the method call. That way the variable would still be definitely initialized - although the default value may not be a terribly useful one. I believe the C# team have considered this - it may even make it into a future version, who knows? Personally I doubt that it would be particularly useful, but the possibility is there.
For the moment, you could always use a ref parameter instead, and just initialize the variable manually before the call to whatever the default value should be.
I would assume the reason is because a partial method with only a signature (i.e. no implementation) is still valid. If you had an out parameter an implementation-less method would always cause an error (as there's nothing assigning the out value)
A partial method is split across partial classes. A method is required to assign a value to an OUT parameter. Partial methods may or may not be implemented. It would mean multiple code chunks is trying to assign value to the OUT parameter.
As everyone else has stated out params must be assigned. To add this will generate compiler error CS0177 ref on the other hand must be assigned prior to making the call.
As far as I can tell, the only use for out parameters is that a caller can obtain multiple return values from a single method invocation. But we can also obtain multiple result values using ref parameters instead!
So are there other situations where out parameters could prove useful and where we couldn't use ref parameters instead?
Thank you.
Yes - the difference between ref and out is in terms of definite assignment:
An out parameter doesn't have to be definitely assigned by the caller before the method call. It does have to be definitely assigned in the method before it returns normally (i.e. without an exception). The variable is then definitely assigned in the caller after the call.
A ref parameter does have to be definitely assigned by the caller before the method call. It doesn't have to be assigned a different value in the method.
So suppose we wanted to change int.TryParse(string, out int) to use ref instead. Usually the calling code looks like this:
int value;
if (int.TryParse(text, out value))
{
// Use value
}
else
{
// Do something else
}
Now if we used ref, we'd have to give value a value before the call, e.g.:
int value = 0;
if (int.TryParse(text, ref value))
{
// Use value
}
else
{
// Do something else
}
Obviously it's not a huge difference - but it gives the wrong impression. We're assigning a value that we have no intention of ever using, and that's not a good thing for readability. An out parameter indicates that a value will come out of the method (assuming there's no exception) and that you don't need to have a value to start with.
Once of the suggestions I've made for C# 5 (I've no idea if it'll be taken up or not) is that a method with an out parameter should be able to regarded as a method returning a tuple of values. Combined with better support for tuples, that would mean we could do something like this:
var (ok, value) = int.TryParse(text);
In this case ok and value would be implicitly typed to bool and int respectively. That way it's clear what's going into the method (text) and what's coming out (two pieces of information: ok and value).
That would simply not be available if int.TryParse used a ref parameter instead - as the compiler can't know whether it's going to actually care about the initial value of the ref parameter.
You can look at parameters in this way:
normal parameters are in parameters: A value can go into the function through such a parameter; therefore it must be initialized.
ref parameters are in-out parameters: A value can go into and out of a function through such a parameter. Because of the former, it must also be initialized.
out parameters are out parameters: A value is only supposed to come back from a function through such a parameter; therefore, it doesn't need to be initialized.
I came up with this way of looking at ref/out parameters by studying Microsoft's COM technology. IDL (interface description language) is used to describe COM component interfaces, and with IDL, parameters are augmented with in, out, and inout declarators. I suspect .NET and C# have partly inherited these declarators from COM, albeit with slightly different names (ref instead of inout).
With COM, out parameters are frequently used to retrieve an interface method's actual return value, since the "real" return value is often already used for returning a HRESULT success/error code.
With .NET, I think out parameters have far less importance, even in cases where you want to return several values from a method (you could return complex objects or Tuples in these situations).
One important difference is this:
A variable passed as an out argument
need not be initialized. However, the
out parameter must be assigned a value
before the method returns.
(A ref parameter does not require this)
Source: http://msdn.microsoft.com/en-us/library/t3c3bfhx(VS.71).aspx
An out parameter is useful when you want multiple result values from a method. Technically, you could use a ref parameter to achieve the same goal but an out parameter does a significantly better job at conveying intent. When you use ref, It is not clear why you are doing so instead of using out or instead of using the function result. Presumably, you intend on changing the value passed, but why you are changing it isn't clear simply from the function signature.
I think a fine example is int.TryParse()
http://msdn.microsoft.com/en-us/library/f02979c7.aspx
The primary reason that out is better than ref is that you don't need to assign a dummy value to the return var before calling (even implicitly).
So out tells you, and the compiler: "This var will be assigned within the method. And the var's initial value, if any, will not even be looked at."
Major difference between the two is that if we are using ref then we have to initialize this before call and it is optional that we assign a value to our ref variable in our method.
However for out methods we do not have to explicitly initialize them but in our method we have to assign some value to it, otherwise they will generate compile time error.
I just ran across this error message while working in C#
A property or indexer may not be passed as an out or ref parameter
I known what caused this and did the quick solution of creating a local variable of the correct type, calling the function with it as the out/ref parameter and then assigning it back to the property:
RefFn(ref obj.prop);
turns into
{
var t = obj.prop;
RefFn(ref t);
obj.prop = t;
}
Clearly this would fail if the property doesn't support get and set in the current context.
Why doesn't C# just do that for me?
The only cases where I can think of where this might cause problems are:
threading
exceptions
For threading that transformation affects when the writes happen (after the function call vs. in the function call), but I rather suspect any code that counts on that would get little sympathy when it breaks.
For exceptions, the concern would be; what happens if the function assigns to one of several ref parameters than throws? Any trivial solution would result in all or none of the parameters being assigned to when some should be and some should not be. Again I don't think this would be supported use of the language.
Note: I understand the mechanics of why this error messages is generated. What I'm looking for is the rationale for why C# doesn't automatically implement the trivial workaround.
Because you're passing the result of the indexer, which is really the result of a method call. There's no guarantee that the indexer property also has a setter, and passing it by ref would lead to a false security on the developer's part when he thinks that his property is going to be set without the setter being called.
On a more technical level, ref and out pass the memory address of the object passed into them, and to set a property, you have to call the setter, so there's no guarantee that the property would actually be changed especially when the property type is immutable. ref and out don't just set the value upon return of the method, they pass the actual memory reference to the object itself.
Properties are nothing more than syntactic sugar over the Java style getX/setX methods. It doesn't make much sense for 'ref' on a method. In your instance it would make sense because your properties are merely stubbing out fields. Properties don't have to just be stubs, hence the framework cannot allow 'ref' on Properties.
EDIT: Well, the simple answer is that the mere fact that a Property getter or setter could include far more than just a field read/write makes it undesirable, not to mention possibly unexpected, to allow the sort of sugar you are proposing. This isn't to say I haven't been in need of this functionality before, just that I understand why they wouldn't want to provide it.
Just for info, C# 4.0 will have something like this sugar, but only when calling interop methods - partly due to the sheer propensity of ref in this scenario. I haven't tested it much (in the CTP); we'll have to see how it pans out...
You can use fields with ref/out, but not properties. The reason is that properties are really just a syntax short cut for special methods. The compiler actually translates get / set properties to corresponding get_X and set_X methods as the CLR has no immediate support for properties.
It wouldn't be thread-safe; if two threads simultaneously create their own copies of the property value and pass them to functions as ref parameters, only one of them ends up back in the property.
class Program
{
static int PropertyX { get; set; }
static void Main()
{
PropertyX = 0;
// Sugared from:
// WaitCallback w = (o) => WaitAndIncrement(500, ref PropertyX);
WaitCallback w = (o) => {
int x1 = PropertyX;
WaitAndIncrement(500, ref x1);
PropertyX = x1;
};
// end sugar
ThreadPool.QueueUserWorkItem(w);
// Sugared from:
// WaitAndIncrement(1000, ref PropertyX);
int x2 = PropertyX;
WaitAndIncrement(1000, ref x2);
PropertyX = x2;
// end sugar
Console.WriteLine(PropertyX);
}
static void WaitAndIncrement(int wait, ref int i)
{
Thread.Sleep(wait);
i++;
}
}
PropertyX ends up as 1, whereas a field or local variable would be 2.
That code sample also highlights the difficulties introduced by things like anonymous methods when asking the compiler to do sugary stuff.
The reason for this is that C# does not support "parameterful" properties that accept parameters passed by reference. It is interesting to note that the CLR does support this functionalty but C# does not.
When you pass ref/out prepended it means that you are passing a reference type which is stored in the heap.
Properties are wrapper methods, not variables.
If you're asking why the compiler doesn't substitute the field returned by the property's getter, it's because the getter can return a const or readonly or literal or something else that shouldn't be re-initialized or overwritten.
This site appears to have a work around for you. I have not tested it though, so I can't guarantee it will work. The example appears to use reflection in order to gain access to the get and set functions of the property. This is probably not a recommended approach, but it might accomplish what you're asking for.
http://www.codeproject.com/KB/cs/Passing_Properties_byref.aspx
I'm faced with a situation that I think can only be solved by using a ref parameter. However, this will mean changing a method to always accept a ref parameter when I only need the functionality provided by a ref parameter 5% of the time.
This makes me think "whoa, crazy, must find another way". Am I being stupid? What sort of problems can be caused by a ref parameter?
Edit
Further details were requested, I don't think they are entirely relevant to what I was asking but here we go.
I'm wanting to either save a new instance (which will update with the ID which may later be used) or retrieve an existing instance that matches some logic and update that, save it then change the reference of the new instance to point to the existing one.
Code may make it clearer:
protected override void BeforeSave(Log entity)
{
var newLog = entity;
var existingLog = (from log in repository.All()
where log.Stuff == newLog.Stuff
&& log.Id != newLog.Id
select log).SingleOrDefault();
if (existingLog != null)
{
// update the time
existingLog.SomeValue = entity.SomeValue;
// remove the reference to the new entity
entity = existingLog;
}
}
// called from base class which usually does nothing before save
public void Save(TEntity entity)
{
var report = validator.Validate(entity);
if (report.ValidationPassed)
{
BeforeSave(entity);
repository.Save(entity);
}
else
{
throw new ValidationException { Report = report };
}
}
It's the fact that I would be adding it in only for one child (so far) of the base class that prevents me using an overload (due to the fact I would have to duplicate the Save method). I also have the problem whereby I need to force them to use the ref version in this instance otherwise things won't work as expected.
Can you add an overload? Have one signature without the ref parameter, and one with it.
Ref parameters can be useful, and I'm glad they exist in C#, but they shouldn't be used without thought. Often if a method is effectively returning two values, it would be better either to split the method into two parts, or encapsulate both values in a single type. Neither of these covers every case though - there are definitely times when ref is the best option.
Perhaps use an overloaded function for this 5% case and leave the other function as is.
Unnecessary ref parameters can lead to bad design patterns, but if you have a specific need, there's no problem with doing this.
If you take the .NET Framework as a barometer of people's expectations of an API, consider that almost all of the String methods return the modified value, but leave the passed argument unchanged. String.Trim(), for instance, returns the trimmed String - it doesn't trim the String that was passed in as an argument.
Now, obviously, this is only feasible if you're willing to put return-values into your API. Also, if your function already returns a value, you run into the nasty possibility of creating a custom structure that contains your original return value as well as the newly changed object.
Ultimately, it's up to you and how you document your API. I've found in my experience though that my fellow programmers tend to expect my functions to act "like the .NET Framework functions". :)
A ref parameter won't cause problems per se. It's a documented feature of the language.
However it could cause social problems. Specifically, clients of your API might not expect a ref parameter simply because it's rare. You might modify a reference that the client doesn't expect.
Of course you can argue that this is the client's fault for not reading your API spec, and that would be true. But sometimes it's best to reduce surprise. Writing good code isn't just about following the rules and documenting stuff, it's also about making something naturally obvious to a human user.
An overload won't kill your application or its design. As long as the intent is clearly documented, it should be okay.
One thing that might be considered is mitigating your fears about the ref parameter through a different type of parameter. For example, consider this:
public class SaveArgs
{
public SaveArgs(TEntity value) { this.Value = value; }
public TEntity Value { get; private set;}
public int NewId { get; internal set; }
public bool NewIdGenerated { get; internal set; }
}
In your code, you simply pass a SaveArgs rather than the TEntity, so that you can modify its properties with more meaningful information. (Naturally, it'd be a better-designed class than what I have above.) But then you wouldn't have to worry about vague method interfaces, and you could return as much data as you needed to in a verbose class.
Just a thought.
EDIT: Fixed the code. My bad.
There is nothing wrong with using ref parameters that I can think of, in fact they can be very handy sometimes. I think they sometimes get a bad rap due to debugging since the value of the variable can change in the code logic and can sometimes be hard to track. It also makes things difficult when converting to things like WebServices where a "value" only pass will suffice.
The biggest issue I have with ref parameters is they make it difficult to use type inference as you must sometimes explicitly declare the type of the variable being used as the ref parameter.
Most of the time I use a ref parameter it's in a TryGet scenario. In general I've stopped using ref's in that scenario and instead opted for using a more functional style method by way of an option.
For instance. TryGetValue in dictionary switches from
bool TryGetValue(TKey key, out TValue value)
To
Option<Value> TryGetValue(TKey key)
Option available here: http://blogs.msdn.com/jaredpar/archive/2008/10/08/functional-c-providing-an-option-part-2.aspx
The most common use I've seen for ref parameters is as a way of returning multiple values. If that's the case, you should consider creating a class or struct that returns all the values as one object.
If you still want to use a ref but want to make it optional, add a function overload.
ref is just a tool. You should think: What is the best design pattern for what I am building?
Sometimes will be better to use an overloaded method.
Others will be better to return a custom type or a tuple.
Others will be better to use a global variable.
And others ref will be the right decision.
If your method only needs this ref parameter 5% of the time perhaps you need to break this method down. Of course without more details its hard to say but this to me smells like a case of violating single responsability principal. Perhaps overloading it will help.
As for your question there is no issue in my opinion passing a parameter as a reference although it is not a common thing to run into.
This is one of those things that F# or other functional programming languages solve a lot better with returning Tuple values. Its a lot more cleaner and terse syntax. In the book I am reading on F#, it actually points out the C# equivelant of using ref as doing the same thing in C# to return multiple parameters.
I have no idea if it is a bad practice or there is some underlying "booga booga" about ref parameters, to me they just feel as not clean syntax.
A void-returning function with a single reference parameter certainly looks funny to me. If I were reviewing this code, I'd suggest refactoring the BeforeSave() to include the call to Repository.Save() - renaming it, obviously. Why not just have one method that takes your possibly-new entity and guarantees that everything is saved properly? The caller doesn't do anything with the returned entity anyway.