Suppose I create a variable B that references another one, A, both of them being reference-type variables. If I set B or A to null, the other one still points to the object instance, which remains untouched.
SomeClass A = new SomeClass();
SomeClass B = A;
B = null; //A still has a valid reference
This is also true:
SomeClass A = new SomeClass();
SomeClass B = A;
A = null; //B still has a valid reference
But I don't want B to reference the instance referenced by A; I want B to reference A itself. That way, if B were set to null, A would be set to null as well. Is there any elegant, safe (no pointers) way of doing this? Or am I trying to do something that goes against C# principles?
Thanks.
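You can't do this the way you would in C or C++. The only time you can have a reference to a variable that holds an object reference is when you call a method with a ref parameter (newer C# versions also offer ref locals, which likewise cannot be stored in fields), for example: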
void main_method()
{
SomeClass A = new SomeClass();
secondary_method(ref A);
}
void secondary_method(ref SomeClass B)
{
B = null; // this has the side effect of clearing the A of main_method
}
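A minimal sketch of the difference, written as a C# 9+ top-level program. StringBuilder is only a stand-in for the question's SomeClass (an assumption; any class behaves the same), and Clear is a hypothetical helper. It contrasts copying a reference, passing a variable by ref, and a ref local (available since C# 7.0), which is also an alias for a variable but cannot outlive its method:

```csharp
#nullable enable
using System;
using System.Text;

// StringBuilder stands in for SomeClass; any class behaves the same way.
StringBuilder? a = new StringBuilder("some data");

// Plain assignment copies the reference: b is a separate variable.
StringBuilder? b = a;
b = null;                          // clears b only
bool aSurvivedCopy = a != null;    // true

// A ref parameter is an alias for the caller's variable.
Clear(ref a);                      // clears a itself
bool aClearedByRef = a == null;    // true

// A ref local is also an alias, but only inside this method.
StringBuilder? c = new StringBuilder("other");
ref StringBuilder? alias = ref c;
alias = null;                      // c is now null as well
bool cClearedByAlias = c == null;  // true

Console.WriteLine($"{aSurvivedCopy} {aClearedByRef} {cClearedByAlias}"); // True True True

// Hypothetical helper: assigns through the alias to the caller's variable.
static void Clear(ref StringBuilder? target) => target = null;
```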
The solution here is for neither of those variables to directly refer to the object, but to instead refer to an object instance that has a field pointing to an actual SomeClass instance:
public class Pointer<T>
{
public T Value {get;set;}
}
Pointer<SomeClass> A = new Pointer<SomeClass>(){ Value = new SomeClass()};
Pointer<SomeClass> B = A;
B.Value = null;
//A.Value is null
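The same idea can be sketched without declaring a new class: System.Runtime.CompilerServices.StrongBox<T> is an existing BCL class with a public Value field, so it can play the role of Pointer<T> here. This is a sketch under that substitution (with StringBuilder standing in for SomeClass), not a claim about how the answer's author would write it:

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;
using System.Text;

// StrongBox<T> plays the role of Pointer<T>: a heap object holding one slot.
var A = new StrongBox<StringBuilder?>(new StringBuilder("payload"));
var B = A;                         // copies the reference to the box, not the box

B.Value = null;                    // clears the single slot both variables share
bool aSeesNull = A.Value == null;

Console.WriteLine(aSeesNull);      // True
```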
MSDN has the following definition for a reference type: "Variables of reference types store references to the actual data" (http://msdn.microsoft.com/en-us/library/490f96s2%28v=vs.110%29.aspx). In your case, setting the second variable to null only breaks that variable's reference; it has no effect on the actual data. This is clearly shown in Olivier's answer to Setting a type reference type to null doesn't affect copied type?
A possible solution to your problem is to make use of a WeakReference.
According to MSDN: "A weak reference allows the garbage collector to collect an object while still allowing an application to access the object. If you need the object, you can still obtain a strong reference to it and prevent it from being collected" (http://msdn.microsoft.com/en-us/library/system.weakreference%28v=vs.110%29.aspx).
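So as long as a strong (ordinary) reference to the object exists, the object won't be garbage collected, and the WeakReference still gives you access to it. Once you break that strong reference by setting it to null, the object becomes eligible for collection, and after the GC actually collects it, the WeakReference's Target becomes null. Note that this happens whenever the GC next runs, not at the moment you assign null. More information on WeakReference is available at: http://msdn.microsoft.com/en-us/library/ms404247.aspx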
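A small sketch of how the WeakReference behaves, assuming a .NET runtime where GC.Collect() triggers a full collection; StringBuilder is an arbitrary stand-in object. Only the first check is deterministic. Whether the object is gone after the strong reference is dropped depends on the GC and JIT, which is also why this does not give the question's "set B to null and A becomes null immediately" behavior:

```csharp
#nullable enable
using System;
using System.Text;

StringBuilder? strong = new StringBuilder("data");
var weak = new WeakReference(strong);

GC.Collect();
bool aliveWhileStrong = weak.IsAlive;  // true: 'strong' still references the object
GC.KeepAlive(strong);                  // keeps 'strong' reachable up to this point

strong = null;                         // drop the only strong reference
GC.Collect();
GC.WaitForPendingFinalizers();

// The object is now merely *eligible* for collection. Whether IsAlive is false here
// depends on the runtime (debug builds, for example, can extend local lifetimes).
Console.WriteLine($"alive while strong: {aliveWhileStrong}; alive after: {weak.IsAlive}");
```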
Let's say you have this: (boilerplate code left out)
static Object foo()
{
Object test = new Object();
return test;
}
Object output = foo();
At least in C#, the program doesn't crash when you try to use output, even though it points to the same object that was created in foo. Why didn't it get deallocated immediately?
And if this is a bad way to handle Object return values, is there a better way to deal with it?
The GC will only collect (if it so chooses to) objects that have no reachable references. In your example:
static object foo()
{
object test = new object();
return test;
}
object output = foo();
The instance referenced by test is also referenced by output when foo returns. So even though test is no longer a reachable reference, output still is; therefore, the instance it's "pointing" to is not a candidate for collection.
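The point can be sketched with a List<int> standing in for the returned object (an arbitrary choice). The list lives on the managed heap and returning it copies the reference, so the caller's variable keeps it reachable after foo's local is gone:

```csharp
using System;
using System.Collections.Generic;

// 'test' is only a local reference; the list itself lives on the managed heap.
static List<int> Foo()
{
    var test = new List<int> { 1, 2, 3 };
    return test;                   // copies the reference out to the caller
}

List<int> output = Foo();          // 'test' is gone, but 'output' keeps the list reachable
output.Add(4);                     // safe: the object was never freed
Console.WriteLine(output.Count);   // 4
```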
The following example is radically different:
static object foo()
{
object test = new object();
return test;
}
foo();
Here, the instance referenced by test is a valid candidate for collection once foo returns (or even before, but let's ignore those optimizations), because once foo returns, there are no reachable references to the object.
Note that I said reachable references. An object can still be referenced by other objects and yet be a candidate for collection if those referencing objects are themselves candidates for collection. The GC handles these kinds of cycles correctly:
class A { public B b; }
class B { public A a; }
void Foo()
{
A a = new A();
B b = new B();
a.b = b;
b.a = a;
}
Foo(); //both a and b are eligible for collection once Foo exits.
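A hedged sketch of the cycle case: two object arrays that reference each other stand in for the A/B classes (an assumption for brevity), and a WeakReference observes one of them. MakeCycle is a hypothetical helper, marked NoInlining so its locals cannot linger in the caller's frame. The printed result is typical, not guaranteed:

```csharp
using System;
using System.Runtime.CompilerServices;

// Builds a two-object cycle (a -> b -> a) and returns only a weak handle to it.
[MethodImpl(MethodImplOptions.NoInlining)]
static WeakReference MakeCycle()
{
    object[] a = new object[1];
    object[] b = new object[1];
    a[0] = b;
    b[0] = a;
    return new WeakReference(a);
}

WeakReference probe = MakeCycle();
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

// Typically False: no root reaches the cycle, so the tracing GC can reclaim it
// even though a and b still reference each other. Not guaranteed by the runtime.
Console.WriteLine($"cycle still alive: {probe.IsAlive}");
```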
It's reference passing. object is a reference type, so foo returns a copy of a reference to an object that lives on the managed heap, not in foo's stack frame; the object stays alive as long as some reachable reference (here, output) points to it.
If you have done some C/C++, think of references as automatically managed pointers.
Returning it is not bad practice, but be careful with it. When you assign the returned reference to a variable that already referred to another object, that variable no longer refers to the previous object. Again, in C terms, it's like making your pointer point to a second allocated block of memory and leaving the first one behind; unlike in C, the GC reclaims the first object once nothing references it.
I am reading the following blog post by Eric Lippert: The truth about Value types
In it, he mentions at the beginning that there are 3 kinds of values:
Instance of Value types
Instance of Reference types
References
It is incomplete. What about references? References are neither value types nor instances of reference types, but they are values.
So, in the following example:
int i = 10;
string s = "Hello";
The first is an instance of a value type and the second is an instance of a reference type. So what is the third kind, references, and how do we obtain one?
So, what is the third type, References and how do we obtain that?
The variable s holds a reference as its value. That reference refers to a string instance (with the value "Hello") in memory.
To make this more clear, say you have:
string s1 = "Hello";
string s2 = s1;
In this case, s1 and s2 are both variables that are each a reference to the same reference type instance (the string). There is only a single actual string instance (the reference type) involved here, but there are two references to that instance.
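A minimal sketch of the two-references-one-instance situation. object.ReferenceEquals confirms that s1 and s2 refer to the same string object, and reassigning s2 afterwards only changes what s2 refers to:

```csharp
using System;

string s1 = "Hello";
string s2 = s1;                                      // copies the reference, not the characters
bool sameInstance = object.ReferenceEquals(s1, s2);  // one string object, two references

s2 = "World";                                        // s2 now refers to a different object
Console.WriteLine($"{sameInstance} {s1} {s2}");      // True Hello World
```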
Fields and variables of a reference type, such as your s, hold references to an instance of a reference type that lives on the heap.
You never use an instance of a reference type directly; instead, you use it through a reference.
A reference is not really a 'third type'. It's actually a pointer that refers to a concrete instance of an object. Take a look at this example:
class MyClass
{
public string Str { get; set; }
}
class Program
{
static void Main(string[] args)
{
int a = 1;
int b = 2;
int c = 3;
var myObj = new MyClass
{
Str = "Whatever"
};
Console.WriteLine("{0};\t{1};\t{2};\t{3}", a, b, c, myObj.Str);
MyFunction(a, ref b, out c, myObj);
Console.WriteLine("{0};\t{1};\t{2};\t{3}", a, b, c, myObj.Str);
Console.ReadLine();
}
static void MyFunction(int justValue, ref int refInt, out int outInt, MyClass obj)
{
obj.Str = "Hello";
justValue = 101;
refInt = 102;
outInt = 103; // similar to refInt, but you MUST set the value of the parameter if it uses the 'out' keyword
}
}
The output of this program is:
1; 2; 3; Whatever
1; 102; 103; Hello
Focus on the MyFunction:
The first parameter we pass is a simple int, which is a value type. By default, value types are copied when passed as a parameter (a new instance is created). That's why the value of 'a' didn't change.
You can change this behavior by adding the 'ref' or 'out' keyword to the parameter. In that case you actually pass a reference to the caller's variable itself, and MyFunction overwrites the value stored in it.
Here you can read more about ref and out.
The last example is the MyClass object. All classes are reference types, which is why you always pass a reference to the object (no special keywords needed). The reference itself is still passed by value, though: MyFunction can change obj.Str, but assigning a new object to obj would not affect myObj.
You can think of a reference as an address in computer memory. The bytes at that address make up your object. If you pass it by value, you take those bytes out and pass them to a function. If you pass it by reference, you only pass the address. Then your called function can read bytes from that address or write to that address. Every change affects the calling function's variables, because they point to exactly the same bytes in memory. That's not exactly what happens in .NET (it runs in a virtual machine), but I think this analogy will help you understand the concept.
Why do we use references? There are many reasons. One of them is that passing a big object by value would be very slow and would require cloning it. When you pass a reference to an object, then no matter how big that object is, you only pass a few bytes that contain its 'address' in memory.
Moreover your object may contain elements that cannot be cloned (like an open socket). Using reference you can easily pass such an object between functions.
It's also worth mentioning that structs, even though they look very similar to classes, are actually value types and behave as value types (when you pass a struct to a function, you actually pass a copy, a new instance).
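A small sketch of the struct/class difference, using ValueTuple (a struct with mutable fields) and List<int> (a class) so that no types need to be declared. Mutate is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;

// ValueTuple is a struct: the callee receives its own copy of the fields.
// List<int> is a class: the callee receives a copy of the reference to the same list.
static void Mutate((int X, int Y) point, List<int> list)
{
    point.X = 100;   // changes only the callee's copy
    list.Add(100);   // changes the list the caller also sees
}

var p = (X: 1, Y: 2);
var items = new List<int> { 1 };
Mutate(p, items);

Console.WriteLine($"{p.X} {items.Count}"); // 1 2
```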
When I have this code:
class A
{
public int X = 0;
...
}
public void Function()
{
// here I create a new instance of class
A a = new A();
a.X = 10;
// here I create a pointer to null
A b = null;
// here I assign a to b
b = a;
b.X = 20;
}
Did I pass the reference to the instance of class A, or did I clone the instance of A into a new instance and create a reference to it in b?
Does changing X in b also change X in a? Why? If not, what is the proper way to create a copy of a and assign it to b?
Why does the same thing with strings always seem to create a copy? Is the = operator overridden for strings?
string a = "hello";
string b = a;
b = "world";
// "hello world"
Console.WriteLine( a + " " + b );
C# uses references, not pointers. Classes are reference types.
In your example, b holds the same reference as a. They refer to the same location in memory.
changing X in b also changing X in a? Why?
Yes, because both refer to the same object, so a change made through one reference is visible through the other.
string a = "hello";
string b = a;
b = "world";
// "hello world"
Console.WriteLine( a + " " + b );
Strings are also reference types, but they are immutable, which means you can't change them. Even when it looks like you are changing one, you are actually creating a new string object.
In the first line, you create an object containing "hello", referenced by a.
In the second line, you create a new reference called b that refers to the same object ("hello").
In the third line, you assign b a reference to a new object, "world". b no longer refers to the "hello" object.
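The three steps above, plus one string method call, can be sketched like this. ToUpperInvariant is used only to show that string methods return a new string instead of modifying the existing one:

```csharp
using System;

string a = "hello";
string b = a;                         // b refers to the same string object as a
b = "world";                          // rebinds b; the "hello" object is untouched
string shout = a.ToUpperInvariant();  // returns a *new* string; a is unchanged

Console.WriteLine($"{a} {b} {shout}"); // hello world HELLO
```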
did I pass the pointer to instance of class A now? or I cloned the instance of A to new instance and created a pointer to it in b?
b is holding the same reference as a, both of them pointing to the same location.
changing X in b also changing X in a? Why?
Because both of them refer to the same object.
what is a proper way to create a copy of a and insert that to b?
Implement the ICloneable interface, which the documentation describes as follows:
Supports cloning, which creates a new instance of a class with the same value as an existing instance.
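As an illustration of the difference between aliasing and copying, here List<T>'s copy constructor stands in for a Clone method (an assumption for brevity). It makes a shallow copy, so for a list of reference-type elements the element objects would still be shared. Note that ICloneable itself does not specify whether Clone is deep or shallow:

```csharp
using System;
using System.Collections.Generic;

var a = new List<int> { 10 };
var alias = a;                  // copies the reference: one list, two variables
var copy = new List<int>(a);    // copy constructor: a second list with the same elements

alias[0] = 20;                  // visible through a (same object)
copy[0] = 30;                   // not visible through a (different object)

Console.WriteLine($"{a[0]} {alias[0]} {copy[0]}"); // 20 20 30
```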
EDIT
Since you edited the question to include strings: although strings are reference types, they are also immutable.
string (C# Reference)
Strings are immutable--the contents of a string object cannot be changed after the object is created, although the syntax makes it appear as if you can do this.
b points to the same object as a. To get an independent copy, you have to clone it, for example with a deep clone implemented through the ICloneable interface.
When you assign, you copy the value of the expression on the right-hand side.
For value types, this is the value you usually get to see when you use them (like the numerical value of an integer).
For reference types, the actual value is something like an address pointing to the referenced object (but, what it really is, is an implementation detail). So, even though you pass a copy of that address, that copy points to the same object.
I always thought that a method parameter with a class type is passed as a reference parameter by default. Apparently that is not always the case. Consider these unit tests in C# (using MSTest).
[TestClass]
public class Sandbox
{
private class TestRefClass
{
public int TestInt { get; set; }
}
private void TestDefaultMethod(TestRefClass testClass)
{
testClass.TestInt = 1;
}
private void TestAssignmentMethod(TestRefClass testClass)
{
testClass = new TestRefClass() { TestInt = 1 };
}
private void TestAssignmentRefMethod(ref TestRefClass testClass)
{
testClass = new TestRefClass() { TestInt = 1 };
}
[TestMethod]
public void DefaultTest()
{
var testObj = new TestRefClass() { TestInt = 0 };
TestDefaultMethod(testObj);
Assert.IsTrue(testObj.TestInt == 1);
}
[TestMethod]
public void AssignmentTest()
{
var testObj = new TestRefClass() { TestInt = 0 };
TestAssignmentMethod(testObj);
Assert.IsTrue(testObj.TestInt == 1);
}
[TestMethod]
public void AssignmentRefTest()
{
var testObj = new TestRefClass() { TestInt = 0 };
TestAssignmentRefMethod(ref testObj);
Assert.IsTrue(testObj.TestInt == 1);
}
}
The results are that AssignmentTest() fails and the other two test methods pass. I assume the issue is that assigning a new instance to the testClass parameter breaks the parameter reference, but somehow explicitly adding the ref keyword fixes this.
Can anyone give a good, detailed explanation of what's going on here? I'm mainly just trying to expand my knowledge of C#; I don't have any specific scenario I'm trying to solve...
The thing that is nearly always forgotten is that a class isn't passed by reference, the reference to the class is passed by value.
This is important. Instead of copying the entire class (pass by value in the stereotypical sense), the reference to that class (I'm trying to avoid saying "pointer") is copied. This is 4 or 8 bytes; much more palatable than copying the whole class and in effect means the class is passed "by reference".
At this point, the method has its own copy of the reference to the class. Assignment to that reference is scoped within the method (the method reassigns only its own copy of the reference).
Dereferencing that reference (as in, talking to class members) would work as you'd expect: you'd see the underlying class unless you change it to look at a new instance (which is what you do in your failing test).
Using the ref keyword is effectively passing the reference itself by reference (pointer to a pointer sort of thing).
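The three test cases can be mirrored in a short sketch, with List<int> standing in for TestRefClass and Mutate, Reassign, and ReassignRef as hypothetical names for the three methods:

```csharp
using System;
using System.Collections.Generic;

// Like TestDefaultMethod: mutates the object the caller also references.
static void Mutate(List<int> list) => list.Add(1);

// Like TestAssignmentMethod: reassigns only the method's copy of the reference.
static void Reassign(List<int> list) => list = new List<int> { 1 };

// Like TestAssignmentRefMethod: reassigns the caller's variable itself.
static void ReassignRef(ref List<int> list) => list = new List<int> { 1 };

var x = new List<int>();
Mutate(x);            // x.Count is now 1

var y = new List<int>();
Reassign(y);          // y.Count is still 0

var z = new List<int>();
ReassignRef(ref z);   // z now refers to the new list, so z.Count is 1

Console.WriteLine($"{x.Count} {y.Count} {z.Count}"); // 1 0 1
```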
As always, Jon Skeet has provided a very well written overview:
http://www.yoda.arachsys.com/csharp/parameters.html
Pay attention to the "Reference parameters" part:
Reference parameters don't pass the values of the variables used in the function member invocation - they use the variables themselves.
If the method assigns something to a ref reference, then the caller's copy is also affected (as you have observed) because they are looking at the same reference to an instance in memory (as opposed to each having their own copy).
The default convention for parameters in C# is pass by value. This is true whether the parameter is a class or struct. In the class case just the reference is passed by value while in the struct case a shallow copy of the entire object is passed.
When you enter the TestAssignmentMethod there are 2 references to a single object: testObj which lives in AssignmentTest and testClass which lives in TestAssignmentMethod. If you were to mutate the actual object via testClass or testObj it would be visible to both references since they both point to the same object. In the first line though you execute
testClass = new TestRefClass() { TestInt = 1 }
This creates a new object and points testClass to it. This doesn't alter where the testObj reference points in any way, because testClass is an independent copy. There are now 2 objects and 2 references, with each reference pointing to a different object instance.
If you want pass by reference semantics you need to use a ref parameter.
My 2 cents
When a class instance is passed to a method, a copy of its memory address is sent (directions to your house are sent). So any operation at that address will affect the house but will not change the address itself. (This is the default.)
Passing a class (object) by reference instead passes the caller's own variable, the piece of paper holding the address, rather than a copy of the address. That means if you assign a new object to an argument passed by reference, you change the address the caller holds (similar to relocation). :D
This is how I see it.
The AssignmentTest uses TestAssignmentMethod which only changes the object reference passed by value.
So the object itself is passed by reference, but the reference to the object is passed by value. So when you do:
testClass = new TestRefClass() { TestInt = 1 };
You are changing the local copied reference passed to the method not the reference you have in the test.
So here:
[TestMethod]
public void AssignmentTest()
{
var testObj = new TestRefClass() { TestInt = 0 };
TestAssignmentMethod(testObj);
Assert.IsTrue(testObj.TestInt == 1);
}
testObj is a reference variable. When you pass it to TestAssignmentMethod(testObj), the reference is passed by value. So when you change it in the method, the original reference still points to the same object.
There are lots of subtleties missed in the answers posted here that will create unexpected results and confuse new C# developers. There are actually two ways to process a reference passed by value in C# methods.
All methods in C# pass arguments BY VALUE by default unless you use the ref, in, or out keywords. Passing a REFERENCE BY VALUE means a COPY of the MEMORY ADDRESS of the object used by the outside reference is passed in and assigned to the method parameter. Neither the original outside variable nor the original object in memory is passed in, just the memory address of the object.
Both variables now point to the same object in memory.
This copy of the address of the object in memory is the VALUE in pass by value for all reference types. That means the original reference variable that points to the object address remains the same, and a new copy of that memory address is assigned to a new variable, the method parameter. They BOTH point to the same object. That means if either one changes properties on the object, it changes the original object, and the change is seen through both variables.
This seems to act like a PASS BY REFERENCE, but it is not. That is what confuses many developers.
But this means some "weird" and unexpected things can happen passing a reference by value in methods if you are not careful. It means your method variable can connect to the same object and change the properties and fields of the original shared object ...BUT... as soon as you reassign the method variable to a new instance of the same type of object, it loses a connection to the original instance and no longer affects the original object used by the outside reference.
You might assume the method has assigned a fresh object to the outside reference variable, but it has not! Changing that new object's properties in the method no longer affects the outside reference. So BE CAREFUL!
Let's test this weirdness in C#:
// First, create my cat class. I can change its name
// to anything I want. But instead, I want it to have
// a special name assigned by the next class via a method.
class MyCat
{
public string Name { get; set; }
}
// This special class will assign a popular name to my cat.
class CatNames
{
public enum PopularNames {
Felix,
Fluffy
}
public void ChangeName(MyCat c)
{
PopularNames p = PopularNames.Felix;
c.Name = p.ToString();
}
public void ChangeNameAndCat(MyCat c)
{
PopularNames p = PopularNames.Fluffy;
MyCat d = new MyCat();
d.Name = p.ToString();
c = d;
// Note: In this case, you might want to return the new "MyCat"
// object and its name to the caller.
}
}
// Testing passing by value and how references are passed...
CatNames catnamechanger = new CatNames();
// I created two cats with the same name so you can see
// what names actually changed below.
MyCat cat1 = new MyCat();
cat1.Name = "Bubba";
MyCat cat2 = new MyCat();
cat2.Name = "Bubba";
catnamechanger.ChangeName(cat1);
catnamechanger.ChangeNameAndCat(cat2);
Console.WriteLine("My Cat1's Name is: " + cat1.Name);
Console.WriteLine("My Cat2's Name is: " + cat2.Name);
// ============== OUTPUT ==================
// My Cat1's Name is: Felix
// My Cat2's Name is: Bubba <<< OOPS! My cat name kept the original
RESULTS
Notice the first cat had its name changed on the original object, but the second cat kept its original name, "Bubba", as a new cat was assigned to the method variable. It lost connection to the original object. The reason is, passing a reference by value still allows you to affect properties of the passed in address to the original object. But as soon as you change where the method variable points, that reference is lost.
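One way to avoid the "lost" cat, sketched with StringBuilder standing in for MyCat (an assumption so the example needs no class declarations; its contents play the role of Name). Rename, NewCat, and ReplaceCat are hypothetical helpers showing three options: mutate the shared object, return the new object, or pass the variable by ref:

```csharp
using System;
using System.Text;

// Mutates the object the caller passed in (like ChangeName).
static void Rename(StringBuilder cat) => cat.Clear().Append("Felix");

// Returns the new object so the caller can decide to keep it.
static StringBuilder NewCat() => new StringBuilder("Fluffy");

// Replaces the caller's variable itself through a ref parameter.
static void ReplaceCat(ref StringBuilder cat) => cat = new StringBuilder("Fluffy");

var cat1 = new StringBuilder("Bubba");
var cat2 = new StringBuilder("Bubba");
var cat3 = new StringBuilder("Bubba");

Rename(cat1);          // same object, new name
cat2 = NewCat();       // caller explicitly takes the new object
ReplaceCat(ref cat3);  // callee replaces the caller's reference

Console.WriteLine($"{cat1} {cat2} {cat3}"); // Felix Fluffy Fluffy
```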
I understand (or at least I believe I do) what it means to pass an instance of a class to a method by ref versus not passing by ref. When or under what circumstances should one pass a class instance by ref? Is there a best practice when it comes to using the ref keyword for class instances?
The clearest explanation I've ever run across for output and ref parameters is ... Jon Skeet's.
Parameter Passing in C#
He doesn't go into "best practices", but if you understand the examples he's given, you'll know when you need to use them.
When the method may replace the original object, you should pass it as ref. If it's just for output and can be uninitialized before calling the function, use out.
Put succinctly, you would pass a value as a ref parameter if you want the function you're calling to be able to alter the value of that variable.
This is not the same as passing a reference type as a parameter to a function. In those cases, you're still passing by value, but the value is a reference. In the case of passing by ref, then an actual reference to the variable is sent; essentially, you and the function you're calling "share" the same variable.
Consider the following:
public void Foo(ref int bar)
{
bar = 5;
}
...
int baz = 2;
Foo(ref baz);
In this case, the baz variable has a value of 5, since it was passed by reference. The semantics are entirely clear for value types, but not as clear for reference types.
public class MyClass
{
public int PropName { get; set; }
}
public void Foo(MyClass bar)
{
bar.PropName = 5;
}
...
MyClass baz = new MyClass();
baz.PropName = 2;
Foo(baz);
As expected, baz.PropName will be 5, since MyClass is a reference type. But let's do this:
public void Foo(MyClass bar)
{
bar = new MyClass();
bar.PropName = 5;
}
With the same calling code, baz.PropName will remain 2. This is because even though MyClass is a reference type, Foo has its own variable for bar; bar and baz just start out with the same value, but once Foo assigns a new value, they are just two different variables. If, however, we do this:
public void Foo(ref MyClass bar)
{
bar = new MyClass();
bar.PropName = 5;
}
...
MyClass baz = new MyClass();
baz.PropName = 2;
Foo(ref baz);
We'll end up with PropName being 5, since we passed baz by reference, making the two functions "share" the same variable.
The ref keyword allows you to pass an argument by reference. For reference types this means that the actual reference to an object is passed (rather than a copy of that reference). For value types this means that a reference to the variable holding the value of that type is passed.
This is used for methods that need to return more than one result but don't return a complex type to encapsulate those results. It allows you to pass a reference to a variable into the method so that the method can modify that variable (for a reference type, make it refer to a different object).
The important thing to remember is that reference types are not normally passed by reference, a copy of a reference is passed. This means that you are not working with the actual reference that was passed to you. When you use ref on a class instance you are passing the actual reference itself so all modifications to it (like setting it to null for example) will be applied to the original reference.
When passing reference types (non-value-types) to a method, only the reference is passed in both cases. But when you use the ref keyword, the method being called can change the reference.
For example:
public void MyMethod(ref MyClass obj)
{
obj = new MyClass();
}
elsewhere:
MyClass x = y; // y refers to an instance of MyClass
// x and y refer to the same object
MyMethod(ref x);
// x points to a new instance of MyClass
when calling MyMethod(ref x), x will point to the newly created object after the method call. x no longer points to the original object.
Most use cases for passing a reference variable by reference involve initialization and out is more appropriate than ref. And they compile to the same thing (the compiler enforces different constraints - that ref variables be initialized before being passed in and that out variables are initialized in the method). So the only case I can think of where this would be useful is where you need to do some checking of an instantiated ref variable and may need to reinitialize under certain circumstances.
This might also be necessary to modify an immutable class (like string) as pointed out by Asaf R.
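A sketch of the distinction described above: out for initialization, ref for the case where the method inspects an already initialized variable and replaces it only under some condition. StartNewBatchIfFull and its "batch is full" rule are hypothetical, chosen only to illustrate conditional reinitialization:

```csharp
using System;
using System.Collections.Generic;

// out: the method must assign the parameter; the caller need not initialize it.
static void Create(out List<int> list) => list = new List<int>();

// ref: the caller must initialize the variable; the method inspects it and
// replaces it only when a (hypothetical) condition holds.
static void StartNewBatchIfFull(ref List<int> current, int maxItems)
{
    if (current.Count >= maxItems)
        current = new List<int>();
}

Create(out var created);             // no prior initialization required

var batch = new List<int> { 1, 2, 3 };
var fullBatch = batch;               // keep a handle on the old list
StartNewBatchIfFull(ref batch, 3);   // full, so the caller's variable is replaced

Console.WriteLine($"{created.Count} {batch.Count} {fullBatch.Count}"); // 0 0 3
```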
I found that it is easy to run into trouble using the ref keyword.
The following method will modify f even without the ref keyword in the method signature because f is a reference type:
public void TrySet(Foo f,string s)
{
f.Bar = s;
}
In this second case, however, the original Foo is affected only by the first line of code; the rest of the method creates a new object and modifies it through f, which is only the method's local copy of the reference.
public void TryNew(Foo f, string s)
{
f.Bar = ""; //original f is modified
f = new Foo(); //f now refers to a new object; the caller's reference is unchanged
f.Bar = s; //new f is modified, no effect on original f
}
It would be good if the compiler gave you a warning in that case. Basically what you are doing is replacing the reference you received with another one referencing a different memory area.
If you actually want to replace the object with a new instance, use the ref keyword:
public void TryNew(ref Foo f, string s)...
But are you not shooting yourself in the foot? If the caller is not aware that a new object is created, the following code will probably not work as intended:
Foo f = SomeClass.AFoo;
TryNew(ref f, "some string"); //this will clear SomeClass.AFoo.Bar and then create a new distinct object
And if you try to "fix" the problem by adding the line:
SomeClass.AFoo = f;
If the code holds a reference to the old SomeClass.AFoo somewhere else, that reference will still point to the old object instead of the new one...
As a general rule, you probably should avoid using the new keyword to alter an object which you read from another class or received as a parameter in a method.
Regarding the use of the ref keyword with reference types, I can suggest this approach:
1) Don't use it if simply setting the values of the reference type but be explicit in your function or parameter names and in the comments:
public void SetFoo(Foo fooToSet, string s)
{
fooToSet.Bar = s;
}
2) When there is a legitimate reason to replace the input parameter with a new, different instance, use a function with a return value instead:
public Foo TryNew(string s)
{
Foo f = new Foo();
f.Bar = s;
return f;
}
But using this function may still have unwanted consequences with the SomeClass.AFoo scenario:
SomeClass.AFoo = TryNew("some string");//stores a different object in SomeClass.AFoo
3) In some cases such as the string swapping example here it is handy to use ref params, but just as in case 2 make sure that swapping the object addresses does not affect the rest of your code.
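A generic swap through ref parameters is the classic case for point 3. This is a hedged sketch (Swap is a hypothetical helper), not the string swapping example referred to above; both parameters alias the caller's variables, which is exactly why such code changes what the caller's variables refer to:

```csharp
using System;

// Both parameters are aliases for the caller's variables.
static void Swap<T>(ref T left, ref T right)
{
    T temp = left;
    left = right;
    right = temp;
}

string first = "alpha";
string second = "beta";
Swap(ref first, ref second);   // exchanges the references held by the two variables

Console.WriteLine($"{first} {second}"); // beta alpha
```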
Because it manages memory allocation for you, C# makes it all too easy to forget everything about memory management but it really helps to understand how pointers and references work. Otherwise you may introduce subtle bugs that are difficult to find.
Finally, this is typically the case where one would want a memcpy-like function, but there is no such thing in C# that I know of (the closest built-in is Object.MemberwiseClone, a protected method that makes a shallow copy).