I'm new to Extension Methods and exploring what they can do.
Is it possible for the calling object to be assigned the output without a specific assignment?
Here is a simple example to explain:
public static string ExtensionTest(this string input)
{
return input + " Extended!";
}
In the following examples ...
var foo = "Hello World!";
var foo2 = foo.ExtensionTest(); // foo2 = "Hello World! Extended!"
foo.ExtensionTest(); // foo = "Hello World!"
foo = foo.ExtensionTest(); // foo = "Hello World! Extended!"
... is there any way to get foo.ExtensionTest() to result in "Hello World! Extended!" without specifically assigning foo = foo.ExtensionTest()?
No, but the reason that will not work has to do with the immutability of strings, and nothing to do with extension methods.
If instead you had a class:
public class SomeClass
{
public int Value {get; set;}
}
And an extension method:
public static void DoIt(this SomeClass someClass)
{
someClass.Value++;
}
Then calling it would have this effect:
var someClass = new SomeClass{ Value = 1 };
someClass.DoIt();
Console.WriteLine(someClass.Value); //prints "2"
The closest you could get to this (which would be weird) would be to accept an out parameter and use that as the return value:
public static void ExtensionTest(this string input, out string output)
{
output = input + " Extended!";
}
Example:
string foo = "Hello World!";
foo.ExtensionTest(out foo);
The funny thing about that is, while it more closely resembles what you're asking about, it's actually slightly more to type.
To be clear: I don't recommend this, unless it's really important to you to make this sort of method call. The probability of another developer uttering "WTF?" upon seeing it has got to be something like 100%.
What you are seeing is due to strings being immutable.
In any case, you will have to do some sort of assignment if you want the object to change.
The 'this' parameter is passed by value, not by reference. So no, you can't modify the variable in the calling program that is aliased by 'this' in your extension method.
No. Strings in .NET are immutable. All public String methods return a new instance of String too.
To assign the new value to your variable inside the extension method, you'd need a ref modifier on the this parameter, which the C# compiler does not permit for a reference type such as string (and it would be a bad idea anyway). It's better to make it clear you're changing the variable.
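A minimal sketch of the point above. The names ReassignSketch, TryReassign, ExtendInPlace and Demo are hypothetical and not from the original posts. Reassigning the this parameter inside an extension method only changes the method's local copy of the reference, whereas an ordinary static method with a ref parameter can replace the caller's variable, at the cost of an explicit ref at the call site.

```csharp
using System;

public static class ReassignSketch
{
    // Extension method: 'input' is a copy of the caller's reference.
    public static void TryReassign(this string input)
    {
        // Only the local copy now refers to the new string.
        input = input + " Extended!";
    }

    // Ordinary static method (not an extension method):
    // 'ref' makes 'input' an alias for the caller's variable.
    public static void ExtendInPlace(ref string input)
    {
        input = input + " Extended!";
    }

    // Returns the caller's value after each call so the difference is visible.
    public static string[] Demo()
    {
        string foo = "Hello World!";

        foo.TryReassign();
        string afterExtension = foo; // caller's variable is unchanged

        ExtendInPlace(ref foo);
        string afterRef = foo;       // caller's variable was replaced

        return new[] { afterExtension, afterRef };
    }
}
```

Calling ReassignSketch.Demo() should give "Hello World!" for the first element and "Hello World! Extended!" for the second.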
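If you need a mutable string, use StringBuilder, which is also faster for repeated appends; otherwise use the ref or out keyword as pointed out. StringBuilder is basically a mutable buffer of characters (current .NET implementations store it as a linked list of character chunks).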
Immutable strings were a design decision to allow behavior close to that of the C language and many other languages.
string str = "foo";
str += "bar"; // a new string object is created and assigned to str;
//the old string is then eligible for garbage collection
//unless something else (such as the intern pool) still references it.
//Note: not entirely true in later C# versions.
StringBuilder sb = new StringBuilder();
sb.Append("foo");
sb.Append("bar"); // copies the characters into the builder's internal buffer.
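As a hedged sketch of how this ties back to the original question: if the value lives in a StringBuilder instead of a string, an extension method can change it in place, because it mutates the object rather than reassigning the variable. The class name StringBuilderSketch and the Demo helper are made up for illustration.

```csharp
using System;
using System.Text;

public static class StringBuilderSketch
{
    // Mutates the object that 'input' refers to; no reassignment is involved.
    public static void ExtensionTest(this StringBuilder input)
    {
        input.Append(" Extended!");
    }

    public static string Demo()
    {
        var foo = new StringBuilder("Hello World!");
        foo.ExtensionTest(); // no assignment needed at the call site
        return foo.ToString();
    }
}
```

Here StringBuilderSketch.Demo() should return "Hello World! Extended!".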
See also:
string is immutable and stringbuilder is mutable
http://en.wikipedia.org/wiki/Linked_list
I know that "string" in C# is a reference type. This is on MSDN. However, this code doesn't work as it should then:
class Test
{
public static void Main()
{
string test = "before passing";
Console.WriteLine(test);
TestI(test);
Console.WriteLine(test);
}
public static void TestI(string test)
{
test = "after passing";
}
}
The output should be "before passing" "after passing" since I'm passing the string as a parameter and it being a reference type, the second output statement should recognize that the text changed in the TestI method. However, I get "before passing" "before passing" making it seem that it is passed by value not by ref. I understand that strings are immutable, but I don't see how that would explain what is going on here. What am I missing? Thanks.
The reference to the string is passed by value. There's a big difference between passing a reference by value and passing an object by reference. It's unfortunate that the word "reference" is used in both cases.
If you do pass the string reference by reference, it will work as you expect:
using System;
class Test
{
public static void Main()
{
string test = "before passing";
Console.WriteLine(test);
TestI(ref test);
Console.WriteLine(test);
}
public static void TestI(ref string test)
{
test = "after passing";
}
}
Now you need to distinguish between making changes to the object which a reference refers to, and making a change to a variable (such as a parameter) to let it refer to a different object. We can't make changes to a string because strings are immutable, but we can demonstrate it with a StringBuilder instead:
using System;
using System.Text;
class Test
{
public static void Main()
{
StringBuilder test = new StringBuilder();
Console.WriteLine(test);
TestI(test);
Console.WriteLine(test);
}
public static void TestI(StringBuilder test)
{
// Note that we're not changing the value
// of the "test" parameter - we're changing
// the data in the object it's referring to
test.Append("changing");
}
}
See my article on parameter passing for more details.
If we have to answer the question: String is a reference type and it behaves as a reference. We pass a parameter that holds a reference to the string, not the string itself. The problem is in the function:
public static void TestI(string test)
{
test = "after passing";
}
The parameter test holds a reference to the string but it is a copy. We have two variables pointing to the string. And because any operations with strings actually create a new object, we make our local copy to point to the new string. But the original test variable is not changed.
The suggested solutions to put ref in the function declaration and in the invocation work because we will not pass the value of the test variable but will pass just a reference to it. Thus any changes inside the function will reflect the original variable.
I want to repeat at the end: String is a reference type, but the line test = "after passing"; does not modify the existing string (it cannot, since strings are immutable); it makes our copy of the variable test point to a different string object.
As others have stated, the String type in .NET is immutable and its reference is passed by value.
In the original code, as soon as this line executes:
test = "after passing";
then test is no longer referring to the original object. We've created a new String object and assigned test to reference that object on the managed heap.
I feel that many people get tripped up here since there's no visible formal constructor to remind them. In this case, it's happening behind the scenes since the String type has language support in how it is constructed.
Hence, this is why the change to test is not visible outside the scope of the TestI(string) method - we've passed the reference by value and now that value has changed! But if the String reference were passed by reference, then when the reference changed we would see it outside the scope of the TestI(string) method.
Either the ref or the out keyword is needed in this case. I feel the out keyword might be slightly better suited for this particular situation.
class Program
{
static void Main(string[] args)
{
string test = "before passing";
Console.WriteLine(test);
TestI(out test);
Console.WriteLine(test);
Console.ReadLine();
}
public static void TestI(out string test)
{
test = "after passing";
}
}
"A picture is worth a thousand words".
I have a simple example here, it's similar to your case.
string s1 = "abc";
string s2 = s1;
s1 = "def";
Console.WriteLine(s2);
// Output: abc
This is what happened:
Lines 1 and 2: the s1 and s2 variables refer to the same "abc" string object.
Line 3: Because strings are immutable, the "abc" string object is not modified (to "def"); a new "def" string object is created instead, and s1 then refers to it.
Line 4: s2 still refers to the "abc" string object, so that's the output.
Actually, it would have been the same for any object; being a reference type and being passed by reference are two different things in C#.
This would work, but that applies regardless of the type:
public static void TestI(ref string test)
Also, about string being a reference type: it's also a special one. It's designed to be immutable, so none of its methods modify the instance (they return a new one). It also has some extra things in it for performance.
Here's a good way to think about the difference between value-types, passing-by-value, reference-types, and passing-by-reference:
A variable is a container.
A value-type variable contains an instance.
A reference-type variable contains a pointer to an instance stored elsewhere.
Modifying a value-type variable mutates the instance that it contains.
Modifying a reference-type variable mutates the instance that it points to.
Separate reference-type variables can point to the same instance.
Therefore, the same instance can be mutated via any variable that points to it.
A passed-by-value argument is a new container with a new copy of the content.
A passed-by-reference argument is the original container with its original content.
When a value-type argument is passed-by-value:
Reassigning the argument's content has no effect outside scope, because the container is unique.
Modifying the argument has no effect outside scope, because the instance is an independent copy.
When a reference-type argument is passed-by-value:
Reassigning the argument's content has no effect outside scope, because the container is unique.
Modifying the argument's content affects the external scope, because the copied pointer points to a shared instance.
When any argument is passed-by-reference:
Reassigning the argument's content affects the external scope, because the container is shared.
Modifying the argument's content affects the external scope, because the content is shared.
In conclusion:
A string variable is a reference-type variable. Therefore, it contains a pointer to an instance stored elsewhere.
When passed-by-value, its pointer is copied, so modifying a string argument should affect the shared instance.
However, a string instance has no mutable properties, so a string argument cannot be modified anyway.
When passed-by-reference, the pointer's container is shared, so reassignment will still affect the external scope.
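The cases in the list above can be sketched in a few lines. This is only an illustration under assumed names (PassingSketch and its helper methods are hypothetical); StringBuilder stands in for a mutable reference type and int for a value type.

```csharp
using System;
using System.Text;

public static class PassingSketch
{
    // Reference type passed by value: reassignment stays local.
    static void ReassignByValue(StringBuilder sb) { sb = new StringBuilder("new"); }

    // Reference type passed by value: mutation reaches the shared instance.
    static void MutateByValue(StringBuilder sb) { sb.Append("!"); }

    // Reference type passed by reference: reassignment reaches the caller.
    static void ReassignByRef(ref StringBuilder sb) { sb = new StringBuilder("new"); }

    // Value type passed by value: the method works on an independent copy.
    static void IncrementByValue(int n) { n++; }

    // Value type passed by reference: the method works on the caller's variable.
    static void IncrementByRef(ref int n) { n++; }

    public static string Demo()
    {
        var sb = new StringBuilder("old");
        int n = 1;

        ReassignByValue(sb);                 // caller still sees "old"
        MutateByValue(sb);                   // caller now sees "old!"
        string afterByValue = sb.ToString();
        ReassignByRef(ref sb);               // caller now sees "new"

        IncrementByValue(n);                 // caller still sees 1
        int afterByValueInt = n;
        IncrementByRef(ref n);               // caller now sees 2

        return afterByValue + " " + sb + " " + afterByValueInt + " " + n;
    }
}
```

PassingSketch.Demo() should return "old! new 1 2", which matches the four outcomes described in the list.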
The answers above are helpful; I'd just like to add an example that I think clearly demonstrates what happens when we pass a parameter without the ref keyword, even when that parameter is a reference type:
MyClass c = new MyClass(); c.MyProperty = "foo";
CNull(c); // only a copy of the reference is sent
Console.WriteLine(c.MyProperty); // still foo, we only made the copy null
CPropertyChange(c);
Console.WriteLine(c.MyProperty); // bar
private void CNull(MyClass c2)
{
c2 = null;
}
private void CPropertyChange(MyClass c2)
{
c2.MyProperty = "bar"; // c2 is a copy, but it refers to the same object that c does (on heap) and modified property would appear on c.MyProperty as well.
}
For curious minds and to complete the conversation:
Yes, String is a reference type:
unsafe
{
string a = "Test";
string b = a;
fixed (char* p = a)
{
p[0] = 'B';
}
Console.WriteLine(a); // output: "Best"
Console.WriteLine(b); // output: "Best"
}
But note that this change only works in an unsafe block, because strings are immutable (from MSDN):
The contents of a string object cannot be changed after the object is
created, although the syntax makes it appear as if you can do this.
For example, when you write this code, the compiler actually creates a
new string object to hold the new sequence of characters, and that new
object is assigned to b. The string "h" is then eligible for garbage
collection.
string b = "h";
b += "ello";
And keep in mind that:
Although the string is a reference type, the equality operators (== and
!=) are defined to compare the values of string objects, not
references.
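A small sketch of that last quoted point, using a hypothetical StringEqualitySketch class: two separately allocated strings with the same characters compare equal with ==, while object.ReferenceEquals shows that they are different objects.

```csharp
using System;

public static class StringEqualitySketch
{
    public static bool[] Demo()
    {
        // Two distinct string objects holding the same characters.
        string a = new string(new[] { 'h', 'i' });
        string b = new string(new[] { 'h', 'i' });

        bool sameValue = a == b;                        // compares the characters
        bool sameObject = object.ReferenceEquals(a, b); // compares the references

        return new[] { sameValue, sameObject };
    }
}
```

StringEqualitySketch.Demo() should return true for the value comparison and false for the reference comparison.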
Try:
public static void TestI(ref string test)
{
test = "after passing";
}
I believe your code is analogous to the following, and you should not have expected the value to have changed for the same reason it wouldn't here:
public static void Main()
{
StringWrapper testVariable = new StringWrapper("before passing");
Console.WriteLine(testVariable);
TestI(testVariable);
Console.WriteLine(testVariable);
}
public static void TestI(StringWrapper testParameter)
{
testParameter = new StringWrapper("after passing");
// this will change the object that testParameter is pointing/referring
// to but it doesn't change testVariable unless you use a reference
// parameter as indicated in other answers
}
Another way to work around the string behavior: use a string array of ONE element only and manipulate this element.
using System;
class Test
{
public static void Main()
{
string[] test = new string[1] {"before passing"};
Console.WriteLine(test[0]);
TestI(test);
Console.WriteLine(test[0]);
}
public static void TestI(string[] test)
{
// The array reference is passed by value, but both copies refer
// to the same array, so the element change is visible to the caller.
test[0] = "after passing";
}
}
I want to write a 'Date' class that behaves like a Value Type.
For example, instead of writing a Clone method for setting properties safely, I want the Date class to be passed by value:
public Date Birthday
{
get { return this.birthday; }
set
{
this.birthday = value.Clone();
} //I want to write this.birthday = value;
//without changing external value when this.Birthday changes
}
I know this is possible because System.String is a class and behaves like a value. for example:
String s1 = "Hello";
String s2 = "Hi";
s1 = s2;
s2="Hello";
Console.WriteLine(s1); //Prints 'Hi'
At first I thought the writers of this class had overridden the '=' operator, but now I know that the '=' operator cannot be overloaded. So how did they write the String class?
Edit: I just want my Date class to pass its instances by value, like String does.
First, your string-based example does not illustrate your question.
The thing with DateTime and String is that they are immutable: once an instance is created, it cannot be changed in any way. For example, you cannot add 2 minutes to a DateTime instance by just saying date.Minute += 2: you'll have to invoke date.AddMinutes(2), which will yield a totally new instance.
To make objects read-only, just follow the same pattern.
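A minimal sketch of that pattern, assuming a Date with only year, month and day. The members WithYear, Person and DateSketch are hypothetical names introduced for illustration. Because a Date can never change after construction, the Birthday setter can store the reference directly and no Clone is needed.

```csharp
using System;

// Hypothetical immutable Date: no setters, and "modifying" methods return a new instance.
public sealed class Date
{
    private readonly int year;
    private readonly int month;
    private readonly int day;

    public Date(int year, int month, int day)
    {
        this.year = year;
        this.month = month;
        this.day = day;
    }

    public int Year { get { return year; } }
    public int Month { get { return month; } }
    public int Day { get { return day; } }

    // Returns a new Date instead of changing this one.
    public Date WithYear(int newYear)
    {
        return new Date(newYear, month, day);
    }
}

public class Person
{
    private Date birthday;

    public Date Birthday
    {
        get { return birthday; }
        set { birthday = value; } // safe to share: nobody can mutate the instance
    }
}

public static class DateSketch
{
    public static int[] Demo()
    {
        var d = new Date(2000, 1, 1);
        var p = new Person { Birthday = d };

        d = d.WithYear(2010); // d now refers to a new Date; p.Birthday is untouched

        return new[] { p.Birthday.Year, d.Year };
    }
}
```

DateSketch.Demo() should return 2000 for the stored birthday and 2010 for the reassigned local variable.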
public class Date{ ...code...} would be a reference type...not what you want.
public struct Date { ...code...} would be a value type...probably what you want.
The string class is, as it is a class, a reference type, and it is immutable. How being immutable affects the behavior of string objects can be confusing at the start.
Given string s1 = "Fish"; s1 is a reference that points to "Fish". It is the "Fish" bit that can never be changed; what s1 points to can be changed. If you then assign s1 = "Tuna"; "Fish" still exists but is no longer referenced by s1, and it can be garbage-collected once nothing else references it.
In your example after: s1=s2 s1,s2 now reference the same string "Hi"...there is only one "Hi".
I hope I have not gone way below your level.
It's not the '=' operator, it's the fact that when you say
stringThing = "thing";
you're creating a new string, not changing the current string to something else.
In C#, when passing a parameter to a method, when should we use "ref", when "out", and when neither of them?
In general, you should avoid using ref and out, if possible.
That being said, use ref when the method might need to modify the value. Use out when the method always should assign something to the value.
The difference between ref and out is that when using out, the compiler enforces the rule that the method must assign something to the out parameter before returning. When using ref, the caller must assign a value to the variable before passing it as a ref parameter.
Obviously, the above applies when you are writing your own methods. If you need to call methods that were declared with the ref or out modifiers on their parameters, you should use the same modifier before your argument when calling the method.
Also remember that for reference types (classes), C# passes the reference itself by value. So if you provide some method with a reference type as a parameter, the method can modify the data of the object, even without ref or out. But it cannot modify the reference itself (as in, it cannot change which object the caller's variable refers to).
They are used mainly to obtain multiple return values from a method call. Personally, I tend to not use them. If I want multiple return values from a method then I'll create a small class to hold them.
ref and out are used when you want something back from the method in that parameter. As I recall, they both actually compile down to the same IL, but C# puts in place some extra stuff so you have to be specific.
Here are some examples:
static void Main(string[] args)
{
string myString;
MyMethod0(myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod0(string param1)
{
param1 = "Hello";
}
The above won't compile because myString is never initialised. If myString is initialised to string.Empty then the output of the program will be an empty line, because all MyMethod0 does is assign a new string to param1, its local copy of the reference.
static void Main(string[] args)
{
string myString;
MyMethod1(out myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod1(out string param1)
{
param1 = "Hello";
}
myString is not initialised in the Main method, yet, the program outputs "Hello". This is because the myString reference in the Main method is being updated from MyMethod1. MyMethod1 does not expect param1 to already contain anything, so it can be left uninitialised. However, the method should be assigning something.
static void Main(string[] args)
{
string myString;
MyMethod2(ref myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod2(ref string param1)
{
param1 = "Hello";
}
This, again, will not compile. This is because ref demands that myString in the Main method is initialised to something first. But, if the Main method is changed so that myString is initialised to string.Empty then the code will compile and the output will be Hello.
So, the difference is that out can be used with an uninitialised variable, while ref must be passed an initialised variable. And if you pass an object without either, the reference to it cannot be replaced.
Just to be clear: If the object being passed is a reference type already then the method can update the object and the updates are reflected in the calling code, however the reference to the object cannot be changed. So if I write code like this:
static void Main(string[] args)
{
string myString = "Hello";
MyMethod0(myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod0(string param1)
{
param1 = "World";
}
The output from the program will be Hello, and not World because the method only changed its local copy of the reference, not the reference that was passed in.
I hope this makes sense. My general rule of thumb is simply not to use them. I feel it is a throwback to pre-OO days. (But, that's just my opinion)
(this is supplemental to the existing answers - a few extra considerations)
There is another scenario for using ref with C#, more commonly seen in things like XNA... Normally, when you pass a value-type (struct) around, it gets cloned. This uses stack-space and a few CPU cycles, and has the side-effect that any modifications to the struct in the invoked method are lost.
(aside: normally structs should be immutable, but mutable structs aren't uncommon in XNA)
To get around this, it is quite common to see ref in such programs.
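A rough sketch of that struct scenario. Point2 and StructRefSketch are made-up names, not XNA types. Passing the struct by value hands the method a copy, so its change is lost; passing it with ref lets the method work on the caller's struct directly.

```csharp
using System;

// Hypothetical mutable struct, in the style often seen in game code.
public struct Point2
{
    public float X;
    public float Y;
}

public static class StructRefSketch
{
    // Receives a copy of the struct; the change is lost on return.
    static void MoveByValue(Point2 p) { p.X += 1f; }

    // Receives an alias to the caller's struct; the change is kept.
    static void MoveByRef(ref Point2 p) { p.X += 1f; }

    public static float[] Demo()
    {
        var p = new Point2 { X = 0f, Y = 0f };

        MoveByValue(p);
        float afterByValue = p.X; // unchanged

        MoveByRef(ref p);
        float afterByRef = p.X;   // incremented once

        return new[] { afterByValue, afterByRef };
    }
}
```

StructRefSketch.Demo() should return 0 after the by-value call and 1 after the by-ref call.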
But in most programs (i.e. where you are using classes as the default), you can normally just pass the reference "by value" (i.e. no ref/out).
Another very common use-case of out is the Try* pattern, for example:
string s = Console.ReadLine();
int i;
if(int.TryParse(s, out i)) {
Console.WriteLine("You entered a valid int: " + i);
}
Or similarly, TryGetValue on a dictionary.
This could use a tuple instead, but it is such a common pattern that it is reasonably understood, even by people who struggle with too much ref/out.
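For the dictionary case mentioned above, a short sketch with a hypothetical TryPatternSketch.Describe helper. Dictionary<TKey, TValue>.TryGetValue returns whether the key was found and delivers the value through its out parameter.

```csharp
using System;
using System.Collections.Generic;

public static class TryPatternSketch
{
    // Looks up a name and reports the result without throwing for missing keys.
    public static string Describe(Dictionary<string, int> ages, string name)
    {
        int age;
        if (ages.TryGetValue(name, out age))
        {
            return name + " is " + age;
        }
        return name + " not found";
    }
}
```

With a dictionary containing only "Ann" mapped to 30, Describe(ages, "Ann") should return "Ann is 30" and Describe(ages, "Bob") should return "Bob not found".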
Very simple really. You use exactly the same keyword that the parameter was originally declared with in the method. If it was declared as out, you have to use out. If it was declared as ref, you have to use ref.
In addition to Colin's detailed answer, you could also use out parameters to return multiple values from one method call. See for example the method below which returns 3 values.
static void AssignSomeValues(out int first, out bool second, out string third)
{
first = 12 + 12;
second = false;
third = "Output parameters are okay";
}
You could use it like so
static void Main(string[] args)
{
int i;
string s;
bool b;
AssignSomeValues(out i, out b, out s);
Console.WriteLine("Int value: {0}", i);
Console.WriteLine("Bool value: {0}", b);
Console.WriteLine("String value: {0}", s);
//wait for enter key to terminate program
Console.ReadLine();
}
Just make sure that you assign a valid value to each out parameter to avoid getting an error.
Try to avoid using ref. Out is okay, because you know what will happen, the old value will be gone and a new value will be in your variable even if the function failed. However, just by looking at the function you have no idea what will happen to a ref parameter. It may be the same, modified, or an entirely new object.
Whenever I see ref, I get nervous.
ref is to be avoided (I believe there is an FxCop rule for this also); however, use ref when the object that is referenced may itself be changed. If you see the 'ref' keyword, you know that the variable may no longer refer to the same object after the method is called.
I've been going over and over this in my head, and I can't seem to come up with a good reason why C# closures are mutable. It just seems like a good way to get some unintended consequences if you aren't aware of exactly what's happening.
Maybe someone who is a little more knowledgeable can shed some light on why the designers of C# would allow state to change in a closure?
Example:
var foo = "hello";
Action bar = () => Console.WriteLine(foo);
bar();
foo = "goodbye";
bar();
This will print "hello" for the first call, but the outside state changes for the second call, printing "goodbye." The closure's state was updated to reflect the changes to the local variable.
C# and JavaScript, as well as O'Caml and Haskell, and many other languages, have what is known as lexical closures. This means that inner functions can access the names of local variables in the enclosing functions, not just copies of the values. In languages with immutable symbols, of course, such as O'Caml or Haskell, closing over names is identical to closing over values, so the difference between the two types of closure disappears; these languages nevertheless have lexical closures just like C# and JavaScript.
Not all closures behave the same. There are differences in semantics.
Note that the first idea presented matches C#'s behavior... your concept of closure semantics may not be the predominant concept.
As for reasons: I think the key here is ECMA, a standards group. Microsoft is just following their semantics in this case.
This is actually a fantastic feature. This lets you have a closure that accesses something normally hidden, say, a private class variable, and let it manipulate it in a controlled way as a response to something like an event.
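A rough sketch of that idea, assuming a made-up Button class with a Clicked event and a ClickCounter that owns the private state (neither type comes from the question):

```csharp
using System;

// Hypothetical event source, standing in for any class that raises events.
class Button
{
    public event Action Clicked;

    public void Click()
    {
        var handler = Clicked;
        if (handler != null) handler();
    }
}

class ClickCounter
{
    private int clicks;   // not reachable from outside this class

    public ClickCounter(Button button)
    {
        // The lambda closes over 'this', so it can update the private field
        // each time the event fires, and in no other way.
        button.Clicked += () => clicks++;
    }

    public int Clicks { get { return clicks; } }
}

static class EventClosureDemo
{
    public static void Main()
    {
        var button = new Button();
        var counter = new ClickCounter(button);

        button.Click();
        button.Click();

        Console.WriteLine(counter.Clicks);   // prints 2
    }
}
```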
You can simulate what you want quite easily by creating a local copy of the variable, and using that.
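For instance, applied to the example from the question (fooCopy is just an illustrative name):

```csharp
using System;

static class CopyCaptureDemo
{
    public static string[] Run()
    {
        var foo = "hello";
        var fooCopy = foo;                      // snapshot taken before the closure is created

        Func<string> live = () => foo;          // captures the variable foo itself
        Func<string> frozen = () => fooCopy;    // captures the copy, which is never reassigned

        foo = "goodbye";

        return new[] { live(), frozen() };
    }

    public static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line);            // prints goodbye, then hello
    }
}
```

The copy is still a captured variable, so this only works as long as nothing assigns to fooCopy afterwards.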
You also have to remember that C# has no language-level concept of immutable types. Because objects in the .NET Framework are not copied on assignment (you have to explicitly implement ICloneable, etc.), this code would print "goodbye" even if the "pointer" foo were copied into the closure:
class Foo
{
    public string Text;
}
var foo = new Foo();
foo.Text = "Hello";
Action bar = () => Console.WriteLine(foo.Text);
bar();
foo.Text = "goodbye";
bar();
So it's questionable whether the current behaviour really makes it easier to get unintended consequences.
When you create a closure, the compiler creates a type for you that has members for each captured variable. In your example the compiler would generate something like this:
[CompilerGenerated]
private sealed class <>c__DisplayClass1
{
    public string foo;
    public void <Main>b__0()
    {
        Console.WriteLine(this.foo);
    }
}
Your delegate is given a reference to an instance of this type so that it can use the captured variables later. Unfortunately, the local variable foo is also rewritten to live in this instance, so any local changes will affect the delegate, since both use the same object.
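Written out by hand, the rewritten method behaves roughly like the sketch below. The names DisplayClass, Bar and locals are stand-ins chosen for readability; the real generated names are not valid C# identifiers, and the exact code the compiler emits may differ.

```csharp
using System;

// Hand-written stand-in for the compiler-generated class (hypothetical name).
sealed class DisplayClass
{
    public string foo;

    public void Bar()
    {
        Console.WriteLine(this.foo);
    }
}

static class LoweredClosureDemo
{
    public static void Main()
    {
        var locals = new DisplayClass();   // one instance holds the captured local
        locals.foo = "hello";              // was: var foo = "hello";
        Action bar = locals.Bar;           // was: Action bar = () => Console.WriteLine(foo);
        bar();                             // prints hello
        locals.foo = "goodbye";            // was: foo = "goodbye";
        bar();                             // prints goodbye
    }
}
```

Seen this way, there is no separate "closure state" to update: the method and the delegate read and write the same field.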
As you can see the persistence of foo is handled by a public field rather than a property so there is not even an option of immutability here with the current implementation. I think what you want would have to be something like this:
var foo = "hello";
Action bar = [readonly foo]() => Console.WriteLine(foo);
bar();
foo = "goodbye";
bar();
Pardon the clumsy syntax but the idea is to denote that foo is captured in a readonly fashion which would then hint to the compiler to output this generated type:
[CompilerGenerated]
private sealed class <>c__DisplayClass1
{
    public readonly string foo;
    public <>c__DisplayClass1(string foo)
    {
        this.foo = foo;
    }
    public void <Main>b__0()
    {
        Console.WriteLine(this.foo);
    }
}
This would give you what you wanted in a certain fashion but would require updates to the compiler.
In regards to why are closures mutable in C#, you have to ask, "Do you want simplicity (Java), or power with complexity (C#)?"
Mutable closures allow you to define once and reuse. Example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ClosureTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string userFilter = "C";
            // The query captures the variable userFilter, not its current value,
            // and is re-evaluated each time it is enumerated.
            IEnumerable<string> query = (from m in typeof(String).GetMethods()
                                         where m.Name.StartsWith(userFilter)
                                         select m.Name.ToString()).Distinct();
            while (userFilter.ToLower() != "q")
            {
                DisplayStringMethods(query, userFilter);
                userFilter = GetNewFilter();
            }
        }
        static void DisplayStringMethods(IEnumerable<string> methodNames, string userFilter)
        {
            Console.WriteLine("Here are all of the String methods starting with the letter \"{0}\":", userFilter);
            Console.WriteLine();
            foreach (string methodName in methodNames)
                Console.WriteLine(" * {0}", methodName);
        }
        static string GetNewFilter()
        {
            Console.WriteLine();
            Console.Write("Enter a new starting letter (type \"Q\" to quit): ");
            ConsoleKeyInfo cki = Console.ReadKey();
            Console.WriteLine();
            return cki.Key.ToString();
        }
    }
}
If you do not want to define once and reuse, because you are worried about unintended consequences, you can simply use a copy of the variable. Change the above code as follows:
string userFilter = "C";
string userFilter_copy = userFilter;
IEnumerable<string> query = (from m in typeof(String).GetMethods()
                             where m.Name.StartsWith(userFilter_copy)
                             select m.Name.ToString()).Distinct();
Now the query will return the same result, regardless of what userFilter equals.
Jon Skeet has an excellent introduction to the differences between Java and C# closures.