Why anonymous methods inside structs can not access instance members of 'this' - c#

I have a code like the following:
struct A
{
    void SomeMethod()
    {
        var items = Enumerable.Range(0, 10).Where(i => i == _field);
    }
    int _field;
}
... and then I get the following compiler error:
Anonymous methods inside structs can not access instance members of 'this'.
Can anybody explain what's going on here?

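Variables are captured by reference: the compiler hoists each captured variable into a field of a heap-allocated closure object, even when the variable is a value type.
However, inside a ValueType (struct), this is not an ordinary variable. It behaves like a ref parameter referring to the struct's storage, and such a reference cannot be stored in a heap object, hence you cannot capture it.
Eric Lippert has a nice article on the surprises of capturing ValueTypes: The Truth About Value Types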
Note in response to the comment by Chris Sinclair:
As a quick fix, you can store the struct in a local variable: A thisA = this; var items = Enumerable.Range(0, 10).Where(i => i == thisA._field); (Chris Sinclair)
Beware that this creates surprising situations: thisA is a copy, so its identity is not the same as this. More explicitly, if you choose to keep the lambda around longer, it will have the copy thisA captured by reference, and not the actual instance that SomeMethod was called on.
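The difference between the copy and the original can be seen in a minimal sketch. The Counter struct, its MakeReader method and the CopyCaptureDemo class are hypothetical names invented for this illustration; they are not from the question.

```csharp
using System;

// Hypothetical struct, used only to illustrate the copy-to-local workaround.
struct Counter
{
    public int Value;

    public Func<int> MakeReader()
    {
        Counter copy = this;       // snapshot of the struct at this moment
        return () => copy.Value;   // captures the local 'copy', not 'this'
    }
}

static class CopyCaptureDemo
{
    public static (int fromLambda, int fromOriginal) Run()
    {
        var c = new Counter { Value = 1 };
        Func<int> read = c.MakeReader();
        c.Value = 42;              // mutate the original after the lambda exists
        return (read(), c.Value);  // the lambda still sees the old copy
    }

    static void Main()
    {
        var (fromLambda, fromOriginal) = Run();
        Console.WriteLine($"{fromLambda} {fromOriginal}"); // prints "1 42"
    }
}
```

The lambda keeps reporting 1 because it reads the captured copy, while the struct the method was called on has moved on to 42.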

When you have an anonymous method it will be compiled into a new class, and that class will have one method (the one you define). It will also have a reference to each variable that you used that was outside of the scope of the anonymous method. It's important to emphasize that it is a reference, not a copy, of that variable. "Lambdas close over variables, not values", as the saying goes. This means that if you close over a variable outside of the scope of a lambda, and then change that variable after defining the anonymous method (but before invoking it), then you will see the changed value when you do invoke it.
So, what's the point of all of that? Well, if you were to close over this for a struct, which is a value type, it's possible for the lambda to outlive the struct. The anonymous method will be in a class, not a struct, so it will go on the heap, live as long as it needs to, and you are free to pass a reference to that class (directly or indirectly) wherever you want.
Now imagine that we have a local variable, with a struct of the type you've defined here. We use this named method to generate a lambda, and let's assume for a moment that the query items is returned (instead of the method being void). We could then store that query in another instance (instead of local) variable, and iterate over that query some time later in another method. What would happen here? In essence, we would have held onto a reference to a value type that was on the stack once it is no longer in scope.
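A small sketch of "variables, not values", using an ordinary local rather than a struct's this. The names ClosureDemo and threshold are made up for the example.

```csharp
using System;

static class ClosureDemo
{
    public static int Run()
    {
        int threshold = 1;
        Func<int> read = () => threshold; // captures the variable itself
        threshold = 5;                    // changed after the lambda is defined
        return read();                    // observes the current value
    }

    static void Main() => Console.WriteLine(Run()); // prints 5
}
```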
What does that mean? The answer is, we have no idea. (Please look over the link; it's kinda the crux of my argument.) The data could just happen to be the same, it could have been zeroed out, it could have been filled by entirely different objects, there is no way of knowing. C# goes to great lengths, as a language, to prevent you from doing things like this. Languages such as C or C++ don't try so hard to stop you from shooting your own foot.
Now, in this particular case, it's possible that you aren't going to use the lambda outside of the scope of what this refers to, but the compiler doesn't know that, and if it lets you create the lambda it has no way of determining whether or not you expose it in a way that could result in it outliving this, so the only way to prevent this problem is to disallow some cases that aren't actually problematic.

Related

Performance/style: Changing an object by reference vs returning a copy in C#

First let me provide context:
In C# an object passed into a method is passed via reference. The reference is only lost if the passed in object is re-instantiated with the keyword new
So, I like to do things like var obj = Alter(obj)(method 1) i.e. I pass in an object and return the object. As opposed to doing the equivalent: Alter(obj) (method 2) where the referenced object is changed the same, except by reference instead of returning a copy. I would argue the first one is better since if some daredevil coder later modifies the code to use a keyword "new"... existing code won't burn and die.
My question is will method 1 use significantly more memory than method 2 or will it cause any other performance degradation? i.e. will this invoke the GC more often?
The answer is NO
In C# an object passed into a method is passed via reference. The reference is only lost if the passed in object is re-instantiated with the keyword new
No and no, not by default at least. By default everything is passed by value. It just so happens that, in the case of reference types, the thing being passed by value is a reference.
So, a copy of the reference is made. This also disproves the second statement. You can reassign the method argument all you like; you are simply modifying a copy. This also changes the meaning of your question, because you go on to say...
So, I like to do things like var obj = Alter(obj)(method 1)... I would argue the first one is better since if some daredevil coder later modifies the code to use a keyword "new"... existing code won't burn and die.
That situation will not occur. Secondly, if you work with programmers who check in code that flat out doesn't work and that they didn't test, you have a bigger problem. However, "using the new keyword" on the reference copy is irrelevant anyway (at least, in terms of affecting the original). Even if you were correct in your approach this would be overly defensive.
My question to you is; if you have functions which serve only to mutate the state of its single input, then why isn't this method an instance method of the type to begin with?
C# never copies a reference type. If you pass in obj to your method and then return it, that is the same exact object instance you started with.
That does not create additional pressure for the GC.
In general, it's safer to not modify parameters that are specified as input to a method. As for performance, the difference in memory consumption between these two is almost certainly going to be negligible, and definitely won't be the performance bottleneck in your program. It's a case of premature optimization.
You should choose the cleaner, safer solution, unless you have evidence that the performance difference is causing a problem in your program.
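To make the "reference passed by value" point concrete, here is a minimal sketch. Item, Alter and Replace are placeholder names; Alter stands in for the method from the question, whose real body is not shown there.

```csharp
using System;

class Item { public int Value; }

static class PassingDemo
{
    // Mutates the object the caller also refers to, then returns the same reference.
    public static Item Alter(Item item)
    {
        item.Value = 10;
        return item;
    }

    // Reassigning the parameter only changes this method's copy of the reference.
    public static void Replace(Item item)
    {
        item = new Item { Value = 99 };
    }

    static void Main()
    {
        var original = new Item { Value = 1 };
        Item returned = Alter(original);
        Console.WriteLine(ReferenceEquals(original, returned)); // True: no copy was made
        Replace(original);
        Console.WriteLine(original.Value); // 10: the caller never sees the new object
    }
}
```

Both calling styles from the question operate on the same single instance, so neither allocates an extra object for the GC to track.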

Do delegates create thread safety issues with local variables?

I am initializing a mutable class instance as a local variable with the new keyword. Then I pass this object as a parameter to a delegate. Is this variable's lifetime extended by the delegate? Do other threads use this variable or create their own instances? I may be asking the obvious but I want to be sure.
public void DoSometing(Action<Foo> action)
{
    Foo foo = new Foo();
    action.Invoke(foo);
}
Whenever you pass local variables that "escape" the method one way or another, you do extend their lifetime. In C# you will never operate upon a variable that contains a reference to a nonexistent object -- the concept makes no sense in a managed environment.
So yes, foo will continue to live on, and you will need to be concerned with thread-safety in exactly the same way as if you simply called another ordinary method. In this scenario, lambdas do not change the complexion of the problem.
However, sometimes this can be more subtle, especially if you return a lambda -- one which closes over local variables. In such a scenario, all the variables you reference from within the lambda live on in the same way as foo.
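The subtler case, a returned lambda that closes over a local, might look like the sketch below. LifetimeDemo and MakeCounter are hypothetical names chosen for this example.

```csharp
using System;

static class LifetimeDemo
{
    public static Func<int> MakeCounter()
    {
        int count = 0;          // a local, but it is captured below
        return () => ++count;   // the lambda keeps 'count' alive after the method returns
    }

    static void Main()
    {
        Func<int> next = MakeCounter();
        Console.WriteLine(next()); // prints 1
        Console.WriteLine(next()); // prints 2: same captured variable as before
    }
}
```

If the returned delegate were handed to several threads, they would all share that one count variable, and ++count is not an atomic operation, so the usual synchronization concerns would apply.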

What's wrong with this C# struct?

Note: My question has several parts to it. I'd appreciate it if you would please answer each of the questions, instead of simply telling me what to do to get this to compile. :)
I'm not by any means good with C#. In fact, the reason why I don't know much about it is my class is focused on making efficient Algorithms and not really on teaching us .NET. Nevertheless all of our programs must be written in .NET and it hasn't been a problem until just now. I have the following code, but it won't compile and I don't really understand why. I have a gut feeling that this should be rewritten altogether, but before I do that, I want to know WHY this isn't allowed.
The point of the struct is to create a linked list like structure so I can add another node to the end of the "list" and then traverse and recall the nodes in reverse order
private struct BackPointer
{
    public BackPointer previous;
    public string a;
    public string b;
    public BackPointer(BackPointer p, string aa, string bb)
    {
        previous = p;
        a = aa;
        b = bb;
    }
}
then later in my code I have something to the effect of
BackPointer pointer = new BackPointer();
pointer = new BackPointer(pointer, somestring_a, somestring_b);
The compile error I'm getting is Struct member 'MyClass.BackPointer.previous' of type 'MyClass.BackPointer' causes a cycle in the struct layout
This seems to be an obvious error. It doesn't like the fact that I am passing in the struct in the constructor of the same struct. But why is that not allowed? I would imagine this code would just create a new node in the list and return this node with a pointer back to the previous node, but apparently that's not what would happen. So what would actually happen then? Lastly, what is the recommended way to resolve this? I was thinking to just tell it to be unmanaged and handle my pointers manually, but I only really know how to do that in C++. I don't really know what could go wrong in C#.
That's not a pointer; it's an actual embedded struct value.
The whole point of structs is that they're (almost) never pointers.
You should use a class instead.
But why is that not allowed?
It's a struct - a value type. That means wherever you've got a variable of that type, that variable contains all the fields within the struct, directly inline. If something contains itself (or creates a more complicated cycle) then you clearly can't allocate enough space for it - because it's got to have enough space for all its fields and another copy of itself.
Lastly what is the recommended way to resolve this?
Write a class instead of a struct. Then the value of the variable will be a reference to an instance, not the data itself. That's how you get something close to "a pointer" in C#. (Pointers and references are different, mind you.)
I suggest you read my article on value types and reference types for more information - this is an absolutely critical topic to understand in C#.
A BackPointer HAS to exist before creating a BackPointer, because you can't have a BackPointer without another BackPointer (which would then need another BackPointer, and on and on). You simply can't create a BackPointer the way you've defined it, because, as a struct, BackPointer can never be null.
In other words, it's impossible to create a BackPointer with this code. The compiler knows that, and so it forces you to make something that would work logically.
Structs are stored by value. In this case, your struct stores within itself another instance of the same struct. That struct stores within itself another struct and so on. Therefore this is impossible. It is like saying that every person in the world must have 1 child. There is no way this is possible.
What you need to use is a class. Classes store by reference, which means that it does not store the class within itself, it only stores a reference to that class.
A CLR struct is by definition a value type. What this means in your context is that the compiler needs to know the exact layout of the type. However, it cannot know how to lay out a type which contains an instance of itself - does that sound reasonable? Change the struct to a class (which makes your BackPointer a reference type) and you'll see it works out of the box. The reason is that a variable of any reference type always has the same layout - it is basically just a "pointer" to some location on the managed heap. I strongly recommend reading a bit about the basics of the C# or CLI type system.
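A sketch of the class-based version suggested in the answers, assuming null marks the start of the chain. The WalkBack helper and the sample strings are invented for the illustration; the field names follow the question, capitalized.

```csharp
using System;
using System.Collections.Generic;

class BackPointer
{
    public readonly BackPointer Previous; // a reference, so null can mark the first node
    public readonly string A;
    public readonly string B;

    public BackPointer(BackPointer previous, string a, string b)
    {
        Previous = previous;
        A = a;
        B = b;
    }
}

static class BackPointerDemo
{
    // Walks from the newest node back to the oldest one.
    public static List<string> WalkBack(BackPointer last)
    {
        var result = new List<string>();
        for (BackPointer p = last; p != null; p = p.Previous)
            result.Add(p.A + p.B);
        return result;
    }

    static void Main()
    {
        BackPointer pointer = null;
        pointer = new BackPointer(pointer, "a1", "b1");
        pointer = new BackPointer(pointer, "a2", "b2");
        Console.WriteLine(string.Join(",", WalkBack(pointer))); // prints "a2b2,a1b1"
    }
}
```

Because Previous holds only a reference, each node has a fixed size no matter how long the chain grows, which is exactly what the struct version could not offer.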

Are arrays or lists passed by default by reference in c#?

Do they? Or to speed up my program should I pass them by reference?
The reference is passed by value.
Arrays in .NET are objects on the heap, so you have a reference. That reference is passed by value, meaning that changes to the contents of the array will be seen by the caller, but reassigning the array won't:
void Foo(int[] data) {
    data[0] = 1; // caller sees this
}
void Bar(int[] data) {
    data = new int[20]; // but not this
}
If you add the ref modifier, the reference is passed by reference - and the caller would see either change above.
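The effect of the ref modifier can be sketched as follows. BarRef is a hypothetical ref-taking variant of the Bar method above, and RefDemo is just a container for the example.

```csharp
using System;

static class RefDemo
{
    // The reference is passed by value: reassignment stays local to the method.
    public static void Bar(int[] data)
    {
        data = new int[20];
    }

    // The reference is passed by reference: reassignment reaches the caller.
    public static void BarRef(ref int[] data)
    {
        data = new int[20];
    }

    static void Main()
    {
        int[] a = new int[3];
        Bar(a);
        Console.WriteLine(a.Length); // prints 3
        BarRef(ref a);
        Console.WriteLine(a.Length); // prints 20
    }
}
```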
They are passed by value (as are all parameters that are neither ref nor out), but the value is a reference to the object, so they are effectively passed by reference.
Yes, they are passed by reference by default in C#. All objects in C# are, except for value types. To be a little bit more precise, they're passed "by reference by value"; that is, the value of the variable that you see in your methods is a reference to the original object passed. This is a small semantic point, but one that can sometimes be important.
(1) No one explicitly answered the OP's question, so here goes:
No. Explicitly passing the array or list as a reference will not affect performance.
What the OP feared might be happening is avoided because the function is already operating on a reference (which was passed by value). The top answer nicely explains what this means, giving an Ikea way to answer the original question.
(2) Good advice for everyone:
Read Eric Lippert's advice on when/how to approach optimization. Premature optimization is the root of much evil.
(3) Important, not already mentioned:
Use cases that require passing anything - values or references - by reference are rare.
Doing so gives you extra ways to shoot yourself in the foot, which is why C# makes you use the "ref" keyword on the method call as well. Older (pre-Java) languages only made you indicate pass-by-reference on the method declaration. And this invited no end of problems. Java touts the fact that it doesn't let you do it at all.

Why can '=' not be overloaded in C#?

I was wondering, why can't I overload '=' in C#? Can I get a better explanation?
Memory managed languages usually work with references rather than objects. When you define a class and its members you are defining the object behavior, but when you create a variable you are working with references to those objects.
Now, the operator = is applied to references, not objects. When you assign a reference to another you are actually making the receiving reference point to the same object that the other reference is.
Type var1 = new Type();
Type var2 = new Type();
var2 = var1;
In the code above, two objects are created on the heap, one referred to by var1 and the other by var2. Now the last statement makes the var2 reference point to the same object that var1 is referring to. After that line, nothing refers to the second object any more, so the garbage collector is free to reclaim it, leaving only one reachable object. In the whole process, no operation is applied to the objects themselves.
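A runnable version of the same three lines, with a placeholder Box class standing in for Type, shows that after the assignment both variables refer to one object:

```csharp
using System;

class Box { public int Value; }

static class AssignmentDemo
{
    public static bool Run()
    {
        var var1 = new Box();
        var var2 = new Box();
        var2 = var1;        // copies the reference, not the object
        var2.Value = 7;     // a write through var2 is visible through var1
        return ReferenceEquals(var1, var2) && var1.Value == 7;
    }

    static void Main() => Console.WriteLine(Run()); // prints True
}
```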
Going back to why = cannot be overloaded, the system implementation is the only sensible thing you can do with references. You can overload operations that are applied to the objects, but not to references.
If you overloaded '=' you would never be able to change an object reference after it's been created.
... think about it - any call to theObjectWithOverloadedOperator=something inside the overloaded operator would result in another call to the overloaded operator... so what would the overloaded operator really be doing ? Maybe setting some other properties - or setting the value to a new object (immutability) ?
Generally not what '=' implies..
You can, however, override the implicit & explicit cast operators:
http://www.blackwasp.co.uk/CSharpConversionOverload.aspx
Because it doesn't really make sense to do so.
In C# = assigns an object reference to a variable. So it operates on variables and object references, not objects themselves. There is no point in overloading it depending on object type.
In C++ defining operator= makes sense for classes whose instances can be created e.g. on stack because the objects themselves are stored in variables, not references to them. So it makes sense to define how to perform such assignment. But even in C++, if you have set of polymorphic classes which are typically used via pointers or references, you usually explicitly forbid copying them like this by declaring operator= and copy constructor as private (or inheriting from boost::noncopyable), because of exactly the same reasons as why you don't redefine = in C#. Simply, if you have reference or pointer of class A, you don't really know whether it points to an instance of class A or class B which is a subclass of A. So do you really know how to perform = in this situation?
Actually, overloading operator = would make sense if you could define classes with value semantics and allocate objects of these classes in the stack. But, in C#, you can't.
One possible explanation is that you can't do proper reference updates if you overload assignment operator. It would literally screw up semantics because when people would be expecting references to update, your = operator may as well be doing something else entirely. Not very programmer friendly.
You can use implicit and explicit to/from conversion operators to mitigate some of the seeming shortcomings of not able to overload assignment.
I don't think there's any really particular single reason to point to. Generally, I think the idea goes like this:
If your object is a big, complicated object, doing something that isn't assignment with the = operator is probably misleading.
If your object is a small object, you may as well make it immutable and return new copies when performing operations on it, so that the assignment operator works the way you expect out of the box (as System.String does.)
You can overload assignment in C#. Just not on an entire object, only on members of it. You declare a property with a setter:
class Complex
{
    public double Real
    {
        get { ... }
        set { /* do something with value */ }
    }
    // more members
}
Now when you assign to Real, your own code runs.
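A filled-in sketch of that idea. The backing field and the Writes counter are assumptions added only to make the setter's effect observable; the original leaves the accessor bodies open.

```csharp
using System;

class Complex
{
    private double _real;

    // Hypothetical counter, present only to show that the setter ran.
    public int Writes { get; private set; }

    public double Real
    {
        get { return _real; }
        set
        {
            _real = value; // 'value' is whatever appears on the right of '='
            Writes++;      // custom code runs on every assignment to Real
        }
    }
}

static class PropertyDemo
{
    static void Main()
    {
        var c = new Complex();
        c.Real = 1.5;
        c.Real = 2.5;
        Console.WriteLine(c.Writes); // prints 2
    }
}
```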
The reason assignment to an object is not replaceable is because it is already defined by the language to mean something vitally important.
It's allowed in C++, and if you are not careful it can result in a lot of confusion and bug hunting.
This article explains this in great detail.
http://www.relisoft.com/book/lang/project/14value.html
Because shooting oneself in the foot is frowned upon.
On a more serious note one can only hope you meant comparison rather than assignment. The framework makes elaborate provision for interfering with equality/equivalence evaluation, look for "compar" in help or online with msdn.
Being able to define special semantics for assignment operations would be useful, but only if such semantics could be applied to all situations where one storage location of a given type was copied to another. Although standard C++ implements such assignment rules, it has the luxury of requiring that all types be defined at compile time. Things get much more complicated when Reflection and generics are added to the list.
Presently, the rules in .net specify that a storage location may be set to the default value for its type--regardless of what that type is--by zeroing out all the bytes. They further specify that any storage location can be copied to another of the same type by copying all the bytes. These rules apply to all types, including generics. Given two variables of type KeyValuePair<t1,t2>, the system can copy one to another without having to know anything but the size and alignment requirements of that type. If it were possible for t1, t2, or the type of any field within either of those types, to implement a copy constructor, code which copied one struct instance to another would have to be much more complicated.
That's not to say that such an ability wouldn't offer some significant benefits--it's possible that, were a new framework being designed, the benefits of custom value assignment operators and default constructors would exceed the costs. The costs of implementation, however, would be substantial in a new framework, and likely insurmountable for an existing one.
This code is working for me:
public class Class1
{
    ...
    public static implicit operator Class1(Class2 value)
    {
        Class1 result = new Class1();
        result.property = value.prop;
        return result;
    }
}
Types of Overriding Assignment
There are two ways to override assignment:
1. When you feel that the user may miss something, and you want to force the user to use a cast,
like float to integer, where you lose the fractional part:
int a = (int)5.4f;
2. When you want the user to be able to do it without even noticing that the object type is changing:
float f = 5;
How to Override Assignment
For 1, use of explicit keyword:
public static explicit operator ToType(FromType from)
{
    ToType to = new ToType();
    to.FillFrom(from);
    return to;
}
For 2, use of implicit keyword:
public static implicit operator ToType(FromType from)
{
    ToType to = new ToType();
    to.FillFrom(from);
    return to;
}
Update:
Note that this implementation can be placed in either the FromType or the ToType class, depending on your need; one of your classes can hold all the conversions while the other implements no code for this.
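Putting both kinds of conversion on one type, a self-contained sketch could look like this. Celsius is a hypothetical type made up for the example; it is not taken from any of the answers.

```csharp
using System;

struct Celsius
{
    public readonly double Degrees;

    public Celsius(double degrees) { Degrees = degrees; }

    // Implicit: double -> Celsius loses nothing, so no cast is required.
    public static implicit operator Celsius(double degrees)
    {
        return new Celsius(degrees);
    }

    // Explicit: Celsius -> int drops the fraction, so the caller must write a cast.
    public static explicit operator int(Celsius c)
    {
        return (int)c.Degrees;
    }
}

static class ConversionDemo
{
    static void Main()
    {
        Celsius c = 21.7;     // uses the implicit operator
        int whole = (int)c;   // uses the explicit operator
        Console.WriteLine(whole); // prints 21
    }
}
```

Note that neither operator changes what = itself does; the conversion runs first, and ordinary assignment then stores the result.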
