What is the purpose of this = default(...) in a struct constructor? - c#

While researching why structs allow this reassignment, I ran into the following puzzle: why is it necessary to write this = default(...) at the beginning of some struct constructors? It just zeroes memory that is already zeroed, doesn't it?
See this example from .NET Core:
public CancellationToken(bool canceled)
{
this = default(CancellationToken);
if (canceled)
{
this.m_source = CancellationTokenSource.InternalGetStaticSource(canceled);
}
}

It just zeroes memory that is already zeroed, doesn't it?
No. When you write a custom constructor for a struct, it is your responsibility to assign a value to every field of the struct.
Only the default parameterless .ctor() fills it with zeroes, and that is one of the reasons why, before C# 10, you were not allowed to write your own parameterless .ctor for a struct in C# (the CLR technically allows it, but that is a different topic).
So the technique here is to use default(...), which produces a separate zero-filled instance; the this = assignment then copies all of its fields into this (since it is a struct), which satisfies the requirement to initialize every field. After that, you can do whatever you want.
In most cases it may be more readable to use something like this instead:
public MyStruct(..) : this() {
...
}
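As a minimal sketch of both forms (the struct Token and its members are hypothetical, not the real CancellationToken), each constructor below sets only one field explicitly and relies on the zeroing step for the rest. On C# 10 and earlier, removing both `this = default(Token);` and `: this()` would produce error CS0171 (field must be fully assigned before control returns); C# 11 and later zero-initialize any unassigned fields automatically, so there the explicit step is mostly a matter of style.

```csharp
// Hypothetical struct; the names are illustrative only.
public struct Token
{
    public object Source;
    public int Flags;

    // Form 1: zero everything by assigning default(Token) to 'this',
    // then set only the field you care about.
    public Token(bool canceled)
    {
        this = default(Token);
        if (canceled)
        {
            Source = new object(); // stand-in for a real source object
        }
    }

    // Form 2: chain to the parameterless constructor, which also zeroes
    // every field, then set only the field you care about.
    public Token(int flags) : this()
    {
        Flags = flags;
    }

    public bool IsCanceled => Source != null;
}
```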

It satisfies a requirement for struct constructors:
If the struct instance constructor doesn’t specify a constructor initializer, the this variable corresponds to an out parameter of the struct type, and similar to an out parameter, this must be definitely assigned (§5.3) at every location where the constructor returns
C# Language Spec, Version 5, section 11.3.8
Of course, it doesn't have to be default() - it could be anything. But this must be assigned.
The rules for definite assignment (from section 5.3 referenced above) say, in part:
The definite assignment states of instance variables of a struct-type variable are tracked individually as well as collectively. In addition to the rules above, the following rules apply to struct-type variables and their instance variables:
An instance variable is considered definitely assigned if its containing struct-type variable is considered definitely assigned.
A struct-type variable is considered definitely assigned if each of its instance variables is considered definitely assigned.
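The per-field tracking rule can be seen in a small sketch with a hypothetical Pair struct: because its members are plain fields (not auto-properties), the compiler tracks each one separately, so assigning every field makes the whole variable definitely assigned, both inside a constructor and for a local declared without new.

```csharp
// Hypothetical struct with plain public fields.
public struct Pair
{
    public int A;
    public int B;

    // No 'this = ...' and no ': this()': assigning every field individually
    // is enough for 'this' to be definitely assigned when the constructor returns.
    public Pair(int a, int b)
    {
        A = a;
        B = b;
    }
}

public static class PairDemo
{
    public static int Sum(Pair p) => p.A + p.B;

    public static int BuildWithoutNew()
    {
        Pair p;        // declared, not yet definitely assigned
        p.A = 3;       // each field is tracked individually...
        p.B = 4;       // ...so once both are assigned,
        return Sum(p); // the whole variable counts as assigned and can be passed by value
    }
}
```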

Related

modify a value-type variable in a using statement

In C#, if I have the following struct:
internal struct myStruct : IDisposable
{
public int x;
public void Dispose()
{
x = 0;
}
}
then do this in Main:
using (myStruct myStruct = new myStruct())
{
myStruct.x = 5;
}
it fails saying that myStruct is readonly. That makes sense as myStruct is a value-type.
Now if I add the following function to the struct:
public void myfunc(int x)
{
this.x = x;
}
and change the Main code to this:
using (myStruct myStruct = new myStruct())
{
myStruct.myfunc(5);
Console.WriteLine(myStruct.x);
}
it works. Why?
The short answer is "because the C# specification says so". Which, I admit, may be a bit unsatisfying. But that's how it is.
The motivation is, I'm sure, as commenter Blogbeard suggests: while it's practical to enforce read-only semantics on direct field access, it's not practical to do so from within a type. After all, the type itself has no way to know how a variable containing a value of that type was declared.
The key part of the C# specification (from the v5.0 spec) is here, on page 258 (in the section on the using statement):
Local variables declared in a resource-acquisition are read-only, and must include an initializer. A compile-time error occurs if the embedded statement attempts to modify these local variables (via assignment or the ++ and -- operators), take the address of them, or pass them as ref or out parameters.
Since in the case of a value type, the variable itself contains the value of the object rather than a reference to an object, modifying any field of the object via that variable is the same as modifying the variable, and so is a "modification via assignment", which is specifically prohibited by the specification.
This is exactly the same as if you had declared the value type variable as a field in another object, with the readonly modifier.
But note that this is a compile-time rule, enforced by the C# compiler, and that there's no way for the compiler to similarly enforce the rule for a value type that modifies itself.
I will point out that this is one of many excellent reasons that one should never ever implement a mutable value type. Mutable value types frequently wind up being modified when you don't want them to be, while at the same time failing to be modified when you do want them to be (in completely different scenarios from this one).
If you treat a value type as something that is truly a value, i.e. a single value that is itself never changing, they work much better and find themselves in the middle of many fewer bugs. :)
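A minimal sketch of the "treat it as a value" advice, assuming C# 7.2 or later: declaring the type as a readonly struct (the hypothetical Reading below) makes the compiler reject any assignment to its state outside a constructor, so a method like myfunc that writes this.x would not compile at all. "Changing" the value means producing a new one.

```csharp
// Hypothetical readonly struct (C# 7.2+); all of its state is read-only.
public readonly struct Reading : System.IDisposable
{
    public int X { get; }

    public Reading(int x) { X = x; }

    // Instead of mutating this instance, return a new value.
    public Reading WithX(int x) => new Reading(x);

    // Nothing to reset: the value never changes after construction.
    public void Dispose() { }
}
```

Used in a using statement, the read-only variable and the type's own guarantees now agree: nothing inside the block can change r.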

What does .Net do when you declare an object without an instance?

I would like to know how the .NET Framework handles an object that is declared but never instantiated.
For example, I declare an object like this:
DropDownList ddl;
and do nothing with it. I know that I should do something with this variable, and I get a warning about it, but what I don't know is where it will be stored.
Is there a lookup table that stores the data of all declared variables? Or is there a virtual reference for every declaration?
Edit: I just want to know how memory is allocated for this object declaration.
Edit 2: Whether it's a local variable or not, I'm just talking about the memory allocation structure. I would like to know where these references are stored.
If ddl is a field, then the value of ddl will be null, as it is a reference type.
Any attempt to call a member on it will result in a NullReferenceException.
If it is a local variable it will simply be unassigned.
Value-type fields will get the default(T) of their type.
The compiler itself may remove the variable completely, depending on where it was declared, but this is an implementation detail.
If you are talking about a local variable, then the compiler can simply optimize it out of existence, since no one can be using it (if you attempted to use it without initializing it, the compiler would have protested with an error). In fact, the .NET 4 compiler did this for me when I tested just moments ago.
If you are talking about a field in a class then it is initialized with the default value for its type as part of the object construction.
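A small sketch of the difference between fields and locals, using a hypothetical FormState class with object standing in for DropDownList (so no ASP.NET reference is needed): fields start at their default values as part of object construction, while a local must be assigned before the compiler lets you read it.

```csharp
// Hypothetical class; 'object' stands in for DropDownList.
public class FormState
{
    public object Ddl;  // reference-type field: null until assigned
    public int Count;   // value-type field: default(int), i.e. 0

    public static bool LocalMustBeAssigned()
    {
        object ddl;     // local: the compiler tracks it as unassigned
        // System.Console.WriteLine(ddl); // would be error CS0165 (use of unassigned local variable)
        ddl = null;     // assign before reading
        return ddl == null;
    }
}
```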
From your description, it sounds like you're talking about a local variable. When you declare a local variable in usual implementations and without any optimizations, then space is reserved for it on the stack (most probably), with a null reference as its value.
You could look into the StackFrame class if you want to inspect further (I've never used it).
The variable is declared in your assembly (in the method's IL). It will always have its default value, null.
In Release mode (when the compiler is set to optimize), it is optimized away and not stored anywhere.
If you want to know more about IL and how the compiler works, Wikipedia has a good article to start with.
All variables are declared in a class or a method. Variables declared in a class (fields) can be listed using .NET Reflection:
using System.Reflection;
class Class1 { private int i; public string s; }
typeof(Class1).GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic); // returns all instance fields, public and private
typeof(Class1).GetFields(); // returns all public fields (instance and static)
typeof(Class1).GetProperties(); // returns all public properties (instance and static)
Variables declared in a method (local variables) cannot be inspected with .NET Reflection mechanisms.

Changing the 'this' variable of value types

Apparently you can assign to this from anywhere in a struct (but not in a class):
struct Point
{
public Point(int x, int y)
{
this = new Point();
X = x; Y = y;
}
int X; int Y;
}
I've neither seen this before nor ever needed it. Why would one ever want to do that? Eric Lippert reminds us that a feature must be justified to be implemented. What great use case could justify this? Are there any scenarios where this is invaluable? I couldn't find any documentation on it (see note 1).
Also, for calling constructors there is already a better known alternative syntax, so this feature is sometimes redundant:
public Point(int x, int y)
: this()
{
X = x; Y = y;
}
I found this feature in an example in Jeffrey Richter's CLR via C# 4th edition.
1) Apparently it is in the C# specification.
Good question!
Value types are, by definition, copied by value. If this was not actually an alias to a storage location then the constructor would be initializing a copy rather than initializing the variable you intend to initialize. Which would make the constructor rather less useful! And similarly for methods; yes, mutable structs are evil but if you are going to make a mutable struct then again, this has to be the variable that is being mutated, not a copy of its value.
The behaviour you are describing is a logical consequence of that design decision: since this aliases a variable, you can assign to it, same as you can assign to any other variable.
It is somewhat odd to assign directly to this like that, rather than assigning to its fields. It is even more odd to assign directly to this and then overwrite 100% of that assignment!
An alternative design which would avoid making this an alias to the receiver's storage would be to allocate this off the short-term storage pool, initialize it in the ctor, and then return it by value. The down side of that approach is that it makes copy elision optimizations pretty much impossible, and it makes ctors and methods weirdly inconsistent.
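To illustrate that this is an alias for the receiver's storage, here is a sketch with a hypothetical MutablePoint struct whose methods assign to this directly; the caller's variable is replaced, not a copy of it.

```csharp
// Hypothetical mutable struct, used only to show that 'this' aliases the receiver.
public struct MutablePoint
{
    public int X;
    public int Y;

    public MutablePoint(int x, int y) { X = x; Y = y; }

    // In an instance method 'this' behaves like a ref parameter,
    // so assigning it overwrites the caller's variable.
    public void Reset() { this = default(MutablePoint); }

    public void MoveTo(int x, int y) { this = new MutablePoint(x, y); }
}
```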
Also, I couldn't find any documentation on it.
Did you try looking in the C# spec? Because I can find documentation on it (7.6.7):
When this is used in a primary-expression within an instance constructor of a struct, it is classified as a variable. The type of the variable is the instance type (§10.3.1) of the struct within which the usage occurs, and the variable represents the struct being constructed. The this variable of an instance constructor of a struct behaves exactly the same as an out parameter of the struct type—in particular, this means that the variable must be definitely assigned in every execution path of the instance constructor.
When this is used in a primary-expression within an instance method or instance accessor of a struct, it is classified as a variable. The type of the variable is the instance type (§10.3.1) of the struct within which the usage occurs.
If the method or accessor is not an iterator (§10.14), the this variable represents the struct for which the method or accessor was invoked, and behaves exactly the same as a ref parameter of the struct type.
If the method or accessor is an iterator, the this variable represents a copy of the struct for which the method or accessor was invoked, and behaves exactly the same as a value parameter of the struct type.
As to a use case for it, I can't immediately think of many - about the only thing I've got is if the values you want to assign in the constructor are expensive to compute, and you've got a cached value you want to copy into this, it might be convenient.
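A sketch of that cached-value scenario, with a hypothetical Settings struct whose default values are computed once into a static field; the constructor copies the whole cached value into this and then overrides one field. Whether this pays off depends on how expensive the computation really is.

```csharp
using System;

// Hypothetical struct whose defaults are "expensive" to compute.
public struct Settings
{
    public double Scale;
    public double Offset;
    public int Precision;

    // Computed once, the first time the static field is needed.
    private static readonly Settings s_cached = ComputeDefaults();

    private static Settings ComputeDefaults()
    {
        Settings s;               // every field is assigned below
        s.Scale = Math.Sqrt(2.0); // stands in for a costly computation
        s.Offset = Math.PI;
        s.Precision = 6;
        return s;
    }

    // Copy the whole cached value into 'this', then override one field.
    public Settings(int precision)
    {
        this = s_cached;
        Precision = precision;
    }
}
```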
A storage location of value type is an aggregation of storage locations comprising that type's public and private fields. Passing a value type as an ordinary (value) parameter will physically and semantically pass the contents of all its fields. Passing a value type as a ref parameter semantically passes all of its fields, though a single "byref" is used to pass all of them.
Calling a method on a struct is equivalent to passing the struct (and thus all its fields) as a ref parameter, except for one wrinkle: normally, neither C# nor VB.NET will allow a read-only value to be passed as a ref parameter. Both, however, will allow struct methods to be invoked on read-only values or temporary values. They do this by making a copy of the whole struct (and thus all of its fields), and then passing that copy as a ref parameter.
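The copy-on-read-only behavior described above can be observed with a hypothetical mutable Tally struct stored in two fields, one of them readonly: the same method call mutates the writable field but only a temporary copy of the read-only one.

```csharp
// Hypothetical mutable struct.
public struct Tally
{
    public int Count;
    public void Increment() { Count++; }
}

public class TallyHolder
{
    public Tally Writable;
    public readonly Tally ReadOnlyTally;

    public void IncrementBoth()
    {
        Writable.Increment();      // the field itself is passed as 'this' (by ref)
        ReadOnlyTally.Increment(); // the compiler copies the field to a temporary and mutates the copy
    }
}
```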
Because of this behavior, some people call mutable structs "evil", but the only thing that's evil is the fact that neither C# nor VB.NET defines any attribute to indicate whether a struct member or property should be invokable on things that can't be directly passed by ref.

When would a value type contain a reference type?

I understand that the decision to use a value type over a reference type should be based on semantics, not performance. What I do not understand is why value types can legally contain reference-type members. There are a couple of reasons:
For one, we should not build a struct to require a constructor.
public struct MyStruct
{
public Person p;
// public Person p = new Person(); // error: cannot have instance field initializers in structs
MyStruct(Person p)
{
p = new Person();
}
}
Second, because of value type semantics:
MyStruct someVariable;
someVariable.p.Age = 2; // NullReferenceException
The compiler does not allow me to initialize Person at the declaration. I have to move this off to the constructor, rely on the caller, or expect a NullReferenceException. None of these situations are ideal.
Does the .NET Framework have any examples of reference types within value types? When should we do this (if ever)?
Instances of a value type never contain instances of a reference type. The reference-typed object is somewhere on the managed heap, and the value-typed object may contain a reference to the object. Such a reference has a fixed size. It is perfectly common to do this — for example every time you use a string inside a struct.
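As a sketch of what "contains a reference" means in practice (the Labelled struct is hypothetical): copying the struct copies the references, so both copies point at the same StringBuilder, while reassigning a field in one copy does not affect the other.

```csharp
using System.Text;

// Hypothetical struct with one immutable and one mutable reference-type field.
public struct Labelled
{
    public string Name;          // string is immutable, so sharing it is harmless
    public StringBuilder Buffer; // StringBuilder is mutable, so copies share its state
}

public static class LabelledDemo
{
    public static (string aBuffer, string aName, bool sameBuffer) CopyDemo()
    {
        var a = new Labelled { Name = "a", Buffer = new StringBuilder() };
        var b = a;            // copies the two references, not the objects they point to
        b.Buffer.Append("x"); // mutates the single StringBuilder both copies refer to
        b.Name = "b";         // rebinds only b's field; a.Name is untouched
        return (a.Buffer.ToString(), a.Name, object.ReferenceEquals(a.Buffer, b.Buffer));
    }
}
```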
But yes, you cannot guarantee initialization of a reference-typed field in a struct because you cannot define a parameter-less constructor (nor can you guarantee it ever gets called, if you define it in a language other than C#).
You say you should "not build a struct to require a constructor". I say otherwise. Since value-types should almost always be immutable, you must use a constructor (quite possibly via a factory to a private constructor). Otherwise it will never have any interesting contents.
Use the constructor. The constructor is fine.
If you don't want to pass in an instance of Person to initialize p, you could use lazy initialization via a property. (Because obviously the public field p was just for demonstration, right? Right?)
public struct MyStruct
{
public MyStruct(Person p)
{
this.p = p;
}
private Person p;
public Person Person
{
get
{
if (p == null)
{
p = new Person(…); // see comment below about struct immutability
}
return p;
}
}
// ^ in most other cases, this would be a typical use case for Lazy<T>;
// but due to structs' default constructor, we *always* need the null check.
}
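One consequence of the lazy getter above (see its comment about struct immutability): the getter mutates whichever copy of the struct it runs on. A sketch with a hypothetical PersonHolder struct and an empty Person class shows two copies taken before first access ending up with two different Person instances. The property is named Owner here only to keep it distinct from the type name.

```csharp
// Hypothetical reference type standing in for the question's Person.
public class Person { }

public struct PersonHolder
{
    private Person p;

    public Person Owner
    {
        get
        {
            if (p == null)
            {
                p = new Person(); // initializes the field of the copy the getter was called on
            }
            return p;
        }
    }
}
```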
There are two primary useful scenarios for a struct holding a class-type field:
1. The struct holds a possibly-mutable reference to an immutable object (`String` being by far the most common). A reference to an immutable object will behave as a cross between a nullable value type and a normal value type: it doesn't have the "Value" and "HasValue" properties of the former, but it will have null as a possible (and default) value. Note that if the field is accessed through a property, that property may return a non-null default when the field is null, but should not modify the field itself.
2. The struct holds an "immutable" reference to a possibly-mutable object and serves to wrap the object or its contents. `List<T>.Enumerator` is probably the most common struct using this pattern. Having struct fields pretend to be immutable is something of a dodgy construct(*), but in some contexts it can work out pretty well. In most instances where this pattern is applied, the behavior of a struct will be essentially like that of a class, except that performance will be better(**).
(*) The statement structVar = new structType(whatever); will create a new instance of structType, pass it to the constructor, and then mutate structVar by copying all public and private fields from that new instance into structVar; once that is done, the new instance will be discarded. Consequently, all struct fields are mutable, even if they "pretend" to be otherwise; pretending they are immutable can be dodgy unless one knows that the way structVar = new structType(whatever); is actually implemented will never pose a problem.
(**) Structs will perform better in some circumstances; classes will perform better in others. Generally, so-called "immutable" structs are chosen over classes in situations where they are expected to perform better, and where the corner cases where their semantics would differ from those of classes are not expected to pose problems.
Some people like to pretend that structs are like classes, but more efficient, and dislike using structs in ways that take advantage of the fact that they're not classes. Such people would probably only be inclined toward using scenario 2 above. Scenario 1 can be very useful with mutable structs, especially with types like String which behave essentially as values.
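Scenario 2 can be seen with List<T>.Enumerator itself: copying the enumerator copies its position, while both copies keep referring to the same list. The EnumeratorDemo wrapper below is hypothetical; List<T>.GetEnumerator, MoveNext, and Current are the real members.

```csharp
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // List<T>.Enumerator is a struct holding a reference to the list plus a position.
    public static (int original, int copy) AdvanceAfterCopy()
    {
        var list = new List<int> { 10, 20, 30 };
        List<int>.Enumerator e = list.GetEnumerator();
        e.MoveNext();  // e.Current is now 10
        var copy = e;  // copies the position; the list reference is shared
        e.MoveNext();  // only e advances
        return (e.Current, copy.Current);
    }
}
```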
I wanted to add to Marc's answer, but I had too much to say for a comment.
If you look at the C# specification, it says of struct constructors:
Struct constructors are invoked with the new operator, but that does not imply that memory is being allocated. Instead of dynamically allocating an object and returning a reference to it, a struct constructor simply returns the struct value itself (typically in a temporary location on the stack), and this value is then copied as necessary.
(You can find a copy of the spec under C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC#\Specifications\1033)
So a struct constructor is inherently different from a class constructor.
In addition, structs are expected to be copied by value, and thus:
With structs, the variables each have their own copy of the data, and it is not possible for operations on one to affect the other.
Any time I've seen a reference type in a struct, it has been a string. This works because strings are immutable. I am guessing your Person type is not immutable, in which case it can introduce very odd and severe bugs because of the divergence from the expected behavior of a struct.
That being said, the errors you're seeing with the constructor of your struct may be that you have a public field p with the same name as your parameter p and not referring to the struct's p as this.p, or that you're missing the keyword struct.

const, readonly and mutable value types

I'm continuing my study of C# and the language specification, and here is another behavior that I don't quite understand:
The C# Language Specification clearly states the following in section 10.4:
The type specified in a constant declaration must be sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal, bool, string, an enum-type, or a reference-type.
It also states in section 4.1.4 the following:
Through const declarations it is possible to declare constants of the simple types (§10.4). It is not possible to have constants of other struct types, but a similar effect is provided by static readonly fields.
Ok, so a similar effect can be gained by using static readonly. Reading this I went and tried the following code:
using System;
using System.Drawing;

class Program
{
static void Main()
{
OffsetPoints();
Console.Write("Hit a key to exit...");
Console.ReadKey();
}
static Point staticPoint = new Point(0, 0);
static readonly Point staticReadOnlyPoint = new Point(0, 0);
public static void OffsetPoints()
{
PrintOutPoints();
staticPoint.Offset(1, 1);
staticReadOnlyPoint.Offset(1, 1);
Console.WriteLine("Offsetting...");
Console.WriteLine();
PrintOutPoints();
}
static void PrintOutPoints()
{
Console.WriteLine("Static Point: X={0};Y={1}", staticPoint.X, staticPoint.Y);
Console.WriteLine("Static readonly Point: X={0};Y={1}", staticReadOnlyPoint.X, staticReadOnlyPoint.Y);
Console.WriteLine();
}
}
The output of this code is:
Static Point: X=0;Y=0
Static readonly Point: X=0;Y=0
Offsetting...
Static Point: X=1;Y=1
Static readonly Point: X=0;Y=0
Hit a key to exit...
I really expected the compiler to give me some kind of warning about mutating a static readonly field or failing that, to mutate the field as it would with a reference type.
I know mutable value types are evil (why Microsoft ever made Point mutable is a mystery), but shouldn't the compiler warn you in some way that you are trying to mutate a static readonly value type? Or at least warn you that your Offset() method will not have the "desired" side effects?
Eric Lippert explains what's going on here:
...if the field is readonly and the reference occurs outside an instance constructor of the class in which the field is declared, then the result is a value, namely the value of the field I in the object referenced by E.
The important word here is that the result is the value of the field, not the variable associated with the field. Readonly fields are not variables outside of the constructor. (The initializer here is considered to be inside the constructor; see my earlier post on that subject.)
Oh, and just to stress the evilness of mutable structs, here is his conclusion:
This is yet another reason why mutable value types are evil. Try to always make value types immutable.
The point of the readonly is that you cannot reassign the reference or value.
In other words, if you attempted this:
staticReadOnlyPoint = new Point(1, 1);
you would get a compiler error because you are attempting to reassign staticReadOnlyPoint. The compiler will prevent you from doing this.
However, readonly doesn't enforce whether the value or referenced object itself is mutable - that is a behaviour that is designed into the class or struct by the person creating it.
[EDIT: to properly address the odd behaviour being described]
The reason you see the behaviour where staticReadOnlyPoint appears to be immutable is not that it is immutable itself, but that it is a readonly field of a struct type. This means that every time you access it, you are taking a full copy of it.
So your line
staticReadOnlyPoint.Offset(1, 1);
is accessing, and mutating, a copy of the field, not the actual value in the field. When you subsequently write out the value you are then writing out yet another copy of the original (not the mutated copy).
The copy you did mutate with the call to Offset is discarded, because it is never assigned to anything.
The compiler simply doesn't have enough information available about a method to know that the method mutates the struct. A method may well have a side effect that's useful but doesn't otherwise change any members of the struct. It would technically be possible to add such analysis to the compiler, but that won't work for any types that live in another assembly.
The missing ingredient is a metadata token that indicates that a method doesn't mutate any members, like the const keyword in C++. Not available. It would have been drastically non-CLS-compliant if it had been added in the original design. There are very few languages that support the notion. I can only think of C++, but I don't get out much.
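For what it's worth, later versions of C# added something close to that missing ingredient: C# 7.2 introduced readonly struct and in parameters, and C# 8 lets individual struct members be marked readonly, which makes the compiler reject writes to this inside them and lets it skip the defensive copy when they are called on a readonly field. A sketch, assuming C# 8 or later, with hypothetical Temperature and Sensor types:

```csharp
// Sketch assuming C# 8 or later; both types are hypothetical.
public struct Temperature
{
    public double Celsius;

    // readonly member: assigning Celsius in here would be a compile-time error.
    public readonly double ToFahrenheit() => Celsius * 9 / 5 + 32;

    // Ordinary member: allowed to mutate 'this'.
    public void Warm(double delta) { Celsius += delta; }
}

public class Sensor
{
    public readonly Temperature Reading = new Temperature { Celsius = 100 };

    // ToFahrenheit is readonly, so no defensive copy is needed for this call.
    public double ReadFahrenheit() => Reading.ToFahrenheit();

    // Warm is not readonly, so it still runs on a defensive copy and the field is unchanged.
    public double WarmAndRead()
    {
        Reading.Warm(10);
        return Reading.Celsius;
    }
}
```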
FWIW, the compiler does generate explicit code to ensure that the statement cannot accidentally modify the readonly field. This statement
staticReadOnlyPoint.Offset(1, 1);
gets translated to
Point temp = staticReadOnlyPoint; // makes a copy
temp.Offset(1, 1);
Adding code that then compares the value and generates a runtime error is also only technically possible. It costs too much.
The observed behavior is an unfortunate consequence of the fact that neither the Framework nor C# provides any means by which member function declarations can specify whether this should be passed by ref, const-ref, or value. Instead, value types always pass this by (non-const-restricted) ref, and reference types always pass this by value.
The 'proper' behavior for a compiler would be to forbid passing immutable or temporary values by non-const-restricted ref. If such restriction could be imposed, ensuring proper semantics for mutable value types would mean following a simple rule: if you make an implicit copy of a struct, you're doing something wrong. Unfortunately, the fact that member functions can only accept this by non-const-restricted ref means a language designer must make one of three choices:
Guess that a member function won't modify `this`, and simply pass immutable or temporary variables by `ref`. This would be most efficient for functions which do not, in fact, modify `this`, but could dangerously expose to modification things that should be immutable.
Don't allow member functions to be used on immutable or temporary entities. This would avoid improper semantics, but would be a really annoying restriction, especially given that most member functions do not modify `this`.
Allow the use of member functions except those deemed most likely to modify `this` (e.g. property setters), but instead of passing immutable entities directly by ref, copy them to temporary locations and pass those.
Microsoft's choice protects constants from improper modification, but has the unfortunate consequences that code will run needlessly slowly when calling functions that don't modify this, while generally working incorrectly for those which do.
Given the way this is actually handled, one's best bet is to avoid making any changes to this in struct member functions other than property setters. Having property setters or mutable fields is fine, since the compiler will correctly forbid any attempt to use property setters on immutable or temporary objects, or to modify any fields thereof.
If you look at the IL, you will see that on usage of the readonly field, a copy is made before calling Offset:
IL_0014: ldsfld valuetype [System.Drawing]System.Drawing.Point Program::staticReadOnlyPoint
IL_0019: stloc.0
IL_001a: ldloca.s CS$0$0000
Why this is happening, is beyond me.
It could be part of the spec, or a compiler bug (but it looks a bit too intentional for the latter).
The effect is due to several well-defined features coming together.
readonly means that the field in question cannot be changed, but not that the target of the field cannot be changed. This is more easily understood (and more often useful in practice) with readonly fields of a mutable reference type, where you can do x.SomeMutatingMethod() but not x = someNewObject.
So, the first item is: you can mutate the target of a readonly field.
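The first item is easiest to see with a reference type. In this sketch (the Basket class is hypothetical), a readonly field still lets you mutate the list it refers to; only reassigning the field itself is rejected.

```csharp
using System.Collections.Generic;

// Hypothetical class with a readonly field of a mutable reference type.
public class Basket
{
    private readonly List<string> _items = new List<string>();

    public void Add(string item)
    {
        _items.Add(item);                // fine: mutates the object the field refers to
        // _items = new List<string>(); // error CS0191: a readonly field cannot be assigned to
    }

    public int Count => _items.Count;
}
```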
The second item is that when you access a non-variable value type, you obtain a copy of the value. The least confusing example of this is giveMeAPoint().Offset(1, 1), because there isn't a known location for us to later observe that the value type returned by giveMeAPoint() may or may not have been mutated.
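A sketch of the temporary-value case, with a hypothetical mutable Vec struct and a GiveMeAVec() method that returns a copy of a static field: the call compiles, mutates the returned temporary, and leaves the stored value untouched.

```csharp
// Hypothetical mutable struct.
public struct Vec
{
    public int X;
    public void Offset(int dx) { X += dx; }
}

public static class VecDemo
{
    public static Vec Stored;

    // Returns a copy of Stored, not a reference to it.
    public static Vec GiveMeAVec() => Stored;

    public static int OffsetThroughCall()
    {
        GiveMeAVec().Offset(1); // compiles: the returned temporary is mutated, then discarded
        return Stored.X;        // the stored value was never touched
    }
}
```

Calling Offset directly on VecDemo.Stored, which is a variable, does change it.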
This is why value types are not evil, but are in some ways worse. Truly evil code doesn't have a well-defined behaviour, and all of this is well-defined. It's still confusing though (confusing enough for me to get this wrong on my first answer), and confusing is worse than evil when you're trying to code. Easily understood evil is so much more easily avoided.
