What exactly is a reference in C#

From what I understand by now, I can say that a reference in C# is a kind of pointer to an object which has reference count and knows about the type compatibility. My question is not about how a value type is different than a reference type, but more about how a reference is implemented.
I have read this post about the differences between references and pointers, but it does not cover much about what a reference is; it mostly describes its properties compared with a pointer in C++. I also understand the differences between passing by reference and passing by value (as in C# objects are by default passed by value, even references), but it is hard for me to understand what really is a reference when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure as in the Eric Lippert blog entry about the stack as an implementation detail.
Can somebody provide me with a complete, but hopefully simple explanation about what references really are in C# and a bit about how they are implemented?
Edit: this is not a duplicate. The Reference type in C# question explains how a reference works and how it differs from a value, but what I am asking is how a reference is defined at a low level.

From what I understand by now, I can say that a reference in C# is a kind of pointer to an object
If by "kind of" you mean "is conceptually similar to", yes. If you mean "could be implemented by", yes. If you mean "has the is-a-kind-of relationship to", as in "a string is a kind of object" then no. The C# type system does not have a subtyping relationship between reference types and pointer types.
which has reference count
Implementations of the CLR are permitted to use reference counting semantics but are not required to do so, and most do not.
and knows about the type compatibility.
I'm not sure what this means. Objects know their own actual type. References have a static type which is compatible with the actual type in verifiable code. Compatibility checking is implemented by the runtime's verifier when the IL is analyzed.
My question is not about how a value type is different than a
reference type, but more about how a reference is implemented.
How references are implemented is, not surprisingly, an implementation detail.
Can somebody provide me with a complete, but hopefully simple explanation about what references really are in C#
References are things that act as references are specified to act by the C# language specification. That is:
objects (of reference type) have identity independent from the values of their fields
any object may have a reference to it
such a reference is a value which may be passed around like any other value
equality comparison is implemented for those values
two references are equal if and only if they refer to the same object; that is, references reify object identity
there is a unique null reference which refers to no object and is unequal to any valid reference to an object
A static type is always known for any reference value, including the null reference
If the reference is non-null then the static type of the reference is always compatible with the actual type of the referent. So for example, if we have a reference to a string, the static type of the reference could be string or object or IEnumerable, but it cannot be Giraffe. (Obviously if the reference is null then there is no referent to have a type.)
There are probably a few rules that I've missed, but that gets across the idea. References are anything that behaves like a reference. That's what you should be concentrating on. References are a useful abstraction because they are the abstraction which enables object identity independent of object value.
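The rules above can be observed directly from code. The following is only a small sketch; the Point class and the IdentityDemo wrapper are hypothetical names invented for illustration, not anything from the specification.

```csharp
using System;

// Hypothetical type used only for illustration.
sealed class Point
{
    public int X;
    public Point(int x) { X = x; }
}

static class IdentityDemo
{
    // Returns (sameObject, differentObject, nullIsDistinct).
    public static (bool, bool, bool) Run()
    {
        Point a = new Point(1);
        Point b = a;             // copies the reference, not the object
        Point c = new Point(1);  // same field values, but a different object
        object viewOfA = a;      // static type object, same referent

        bool same = object.ReferenceEquals(a, b) && object.ReferenceEquals(a, viewOfA);
        bool different = !object.ReferenceEquals(a, c);
        bool nullDistinct = !object.ReferenceEquals(a, null);
        return (same, different, nullDistinct);
    }
}
```

Here a and b are equal references because they refer to the same object, while c is a distinct object even though its field has the same value.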
and a bit about how they are implemented?
In practice, objects of reference type in C# are implemented as blocks of memory which begin with a small header that contains information about the object, and references are implemented as pointers to that block. This simple scheme is then made more complicated by the fact that we have a multigenerational mark-and-sweep compacting collector; it must somehow know the graph of references so that it can move objects around in memory when compacting the heap, without losing track of referential identity.
As an exercise you might consider how you would implement such a scheme. It builds character to try to figure out how you would build a system where references are pointers and objects can move in memory. How would you do it?
it is hard for me to understand what really is a reference when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure
This is tricky. It is important to understand that conceptually, a reference to a variable -- a ref parameter in C# -- and a reference to an object of reference type are conceptually similar but actually different things.
In C# you can think of a reference to a variable as an alias. That is, when you say
void M()
{
    int x = 123;
    N(ref x);
}
void N(ref int y)
{
    y = 456;
}
Essentially what we are saying is that x and y are different names for the same variable. The ref is an unfortunate choice of syntax because it emphasizes the implementation detail -- that behind the scenes, y is a special "reference to variable" type -- and not the semantics of the operation, which is that logically y is now just another name for x; we have two names for the same variable.
References to variables and references to objects are not the same thing in C#; you can see this in the fact that they have different semantics. You can compare two references to objects for equality. But there is no way in C# to say:
static bool EqualAliases(ref int y, ref int z)
{
return true iff y and z are both aliases for the same variable
}
the way you can with references:
static bool EqualReferences(object x, object y)
{
return x == y;
}
Behind the scenes both references to variables and references to objects are implemented by pointers. The difference is that a reference to a variable might refer to a variable on the short-term storage pool (aka "the stack"), whereas a reference to an object is a pointer to the heap-allocated object header. That's why the CLR restricts you from storing a reference to a variable into long-term storage; it does not know if you are keeping a long-term reference to something that will be dead soon.
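To make the closure restriction concrete, here is a minimal sketch. ClosureDemo and its method names are made up for this example. The commented-out line is the kind of code the compiler rejects (error CS1628); the usual workaround is to copy the value into an ordinary local, at the cost of losing the aliasing.

```csharp
using System;

static class ClosureDemo
{
    public static Func<int> Capture(ref int y)
    {
        // Func<int> bad = () => y;  // rejected by the compiler (error CS1628)
        int copy = y;                // copy the current value into an ordinary local
        return () => copy;           // the closure captures the local, not the alias
    }

    public static (int, int) Run()
    {
        int x = 123;
        Func<int> f = Capture(ref x);
        x = 456;           // the closure does not observe this change
        return (f(), x);   // (123, 456)
    }
}
```

The delegate can safely outlive the call to Capture because it holds a copy, not a reference to a variable that might live in short-term storage.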
Your best bet to understand how both kinds of references are implemented as pointers is to take a step down from the C# type system into the CLI type system which underlies it. Chapter 8 of the CLI specification should prove interesting reading; it describes different kinds of managed pointers and what each is used for.

References in C# are very similar to C++ references. Yes, indeed, underneath there is garbage collection magic going on, but I would say how that works is a different and larger topic.
C# references are similar to C++ references/immutable pointers: No pointer arithmetic, etc - but you can reassign them (Thanks Ben!).
I'd say in practice, one difference is that since pointers aren't generally available in C# (the unsafe keyword and its associated pointers are again a different and larger topic), you'll find yourself using the "out" keyword to do what pointer-to-pointer used to do.
Also you are correct in asserting references carry type information. All reference types in C# derive from the Object class, which has a GetType() method.
Be advised, however, that structs, which are generally treated as values rather than references, also have GetType().

Related

why we cannot initialize an instance field at declaration in c# struct?

In a C# struct, we cannot assign a value to an instance field at its declaration. Can you tell me the reason? Thanks.
A simple example:
struct Test
{
public int age =10; // it's not allowed.
}

I think the answer is very simple, but hard to get a grasp of if you do not know the difference between value types and reference types.
Maybe something to note is that reference types are held on the heap, which the garbage collector cleans up, while a value type typically lives on the stack. Every time you define a scope, like:
{
}
A new local stack is created. Once you exit this scope, all value types on the stack are disposed unless a reference is held to them on the heap.
Seeing as reference types and value types are very differently handled, they are also designed with these changes in mind. Not being able to have empty constructors and also not being able to assign values on construction is a logical result of this.
I found a very old stackoverflow question regarding the same, they also have some short answers regarding it being designed like that for performance reasons:
Why can't I initialize my fields in my structs?
My source for this info was the ref book for 70-483.
Hope this gave you the clarification you are looking for
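For what it's worth, a common workaround that compiles under both old and new language versions is a constructor that takes arguments, or a static factory. The sketch below uses hypothetical names (AgeRecord, WithDefaultAge). As far as I know, C# 10 and later relax the original restriction, but default(T) still produces a zeroed struct either way.

```csharp
// Hypothetical struct for illustration.
struct AgeRecord
{
    public int Age;

    public AgeRecord(int age) { Age = age; }

    // Factory that supplies the value the field initializer was meant to provide.
    public static AgeRecord WithDefaultAge() => new AgeRecord(10);
}

static class StructInitDemo
{
    public static (int, int) Run()
    {
        AgeRecord zeroed = default(AgeRecord);            // all fields zeroed
        AgeRecord initialized = AgeRecord.WithDefaultAge();
        return (zeroed.Age, initialized.Age);             // (0, 10)
    }
}
```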

Are all objects that don't inherit from System.ValueType reference type?

Am I correct in believing that any object that doesn't inherit from System.ValueType must therefore by definition be a reference type?
I've been unable to find any conclusive documentation to backup this notion.

Check if this helps.

If you read closely the Remarks you'll see that
Data types are separated into value types and reference types. Value
types are either stack-allocated or allocated inline in a structure.
Reference types are heap-allocated. Both reference and value types are
derived from the ultimate base class Object. In cases where it is
necessary for a value type to behave like an object, a wrapper that
makes the value type look like a reference object is allocated on the
heap, and the value type's value is copied into it. The wrapper is
marked so the system knows that it contains a value type. This process
is known as boxing, and the reverse process is known as unboxing.
Boxing and unboxing allow any type to be treated as an object.
The C# compiler does a wonderful job of making you think that value types like int and long have methods.
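The boxing described in those remarks can be seen in a few lines. This is just an illustrative sketch; BoxingDemo is a made-up name.

```csharp
static class BoxingDemo
{
    public static (int, int, bool) Run()
    {
        int i = 42;
        object boxed = i;           // boxing: the value is copied into a heap wrapper
        i = 43;                     // changing i does not affect the boxed copy
        int unboxed = (int)boxed;   // unboxing: the value is copied back out

        object boxedAgain = 42;     // a second, separate box
        bool sameBox = object.ReferenceEquals(boxed, boxedAgain);

        return (i, unboxed, sameBox);   // (43, 42, false)
    }
}
```

Each boxing operation allocates its own wrapper, which is why the two boxes holding 42 are not the same object.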

If you were on QI and Stephen Fry was doing his normal thing of being what an impressionable idiot thinks a smart person is like, then he'd have reacted to "any object that doesn't inherit from System.ValueType must therefore by definition be a reference type" with a klaxon and a flashing screen saying "all objects are value types or reference types".
He'd then go on to point out that originally in computer science, object meant any entity that could be manipulated by a computer, and therefore includes pointers, which .NET has, even though they don't fit the later definition of object (the word later said with a certain tone of condescension) that refers to objects that are encapsulated with their methods, and therefore you're wrong.
Alan Davies would point out that everyone knew what you meant, but it would be too late, your statement would have served only to fuel Fry's warm glow of smugness, especially since technology comes perhaps second only to Oscar Wilde in the ranks of things he likes to think he can talk intelligently about (and perhaps second to none in the ranks of things he knows nothing about, now I think of it, there's no way he'd manage to say the above and not get it wrong in some way).
In other words yes, you are completely right :)
(Apologies to those who haven't seen much British television, and therefore don't have a clue what any of that meant).

Why do we need struct? (C#)

To use a struct, we need to instantiate the struct and use it just like a class. Then why don't we just create a class in the first place?

A struct is a value type so if you create a copy, it will actually physically copy the data, whereas with a class it will only copy the reference to the data

A major difference between the semantics of class and struct is that structs have value semantics. What this means is that if you have two variables of the same type, they each have their own copy of the data. Thus if a variable of a given value type is set equal to another (of the same type), operations on one will not affect the other (that is, assignment of value types creates a copy). This is in sharp contrast to reference types.
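A short sketch of that contrast, using two hypothetical types (PointValue and PointRef) that differ only in being a struct versus a class:

```csharp
// Hypothetical types for illustration.
struct PointValue { public int X; }
sealed class PointRef { public int X; }

static class CopyDemo
{
    public static (int, int) Run()
    {
        var v1 = new PointValue { X = 1 };
        var v2 = v1;    // copies the data
        v2.X = 99;      // v1 is unaffected

        var r1 = new PointRef { X = 1 };
        var r2 = r1;    // copies the reference only
        r2.X = 99;      // r1 sees the change, since both refer to one object

        return (v1.X, r1.X);   // (1, 99)
    }
}
```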
There are other differences:
Value types are implicitly sealed (it is not possible to derive from a value type).
Value types can not be null.
Value types are given a default constructor that initializes the value type to its default value.
A variable of a value type is always a value of that type. Contrast this with classes where a variable of type A could refer to an instance of type B if B derives from A.
Because of the difference in semantics, it is inappropriate to refer to structs as "lightweight classes."

All of the reasons I see in other answers are interesting and can be useful, but if you want to read about why they are required (at least by the VM) and why it was a mistake for the JVM to not support them (user-defined value types), read Demystifying Magic: High-level Low-level Programming. As it stands, C# shines in talking about the potential to bring safe, managed code to systems programming. This is also one of the reasons I think the CLI is a superior platform [than the JVM] for mobile computing. A few other reasons are listed in the linked paper.
It's important to note that you'll very rarely, if ever, see an observable performance improvement from using a struct. The garbage collector is extremely fast, and in many cases will actually outperform the structs. When you add in the nuances of them, they're certainly not a first-choice tool. However, when you do need them and have profiler results or system-level constructs to prove it, they get the job done.
Edit: If you wanted an answer of why we need them as opposed to what they do, ^^^

In C#, a struct is a value type, unlike classes which are reference types. This leads to a huge difference in how they are handled, or how they are expected to be used.
You should probably read up on structs from a book. Structs in C# aren't close cousins of class like in C++ or Java.

It is a myth that structs are always created on the stack.
It is true that a struct is a value type and a class is a reference type. But remember that
1. A Reference Type always goes on the Heap.
2. Value Types go where they were declared.
I will explain what that second line means with the example below.
Consider the following method:
public void DoCalculation()
{
int num;
num=2;
}
Here num is a local variable so it will be created on stack.
Now consider the below example
public class TestClass
{
public int num;
}
public void DoCalculation()
{
TestClass myTestClass = new TestClass ();
myTestClass.num=2;
}
This time num is created on the heap. In some cases value types perform better than reference types because they don't require garbage collection.
Also remember:
The value of a value type is always a value of that type.
The value of a reference type is always a reference.
And you have to consider that if you expect a lot of instantiations, that means more heap space to deal with and more work for the garbage collector. For that case you can choose structs.

Structs have many different semantics to classes. The differences are many but the primary reasons for their existence are:
They can be explicitly laid out in memory
this allows certain interop scenarios
They may be allocated on the stack
Making some sorts of high performance code possible in a much simpler fashion
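As a rough illustration of the explicit layout point, the sketch below controls packing with StructLayout and reads back the marshaled size with Marshal.SizeOf. The struct names are hypothetical, and the padded size depends on the platform's default packing (it is typically 8).

```csharp
using System.Runtime.InteropServices;

// One byte followed by a four-byte int, with no padding between them.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct PackedHeader
{
    public byte Tag;
    public int Length;
}

// Same fields with default packing: padding is normally added after Tag.
[StructLayout(LayoutKind.Sequential)]
struct PaddedHeader
{
    public byte Tag;
    public int Length;
}

static class LayoutDemo
{
    // Returns the marshaled sizes in bytes of (PackedHeader, PaddedHeader).
    public static (int, int) Run()
    {
        return (Marshal.SizeOf<PackedHeader>(), Marshal.SizeOf<PaddedHeader>());
    }
}
```

Being able to pin the layout down like this is what makes a struct usable as the managed mirror of a native record.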

the difference is that a struct is a value-type
I've found them useful in 2 situations
1) Interop - you can specify the memory layout of a struct, so you can guarantee that layout when you invoke an unmanaged call.
2) Performance - in some (very limited) cases, structs can be faster than classes. In general, this requires structs to be small (I've heard 16 bytes or less) and not changed often.

One of the main reasons is that, when used as local variables during a method call, structs are allocated on the stack.
Stack allocation is cheap, but the big difference is that de-allocation is also very cheap. In this situation, the garbage collector doesn't have to track structs -- they're removed when returning from the method that allocated them when the stack frame is popped.
edit - clarified my post re: Jon Skeet's comment.

A struct is a value type (like Int32), whereas a class is a reference type. Structs get created on the stack rather than the heap. Also, when a struct is passed to a method, a copy of the struct is passed, but when a class instance is passed, a reference is passed.
If you need to create your own datatype, say, then a struct is often a better choice than a class as you can use it just like the built-in value types in the .NET framework. There are some good struct examples you can read here.

Are arrays or lists passed by default by reference in c#?

Do they? Or to speed up my program should I pass them by reference?

The reference is passed by value.
Arrays in .NET are objects on the heap, so you have a reference. That reference is passed by value, meaning that changes to the contents of the array will be seen by the caller, but reassigning the array won't:
void Foo(int[] data) {
data[0] = 1; // caller sees this
}
void Bar(int[] data) {
data = new int[20]; // but not this
}
If you add the ref modifier, the reference is passed by reference - and the caller would see either change above.
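Putting the three cases side by side, as a sketch: Foo and Bar mirror the methods above, while BarRef and the ArrayPassingDemo wrapper are names added here for illustration.

```csharp
static class ArrayPassingDemo
{
    static void Foo(int[] data) { data[0] = 1; }                  // mutates the shared array
    static void Bar(int[] data) { data = new int[20]; }           // rebinds only the local copy
    static void BarRef(ref int[] data) { data = new int[20]; }    // rebinds the caller's variable

    // Returns (a[0] after Foo, a.Length after Bar, a.Length after BarRef).
    public static (int, int, int) Run()
    {
        int[] a = new int[3];

        Foo(a);
        int first = a[0];               // 1: the caller sees the element change

        Bar(a);
        int lengthAfterBar = a.Length;  // 3: the reassignment is invisible to the caller

        BarRef(ref a);
        int lengthAfterRef = a.Length;  // 20: the caller's variable now refers to the new array

        return (first, lengthAfterBar, lengthAfterRef);
    }
}
```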

They are passed by value (as are all parameters that are neither ref nor out), but the value is a reference to the object, so they are effectively passed by reference.

Yes, they are passed by reference by default in C#. All objects in C# are, except for value types. To be a little bit more precise, they're passed "by reference by value"; that is, the value of the variable that you see in your methods is a reference to the original object passed. This is a small semantic point, but one that can sometimes be important.

(1) No one explicitly answered the OP's question, so here goes:
No. Explicitly passing the array or list as a reference will not affect performance.
What the OP feared might be happening is avoided because the function is already operating on a reference (which was passed by value). The top answer nicely explains what this means, giving an Ikea way to answer the original question.
(2) Good advice for everyone:
Read Eric Lippert's advice on when/how to approach optimization. Premature optimization is the root of much evil.
(3) Important, not already mentioned:
Use cases that require passing anything - values or references - by reference are rare.
Doing so gives you extra ways to shoot yourself in the foot, which is why C# makes you use the "ref" keyword on the method call as well. Older (pre-Java) languages only made you indicate pass-by-reference on the method declaration. And this invited no end of problems. Java touts the fact that it doesn't let you do it at all.

What are the deficiencies of the Java/C# type system?

I often hear that Haskell (which I don't know) has a very interesting type system. I'm very familiar with Java and a little with C#, and sometimes I find myself fighting the type system so that some design fits or works better in a certain way.
That led me to wonder...
What are the problems that occur somehow because of deficiencies of Java/C# type system?
How do you deal with them?

Arrays are broken.
Object[] foo = new String[1];
foo[0] = new Integer(4);
Gives you java.lang.ArrayStoreException
You deal with them with caution.
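The snippet above is Java, but C# has the same array covariance for reference types. A minimal C# counterpart (ArrayCovarianceDemo is a made-up name) fails at run time with ArrayTypeMismatchException rather than at compile time:

```csharp
using System;

static class ArrayCovarianceDemo
{
    // Returns true if the bad store was rejected at run time.
    public static bool Run()
    {
        object[] foo = new string[1];   // allowed: arrays of reference types are covariant
        try
        {
            foo[0] = 4;                 // a boxed int is not a string
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }
}
```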
Nullability is another big issue. NullPointerExceptions jump at your face everywhere. You really can't do anything about them except switch language, or use conventions of avoiding them as much as possible (initialize fields properly, etc).
More generally, the Java's/C#'s type systems are not very expressive. The most important thing Haskell can give you is that with its types you can enforce that functions don't have side effects. Having a compile time proof that parts of programs are just expressions that are evaluated makes programs much more reliable, composable, and easier to reason about. (Ignore the fact, that implementations of Haskell give you ways to bypass that).
Compare that to Java, where calling a method can do almost anything!
Also Haskell has pattern matching, which gives you different way of creating programs; you have data on which functions operate, often recursively. In pattern matching you destruct data to see of what kind it is, and behave according to it. e.g. You have a list, which is either empty, or head and tail. If you want to calculate the length, you define a function that says: if list is empty, length = 0, otherwise length = 1 + length(tail).
If you really like to learn more, there's two excellent online sources:
Learn you a Haskell and Real World Haskell

I dislike the fact that there is a differentiation between primitive (native) types (int, boolean, double) and their corresponding class-wrappers (Integer, Boolean, Double) in Java.
This is often quite annoying, especially when writing generic code. Native types can't be used as generic type arguments; you must use a wrapper instead. Generics should make your code more abstract and easier to reuse, but in Java they bring restrictions for no obvious reason.
private static <T> T First(T arg[]) {
return arg[0];
}
public static void main(String[] args) {
int x[] = {1, 2, 3};
Integer y[] = {3, 4, 5};
First(x); // Wrong
First(y); // Fine
}
In .NET there are no such problems even though there are separate value and reference types, because they strictly realized "everything is an object".
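For comparison, a C# version of the same method accepts both an array of a value type and an array of a reference type. This is only a sketch; GenericFirstDemo is a hypothetical wrapper name.

```csharp
static class GenericFirstDemo
{
    // Works for any element type, value type or reference type.
    public static T First<T>(T[] items) => items[0];

    public static (int, string) Run()
    {
        int[] x = { 1, 2, 3 };
        string[] y = { "a", "b" };
        return (First(x), First(y));   // (1, "a")
    }
}
```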

this question about generics shows the deficiencies of the java type system's expressiveness
Higher-kinded generics in Java

I don't like the fact that classes are not first-class objects, and you can't do fancy things such as having a static method be part of an interface.

A fundamental weakness in the Java/.NET type system is that it has no declarative means of specifying how an object's state relates to the contents of its reference-type fields, nor of specifying whether a method is allowed to persist reference-type parameters. Although in some sense it's nice for the runtime to be able to use a field Foo of one type ICollection<integer> to mean many different things, it's not possible for the type system to provide real support for things like immutability, equivalency testing, cloning, or any other such features without knowing whether Foo represents:
1. A read-only reference to a collection which nothing will ever mutate; the class may freely share such reference with outside code, without affecting its semantics. The reference encapsulates only immutable state, and likely does not encapsulate identity.
2. A writable reference to a collection whose type is mutable, but which nothing will ever actually mutate; the class may only share such references with code that can be trusted not to mutate it. As above, the reference encapsulates only immutable state, and likely does not encapsulate identity.
3. The only reference anywhere in the universe to a collection which it mutates. The reference would encapsulate mutable state, but would not encapsulate identity (replacing the collection with another holding the same items would not change the state of the enclosing object).
4. A reference to a collection which it mutates, and whose contents it considers to be its own, but to which outside code holds references which it expects to be attached to `Foo`'s current state. The reference would encapsulate both identity and mutable state.
5. A reference to a mutable collection owned by some other object, which it expects to be attached to that other object's state (e.g. if the object holding `Foo` is supposed to display the contents of some other collection). That reference would encapsulate identity, but would not encapsulate mutable state.
Suppose one wants to copy the state of the object that contains Foo to a new, detached, object. If Foo represents #1 or #2, one may store in the new object either a copy of the reference in Foo, or a reference to a new object holding the same data; copying the reference would be faster, but both operations would be correct. If Foo represents #3, a correct detached copy must hold a reference to a new detached object whose state is copied from the original. If Foo represents #5, a correct detached copy must hold a copy of the original reference--it must NOT hold reference to a new detached object. And if Foo represents #4, the state of the object containing it cannot be copied in isolation; it might be possible to copy a bunch of interconnected objects to yield a new bunch whose state is equivalent to the original, but it would not be possible to copy the state of objects individually.
While it won't be possible for a type system to specify declaratively all of the possible relationships that can exist among objects and what should be done about them, it should be possible for a type system and framework to correctly generate code to produce semantically-correct equivalence tests, cloning methods, smoothly inter-operable mutable, immutable, and "readable" types, etc. in most cases, if it knew which fields encapsulate identity, mutable state, both, or neither. Additionally, it should be possible for a framework to minimize defensive copying and wrapping in circumstances where it could ensure that the passed references would not be given to anything that would mutate them.

(Re: C# specifically.)
I would love tagged unions.
Ditto on first-class objects for classes, methods, properties, etc.
Although I've never used them, Python has type classes that basically are the types that represent classes and how they behave.
Non-nullable reference types so null-checks are not needed. It was originally considered for C# but was discarded. (There is a stack overflow question on this.)
Covariance so I can cast a List<string> to a List<object>.
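On the covariance wish: List<T> itself is invariant, but since C# 4 the read-only interface IEnumerable<out T> is covariant, which covers the read-only case. A small sketch, with VarianceDemo as a made-up name:

```csharp
using System.Collections.Generic;

static class VarianceDemo
{
    // Returns the number of items seen through the covariant view.
    public static int Run()
    {
        List<string> names = new List<string> { "a", "b" };

        // List<object> bad = names;        // does not compile: List<T> is invariant
        IEnumerable<object> view = names;   // allowed: IEnumerable<out T> is covariant

        int count = 0;
        foreach (object item in view)
        {
            count++;
        }
        return count;   // 2
    }
}
```

The list stays a List<string> underneath, so nothing can add a non-string through the view, which is why this direction is safe while the List<object> conversion is not.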

This is minor, but for the current versions of Java and C# declaring objects breaks the DRY principle:
Object foo = new Object();
Int x = new Int();

None of them have meta-programming facilities like say that old darn C++ dog has.
Using "using" duplication and lack of typedef is one example that violates DRY and can even cause user-induced 'aliasing' errors and more. Java 'templates' isn't even worth mentioning..
