Why is my object a pointer? - c#

My question concerns the use of objects in C#. I think I understand what's happening, but I want to understand why. For reasons I won't go into, I want to create a temporary copy of an object with its current data (current state). So I thought I could create a new object, assign it the original object, then change the original object. At that point I would have two objects in different states. But what happens is that the copied object ends up looking exactly like the first. Here is some code to illustrate:
Order o1 = new Order();
o1.property1 = "test 1";
Order o2 = new Order();
o2 = o1;
o1.property1 = "test 2";
But at the end of this code, both o1 and o2 have property1 set to "test 2". I think I realize that all objects are just pointers, so if you change one it changes another, but I can't understand why this is, or why it is useful. Is there some fundamental thing I'm missing here? Also, what would be the best way to accomplish what I want to do? Which is: store the state of the object, make changes, then revert if necessary. Hopefully this makes sense.

An object variable in C# is a reference (not a pointer) to a specific object in memory. When you declare
Order o2 = new Order();
you are creating a new Order object on the heap and assigning a reference to that object to your o2 variable. When you then state
o2 = o1;
you are telling the compiler to make o2 refer to the same object that o1 refers to. At this point, the reference to the original o2 object is lost, and that object becomes eligible for garbage collection; its memory will be reclaimed during some later collection.
From then on, o1 and o2 both reference the same object in memory. To copy information from one object to another, you will need to implement a procedure that instantiates a new destination object and copies all of the data from one object to the other. See the MSDN docs on ICloneable for more info.
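A minimal, runnable version of the question's code may make this concrete. The Order class is an assumption (the question never shows it), reduced to the single property1 property; AliasDemo is a hypothetical helper name.

```csharp
// Hypothetical Order class, assumed from the question: one string property.
public class Order
{
    public string property1 { get; set; }
}

public static class AliasDemo
{
    // Replays the question's code and reports what each variable sees.
    public static (string o1Value, string o2Value, bool sameObject) Run()
    {
        Order o1 = new Order();
        o1.property1 = "test 1";

        Order o2 = new Order(); // this Order becomes unreachable on the next line
        o2 = o1;                // copies the reference, not the object

        o1.property1 = "test 2";

        // Both variables now refer to one object, so both see "test 2".
        return (o1.property1, o2.property1, object.ReferenceEquals(o1, o2));
    }
}
```

ReferenceEquals returning true is the direct evidence that the assignment copied a reference, not an Order.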

What you are referring to is the difference between value types and reference types. Your Order type is evidently a reference type; I assume it is a class.
Classes are reference types, meaning their variables are effectively "pointers". One reason for this is performance: you do not want to copy huge amounts of data every time you assign a variable.
Structures are value types and are copied when you assign them.
You have two solutions:
Use a struct instead of a class.
Clone your object, using MemberwiseClone if it is very simple, or your own method if you need to perform a deep clone.
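A sketch of the difference between the two cloning approaches, assuming a hypothetical Order class with one string property and one List<string> property. MemberwiseClone is protected, so the copy methods have to live inside the class.

```csharp
using System.Collections.Generic;

// Hypothetical Order with a string field and a reference-type (list) field.
public class Order
{
    public string Customer { get; set; }
    public List<string> Items { get; set; } = new List<string>();

    // Shallow copy: MemberwiseClone copies each field's value, so the clone's
    // Items field refers to the SAME List<string> as the original.
    public Order ShallowCopy() => (Order)MemberwiseClone();

    // Deep copy: start from a shallow copy, then replace the reference-type
    // field with a new list holding the same items.
    public Order DeepCopy()
    {
        var copy = (Order)MemberwiseClone();
        copy.Items = new List<string>(Items);
        return copy;
    }
}
```

Reassigning a field on the original (for example original.Customer = "B") never affects either copy; the difference between shallow and deep only shows up when a shared object, such as the list, is mutated.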

This is by design. If you want to clone an object and keep the clone independent, I would recommend implementing a cloning mechanism on your types. This can be ICloneable, or even just a constructor that takes an instance and copies values from it.
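A copy constructor is the simpler of the two options mentioned. This sketch assumes a hypothetical Order with two properties; every field has to be copied by hand, and any mutable reference-type field would need its own copy to keep the clone fully independent.

```csharp
// Hypothetical Order class with a copy constructor (names are illustrative).
public class Order
{
    public string Customer { get; set; }
    public decimal Total { get; set; }

    public Order() { }

    // Copy constructor: builds a new, independent Order from an existing one.
    public Order(Order other)
    {
        Customer = other.Customer;
        Total = other.Total;
    }
}
```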

Regarding your question
what would be the best way to accomplish what I want to do? Which is:
store the state of the object, make changes, then revert if necessary
A simple method is to serialize the object, e.g. using XmlSerializer. Then, if you want to throw away your changes, just deserialize the original and replace the modified object with the original version.
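A sketch of the snapshot-and-revert idea with System.Xml.Serialization.XmlSerializer. The Order shape and the Snapshot helper names are assumptions for illustration; XmlSerializer only round-trips public read/write properties and fields, so private state is not saved.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical Order; XmlSerializer needs a public type with a public
// parameterless constructor and public read/write properties.
public class Order
{
    public string Customer { get; set; }
    public decimal Total { get; set; }
}

public static class Snapshot
{
    // Serializes the current state of obj to an XML string.
    public static string Save<T>(T obj)
    {
        var serializer = new XmlSerializer(typeof(T));
        using var writer = new StringWriter();
        serializer.Serialize(writer, obj);
        return writer.ToString();
    }

    // Rebuilds a NEW object from a string produced by Save.
    public static T Restore<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using var reader = new StringReader(xml);
        return (T)serializer.Deserialize(reader);
    }
}
```

Restore returns a new object, so any other code still holding a reference to the modified object will not see the reverted state.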

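Use structures to accomplish your task: classes are reference types and structs are value types.
Class instances are stored on the heap.
Structs are stored inline: on the stack when they are local variables, or inside the containing object when they are fields.
For more info, search for "structs vs classes" and learn the differences.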
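For comparison, here is the question's scenario with a struct instead of a class. OrderValue and StructDemo are hypothetical names.

```csharp
// Hypothetical struct version of Order: assignment copies the whole value.
public struct OrderValue
{
    public string Property1;
}

public static class StructDemo
{
    public static (string first, string second) Run()
    {
        OrderValue o1 = new OrderValue { Property1 = "test 1" };
        OrderValue o2 = o1;          // copies every field into o2
        o1.Property1 = "test 2";     // changes only o1's copy
        return (o1.Property1, o2.Property1);
    }
}
```

This only yields fully independent copies when the struct's fields are themselves values or immutable objects (string is immutable); a struct holding a List<T> would still share that list after assignment.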

Object variables are, by definition, 'pointers': they hold a reference to your data, not the actual data itself. You can use a value type instead, though, and the variable will then appear to hold the data itself.
As was mentioned above, understanding Value types vs. Reference types is key.

Java has no concept of any non-primitive data type other than an object reference; since almost anything one can do with an object reference involves acting upon the object it refers to, the . operator in Java means "dereference and access member". Although .NET does have non-primitive value types, most .NET languages maintain the convention (different from C and C++, which use -> to access a member of a pointed-to object and . to access a member of a structure) that the same . operator is used both for "dereference and access member" and for "access value-type member".
Personally, I dislike Java's "everything is an object reference" design, and .net's decision to have value types and reference types use the same . operator to mean very different things doesn't help, but it is what it is.

Related

C# Object name concept explanation

Given a dictionary whose values are a class, for example Dictionary<int, BankAccount>,
what's the difference between creating the object first and then adding it to the dictionary, and creating the object directly in the call that adds it? For example:
dict.Add(1, new BankAccount());
var acc = new BankAccount();
dict.Add(1, acc);
Is there any benefit of using one over another?
The advantage of creating the object first and adding it by reference is that you hold the reference in the current method, and thus have full access to it.
If you create the object in line with the add method, you would have to fetch the object from the dictionary to gain access.
I do not see any other differences.
Creating the object first could have code-maintainability benefits if you later find that the object needs to be modified.
The only real difference I could imagine is that with the first option there is no local variable holding a reference, although in practice the dictionary keeps the object alive either way, so its memory is not released any sooner. Other than that, the first option is more concise. Functionally, both options accomplish the same task.
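A small sketch showing that both options store the same kind of reference. BankAccount's Balance property is assumed for illustration, and different keys are used because adding key 1 twice would throw an ArgumentException.

```csharp
using System.Collections.Generic;

// Hypothetical BankAccount class for illustration.
public class BankAccount
{
    public decimal Balance { get; set; }
}

public static class DictionaryDemo
{
    public static (decimal viaDictionary, bool sameObject) Run()
    {
        var dict = new Dictionary<int, BankAccount>();

        // Option 1: create inline; reach it later only through the dictionary.
        dict.Add(1, new BankAccount());
        dict[1].Balance = 100m;

        // Option 2: keep a local reference; acc and dict[2] are the same object.
        var acc = new BankAccount();
        dict.Add(2, acc);
        acc.Balance = 50m;

        return (dict[2].Balance, object.ReferenceEquals(acc, dict[2]));
    }
}
```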

What exactly is a reference in C#

From what I understand by now, I can say that a reference in C# is a kind of pointer to an object which has reference count and knows about the type compatibility. My question is not about how a value type is different than a reference type, but more about how a reference is implemented.
I have read this post about the differences between references and pointers, but it does not say much about what a reference is; it mostly describes its properties compared with a pointer in C++. I also understand the difference between passing by reference and passing by value (in C#, arguments are passed by value by default, even references), but it is hard for me to understand what a reference really is when I try to explain to my colleagues why a parameter passed by reference cannot be captured inside a closure, as in Eric Lippert's blog entry about the stack being an implementation detail.
Can somebody provide me with a complete, but hopefully simple, explanation of what references really are in C#, and a bit about how they are implemented?
Edit: this is not a duplicate. The question "Reference type in C#" explains how a reference works and how it differs from a value, but what I am asking is how a reference is defined at a low level.
From what I understand by now, I can say that a reference in C# is a kind of pointer to an object
If by "kind of" you mean "is conceptually similar to", yes. If you mean "could be implemented by", yes. If you mean "has the is-a-kind-of relationship to", as in "a string is a kind of object" then no. The C# type system does not have a subtyping relationship between reference types and pointer types.
which has reference count
Implementations of the CLR are permitted to use reference counting semantics but are not required to do so, and most do not.
and knows about the type compatibility.
I'm not sure what this means. Objects know their own actual type. References have a static type which is compatible with the actual type in verifiable code. Compatibility checking is implemented by the runtime's verifier when the IL is analyzed.
My question is not about how a value type is different than a reference type, but more about how a reference is implemented.
How references are implemented is, not surprisingly, an implementation detail.
Can somebody provide me with a complete, but hopefully simple explanation about what references really are in C#
References are things that act as references are specified to act by the C# language specification. That is:
objects (of reference type) have identity independent from the values of their fields
any object may have a reference to it
such a reference is a value which may be passed around like any other value
equality comparison is implemented for those values
two references are equal if and only if they refer to the same object; that is, references reify object identity
there is a unique null reference which refers to no object and is unequal to any valid reference to an object
A static type is always known for any reference value, including the null reference
If the reference is non-null then the static type of the reference is always compatible with the actual type of the referent. So for example, if we have a reference to a string, the static type of the reference could be string or object or IEnumerable, but it cannot be Giraffe. (Obviously if the reference is null then there is no referent to have a type.)
There are probably a few rules that I've missed, but that gets across the idea. References are anything that behaves like a reference. That's what you should be concentrating on. References are a useful abstraction because they are the abstraction which enables object identity independent of object value.
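Several of these rules can be observed directly. This sketch uses a hypothetical Point class that does not overload ==, so == on two Point references compares identity, not field values.

```csharp
// Hypothetical Point class: two instances can have equal field values.
public class Point
{
    public int X;
    public int Y;
}

public static class IdentityDemo
{
    public static bool[] Run()
    {
        var a = new Point { X = 1, Y = 2 };
        var b = new Point { X = 1, Y = 2 };   // same values, different object
        var c = a;                            // copy of the reference to a's object
        object asObject = a;                  // static type object, referent is still a Point
        Point none = null;

        return new[]
        {
            a == b,                               // false: distinct objects
            a == c,                               // true: same object
            object.ReferenceEquals(asObject, a),  // true: static type differs, referent does not
            none == a,                            // false: null is unequal to any object reference
            asObject is Point                     // true: the object knows its actual type
        };
    }
}
```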
and a bit about how they are implemented?
In practice, objects of reference type in C# are implemented as blocks of memory which begin with a small header that contains information about the object, and references are implemented as pointers to that block. This simple scheme is then made more complicated by the fact that we have a multigenerational mark-and-sweep compacting collector; it must somehow know the graph of references so that it can move objects around in memory when compacting the heap, without losing track of referential identity.
As an exercise you might consider how you would implement such a scheme. It builds character to try to figure out how you would build a system where references are pointers and objects can move in memory. How would you do it?
it is hard for me to understand what really is a reference when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure
This is tricky. It is important to understand that conceptually, a reference to a variable -- a ref parameter in C# -- and a reference to an object of reference type are conceptually similar but actually different things.
In C# you can think of a reference to a variable as an alias. That is, when you say
void M()
{
int x = 123;
N(ref x);
}
void N(ref int y)
{
y = 456;
}
Essentially what we are saying is that x and y are different names for the same variable. The ref is an unfortunate choice of syntax because it emphasizes the implementation detail -- that behind the scenes, y is a special "reference to variable" type -- and not the semantics of the operation, which is that logically y is now just another name for x; we have two names for the same variable.
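The M and N methods from this answer, wrapped in a hypothetical RefDemo class so they compile, with M returning x so the effect is visible. After the call, x holds 456 because y was just another name for it.

```csharp
public static class RefDemo
{
    // y is another name for the caller's variable, so assigning y assigns x.
    static void N(ref int y)
    {
        y = 456;
    }

    public static int M()
    {
        int x = 123;
        N(ref x);
        return x; // 456
    }
}
```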
References to variables and references to objects are not the same thing in C#; you can see this in the fact that they have different semantics. You can compare two references to objects for equality. But there is no way in C# to say:
static bool EqualAliases(ref int y, ref int z)
{
return true iff y and z are both aliases for the same variable
}
the way you can with references to objects:
static bool EqualReferences(object x, object y)
{
return x == y;
}
Behind the scenes both references to variables and references to objects are implemented by pointers. The difference is that a reference to a variable might refer to a variable on the short-term storage pool (aka "the stack"), whereas a reference to an object is a pointer to the heap-allocated object header. That's why the CLR restricts you from storing a reference to a variable into long-term storage; it does not know if you are keeping a long-term reference to something that will be dead soon.
Your best bet to understand how both kinds of references are implemented as pointers is to take a step down from the C# type system into the CLI type system which underlies it. Chapter 8 of the CLI specification should prove interesting reading; it describes different kinds of managed pointers and what each is used for.
References in C# are very similar to C++ references. Yes, indeed, underneath there is garbage collection magic going on, but I would say how that works is a different and larger topic.
C# references are similar to C++ references or immutable pointers: no pointer arithmetic, etc., but you can reassign them (thanks, Ben!).
I'd say that in practice, one difference is that since pointers aren't generally available in C# (the unsafe keyword and its associated pointers are again a different and larger topic), you'll find yourself using the "out" keyword to do what a pointer-to-pointer used to do.
Also, you are correct in asserting that references carry type information: every reference type in C# derives from the Object class, which has a GetType() method.
Be advised, however, that structs, which are generally treated as values rather than references, also have GetType().
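A short sketch of GetType() on a reference type and on a struct. Meters is a hypothetical value type; the point is that GetType() reports the runtime type of the object, not the static type of the variable.

```csharp
// Hypothetical value type for illustration.
public struct Meters
{
    public double Value;
}

public static class GetTypeDemo
{
    public static (System.Type fromString, System.Type fromStruct) Run()
    {
        object o = "hello";                    // static type object, runtime type string
        Meters m = new Meters { Value = 1.5 };
        // GetType() reports the runtime type. Calling it on a struct boxes
        // the value first, but it still reports the struct's own type.
        return (o.GetType(), m.GetType());
    }
}
```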

How to define an instance?

I was asked a question in an interview and I wasn't able to answer it. Here is the question:
How would you define an instance (C#)?
My answer was that it is another name for an object. What is the right answer to this question?
Instance is to class as cake is to recipe. Any time you use a constructor to create an object, you are creating an instance.
MyObject obj = new MyObject( );
I would describe instance as a single copy of an object. There might be one, there might be thousands, but an instance is a specific copy, to which you can have a reference.
Class is the blueprint, instance is the completed construction.
An "instance" is an object allocated in memory, usually created with the new operator, laid out according to a template, which is most often a built-in language feature (like a native data structure: a Dictionary, List, etc.), a built-in .NET class (like a Windows Forms Form), a user-defined class, or a struct in .NET; or even an enum.
While an "instance" of a "class", for example, will embody all the properties, fields, and methods of the class, its fields and properties may or may not have been assigned values when the instance is created. The class template also constrains the accessibility of the properties, fields, and methods of any instance of the class.
The instance is "the real something" created from some "abstract plan for the something."
I would rather use a real-life example.
Say that "car" is a class: if I tell you I have a car, you have no idea what kind of car it is. But if I tell you that I have a silver 2009 Ford Fiesta 1.6 EXI, then you know exactly which car I mean. That is what an instance is.
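The car example translates directly into code. Car and its properties are assumptions for illustration: the class is the general description, and each new Car { ... } is a specific instance with its own state.

```csharp
// Hypothetical Car class: the "blueprint" describing what every car has.
public class Car
{
    public string Make { get; set; }
    public string Model { get; set; }
    public int Year { get; set; }
    public string Color { get; set; }
}

public static class CarDemo
{
    public static (Car mine, Car yours) Run()
    {
        // Each "new" creates a separate instance with its own state.
        var mine = new Car { Make = "Ford", Model = "Fiesta 1.6 EXI", Year = 2009, Color = "silver" };
        var yours = new Car { Make = "Ford", Model = "Fiesta 1.6 EXI", Year = 2009, Color = "silver" };
        yours.Color = "red"; // changing one instance does not affect the other
        return (mine, yours);
    }
}
```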
Instances and objects are the same if we consider only classes, but different if we consider C# as a whole: "instance" is more general than "object".
Anything that occupies space in memory and is built by following some blueprint is an instance of that blueprint.
An object denotes a reference to a memory location allocated according to the memory requirements of a class.
Example:
They are same
An object is an instance of a class.
var John = new Person();
We get the object John by assigning it new Person(). Here, new Person() first reserves the total memory required to store its value-type properties and its references, and then assigns default values to its properties.
So this 'reserved memory with default values', named 'John', is an INSTANCE of a class, and in OOP it is called an OBJECT.
They are different
A variable is an instance of its type.
int x = 5;
Here everything is the same: x is the name of a memory location exactly 4 bytes in size, which stores an integer. The difference is that here x is an INSTANCE of int, but not an object.
Instance is synonymous with object: when we create an object of a class, we say that we are creating an instance of the class.
In simple words, an instance is a reference to an object (a copy of the object at a particular time),
and an object refers to the memory address of the class.
Yup. My interpretation would be to mention that only classes can 'define' instances, or something along those lines. I might mention an example in code, or seek clarification of the question.
A class is akin to a blueprint, while an instance is a concrete implementation of the class/blueprint. An instance is also characterized by its identity, state and behavior.

What are the deficiencies of the Java/C# type system?

I often hear that Haskell (which I don't know) has a very interesting type system. I'm very familiar with Java and a little with C#, and sometimes I find myself fighting the type system to make a design fit or work better in a certain way.
That led me to wonder...
What are the problems that occur somehow because of deficiencies of Java/C# type system?
How do you deal with them?
Arrays are broken.
Object[] foo = new String[1];
foo[0] = new Integer(4);
This compiles, but at run time it throws java.lang.ArrayStoreException.
You deal with them with caution.
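C# inherited the same array covariance, so the equivalent C# code also compiles and fails at run time, with System.ArrayTypeMismatchException instead of ArrayStoreException. A minimal sketch (the class name is hypothetical):

```csharp
using System;

public static class ArrayCovarianceDemo
{
    // A string[] can be used as an object[], so storing a non-string
    // compiles but is rejected by the runtime's array store check.
    public static bool StoreIntIntoStringArray()
    {
        object[] foo = new string[1];
        try
        {
            foo[0] = 4;   // boxed int into what is really a string[]
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }
}
```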
Nullability is another big issue: NullPointerExceptions jump out at you everywhere. You really can't do anything about them except switch languages, or follow conventions that avoid them as much as possible (initialize fields properly, etc.).
More generally, Java's and C#'s type systems are not very expressive. The most important thing Haskell can give you is that with its types you can enforce that functions don't have side effects. Having a compile-time proof that parts of programs are just expressions that are evaluated makes programs much more reliable, composable, and easier to reason about. (Ignore the fact that implementations of Haskell give you ways to bypass that.)
Compare that to Java, where calling a method can do almost anything!
Haskell also has pattern matching, which gives you a different way of writing programs: you have data on which functions operate, often recursively. With pattern matching you destructure data to see what kind it is, and behave accordingly. For example, a list is either empty, or a head and a tail. To calculate its length, you define a function that says: if the list is empty, length = 0; otherwise, length = 1 + length(tail).
If you'd really like to learn more, there are two excellent online sources:
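For C# readers, a rough analog of that length definition can be written with records and switch-expression patterns (C# 9 or later). This is a sketch, not Haskell's actual list type; IntList, Empty and Cons are hypothetical names, and unlike Haskell the compiler cannot prove the two cases are exhaustive, hence the discard arm.

```csharp
// A hand-rolled immutable list: either Empty, or a head followed by a tail.
public abstract record IntList { }
public sealed record Empty : IntList { }
public sealed record Cons(int Head, IntList Tail) : IntList;

public static class ListFunctions
{
    // length [] = 0; length (x:xs) = 1 + length xs, written with C# patterns.
    public static int Length(IntList list) => list switch
    {
        Empty => 0,
        Cons(_, var tail) => 1 + Length(tail),
        _ => throw new System.ArgumentException("Unknown list case")
    };
}
```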
Learn you a Haskell and Real World Haskell
I dislike the fact that there is a differentiation between primitive (native) types (int, boolean, double) and their corresponding class-wrappers (Integer, Boolean, Double) in Java.
This is often quite annoying, especially when writing generic code. Native types can't be used as generic type arguments; you must use a wrapper instead. Generics should make your code more abstract and more reusable, but in Java they impose restrictions for no obvious reason.
private static <T> T First(T arg[]) {
return arg[0];
}
public static void main(String[] args) {
int x[] = {1, 2, 3};
Integer y[] = {3, 4, 5};
First(x); // Compile error: int[] cannot be passed as T[]
First(y); // Fine: T is inferred as Integer
}
In .NET there are no such problems, even though there are separate value and reference types, because it consistently realizes "everything is an object" and generics can be instantiated with value types.
This question about generics shows the deficiencies of the Java type system's expressiveness:
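For comparison, the same First method in C#. Because C# generics can be instantiated with value types, passing an int[] compiles without a wrapper type; GenericsDemo is a hypothetical class name.

```csharp
public static class GenericsDemo
{
    // Same shape as the Java First<T>; in C#, T can be a value type such as int.
    public static T First<T>(T[] arg) => arg[0];

    public static (int, string) Run()
    {
        int[] x = { 1, 2, 3 };
        string[] y = { "a", "b" };
        return (First(x), First(y)); // T inferred as int, then as string
    }
}
```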
Higher-kinded generics in Java
I don't like the fact that classes are not first-class objects, and you can't do fancy things such as having a static method be part of an interface.
A fundamental weakness in the Java/.NET type system is that it has no declarative means of specifying how an object's state relates to the contents of its reference-type fields, nor of specifying whether a method is allowed to persist reference-type parameters. Although in some sense it's nice for the runtime to be able to use a field Foo of a single type such as ICollection<int> to mean many different things, it's not possible for the type system to provide real support for things like immutability, equivalence testing, cloning, or any other such features without knowing whether Foo represents:
1. A read-only reference to a collection which nothing will ever mutate; the class may freely share such reference with outside code, without affecting its semantics. The reference encapsulates only immutable state, and likely does not encapsulate identity.
2. A writable reference to a collection whose type is mutable, but which nothing will ever actually mutate; the class may only share such references with code that can be trusted not to mutate it. As above, the reference encapsulates only immutable state, and likely does not encapsulate identity.
3. The only reference anywhere in the universe to a collection which it mutates. The reference would encapsulate mutable state, but would not encapsulate identity (replacing the collection with another holding the same items would not change the state of the enclosing object).
4. A reference to a collection which it mutates, and whose contents it considers to be its own, but to which outside code holds references which it expects to be attached to `Foo`'s current state. The reference would encapsulate both identity and mutable state.
5. A reference to a mutable collection owned by some other object, which it expects to be attached to that other object's state (e.g. if the object holding `Foo` is supposed to display the contents of some other collection). That reference would encapsulate identity, but would not encapsulate mutable state.
Suppose one wants to copy the state of the object that contains Foo to a new, detached, object. If Foo represents #1 or #2, one may store in the new object either a copy of the reference in Foo, or a reference to a new object holding the same data; copying the reference would be faster, but both operations would be correct. If Foo represents #3, a correct detached copy must hold a reference to a new detached object whose state is copied from the original. If Foo represents #5, a correct detached copy must hold a copy of the original reference; it must NOT hold a reference to a new detached object. And if Foo represents #4, the state of the object containing it cannot be copied in isolation; it might be possible to copy a bunch of interconnected objects to yield a new bunch whose state is equivalent to the original, but it would not be possible to copy the state of objects individually.
While it won't be possible for a type system to specify declaratively all of the possible relationships that can exist among objects and what should be done about them, it should be possible for a type system and framework to correctly generate code to produce semantically-correct equivalence tests, cloning methods, smoothly inter-operable mutable, immutable, and "readable" types, etc. in most cases, if it knew which fields encapsulate identity, mutable state, both, or neither. Additionally, it should be possible for a framework to minimize defensive copying and wrapping in circumstances where it could ensure that the passed references would not be given to anything that would mutate them.
(Re: C# specifically.)
I would love tagged unions.
Ditto on first-class objects for classes, methods, properties, etc.
Although I've never used them, Python has type classes that basically are the types that represent classes and how they behave.
Non-nullable reference types so null-checks are not needed. It was originally considered for C# but was discarded. (There is a stack overflow question on this.)
Covariance so I can cast a List<string> to a List<object>.
This is minor, but in the current versions of Java and C#, declaring objects breaks the DRY principle:
Object foo = new Object();
int x = new int();
None of them have meta-programming facilities like say that old darn C++ dog has.
Duplicated "using" aliases and the lack of typedef are one example that violates DRY and can even cause user-induced 'aliasing' errors, and more. Java 'templates' aren't even worth mentioning.

Object Copy simple question?

If copying an object just creates a new reference to the same object in memory, then I don't understand why it is useful, because it only creates another name for the same object.
To me, copying means creating a clone of the object in another memory location.
Then I could manipulate two separate objects that are identical only at the moment of the copy, but whose lives are different afterwards.
I use C#.
Can someone explain this to me?
Thanks
John
Copying usually means actually creating a new object. However, the new object may be a shallow copy, in which case its fields still refer to the same objects as the original's rather than to new copies of them.
It's possible that the class you are looking at is immutable, and the class designer decided that there was no need for the memory overhead.
Copying by reference is useful behaviour when you want to "pass around" an object to many components, either to allow many components to modify the state of the single object or to allow the functionality of the object to be used by multiple components.
Additionally, passing by reference avoids copying values, which can often produce a smaller memory footprint for an application.
If you wish, you can implement a Clone method on an object which will perform the behaviour you're asking for, allowing you to have a separate object to work with.
Lastly, if the behaviour of passing by reference doesn't seem natural for your object (for example your object is a fundamental value such as coordinate data), you can create a struct instead of a class. A struct or "structure" is copied by value, so when you pass it to a method, the entire object is copied and the copy passed to the method.
There are three kinds of copy:
Reference copy: gives the object another name.
Shallow copy: creates a new copy of the object's fields, but any objects those fields refer to are shared rather than copied.
Deep copy: creates a new copy of the object and of all the data it refers to.
You can read more about object copying at this link:
http://en.wikipedia.org/wiki/Object_copy
You are right in your understanding that there are two (actually three, if you count deep and shallow copies separately) ways to reproduce a reference object:
You can copy the variable's reference into another variable (the same object on the heap, now with another reference to it), or
You can create a new object on the heap and copy the values of the original objects properties and fields into the new object. This is generally called a Clone, and can be done in two ways Shallow or Deep.
Shallow Copy. Here you only copy primitives, and, where the object has properties which reference other reference types, only copy the reference, (i.e., the address), this is called a shallow copy, or,
Deep Copy. Here you copy primitives, and you also create new objects for each property which references another reference type.
You are right that copying creates a new object. I think the misconception comes from thinking of objects like primitives. Copying a primitive value and copying an object is done in different ways.
int x = 5;
int y = x;
y is a copy of x.
Object a = new object();
Object b = a;
b is a reference to a rather than a copy of a. To copy a you do need to write specific code to clone the object yourself.
I believe other people would complain if Microsoft had chosen to implement it your way. Which way is better depends on the context you are using it in, so it's wise to make the more efficient way the default.
Also, reference type is kind of like a pointer, so it makes sense to just copy the "pointer" itself in this case.
If you find this behavior is not what you desired, you can use your own implementation as well.
