I am reading up on C# arrays, so my question is initially about arrays.
What does declaring an array actually mean? I know you declare a variable of type array. When I have the following, what is actually happening?
int[] values;
Is it in memory by the time it is declared? If not then where is it? Is the array actually created here?
Then I go and instantiate the array and initialise it with some values like:
int[] values = new int[] { 1, 2, 3 };
Does this actually go and create the array now? I have read that arrays are created when they are declared, others say that arrays are created when they are instantiated. I am trying to get my terminology right.
The same goes for an integer variable. If I have:
int value;
and
int value = 1;
When is int created? When is it added to memory?
Sorry for the dumb questions. I understand the concept but would like to know the technicality behind the scenes of arrays.
What does declaring an array actually mean?
You didn't actually declare an array; you declared an array reference. That is a big deal in .NET, where the difference between reference types and value types is important. Having the array reference variable isn't enough: an extra step is required to create the array object, and that step requires the new keyword. It physically allocates the storage for the array object in the place where reference type objects are stored, the garbage-collected heap.
The same goes for an integer variable
No, big difference. That's a value type. If it isn't a field of a class, not that clear from your question, then it is a local variable of a method. It gets created when the method starts running and poofs out of existence when the method returns. Very highly optimized, the core reason that value types exist in C#. The physical storage location is typically a cpu register or a slot on the stack frame if the method uses too many local variables.
If it is actually a member of a class then it gets created when the class object gets created. Just like an array, on the GC heap with the new keyword.
When you declare it like this:
int[] values;
you don't specify the size, so there is no way to know how much memory would be needed for an instantiation. This information is only given in the following line:
values = new int[] { 1, 2, 3 };
The memory requirements are deduced from the number of instantiation values (and from the memory requirements of the type int, of course).
When you declare an int like this:
int value;
the memory requirements are known and cannot change (since int is a value type). This variable can (and will) be created immediately. If it is a field and you don't specify an initial value, it will have its default value, which for int is 0; if it is a local variable, the compiler requires you to assign a value before you read it.
int[] values;
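This means that you declare a variable of type int[]. No array is created yet; only storage for a reference exists. If the variable is a field, it is initialized to a null reference; if it is a local variable, the compiler will not let you read it until you assign it.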
int[] values = new int[] { 1, 2, 3 };
This code declares a variable of type int[], and immediately creates an array. The variable references the newly created array.
Integers work a little differently since they are value types. Value types are initialized to their default values; in the case of integers, the value 0.
If you split the declaration and the initialization, the following happens.
// This declares a variable
int[] values;
// This creates the array, and initializes the variable with the newly created array.
values = new int[] { 1, 2, 3 };
When you declare an array, internally all that is being created is a null pointer that is of type int[]. When you use the new keyword as in your example, or you use new int[6], at that time the system allocates memory for the size of the array.
Declaring an int will actually create the memory for the integer with default value of 0.
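To make the difference between declaring and creating concrete, here is a minimal sketch. The class and member names (DeclarationDemo, fieldValues, fieldValue) are made up for illustration. It assumes the variables are static fields where defaults are shown, because C# refuses to read an unassigned local variable.

```csharp
using System;

public static class DeclarationDemo
{
    // Fields are zero-initialized by the runtime:
    // null for an array reference, 0 for an int.
    private static int[] fieldValues;
    private static int fieldValue;

    public static bool FieldArrayIsNull() => fieldValues == null;

    public static int FieldIntDefault() => fieldValue;

    public static int[] CreateArray()
    {
        int[] values;                     // declares a reference only; no array exists yet
        // Console.WriteLine(values);     // would not compile: use of unassigned local variable
        values = new int[] { 1, 2, 3 };   // 'new' allocates the array object on the GC heap
        return values;
    }

    public static void Main()
    {
        Console.WriteLine(FieldArrayIsNull());     // True
        Console.WriteLine(FieldIntDefault());      // 0
        Console.WriteLine(CreateArray().Length);   // 3

        int[] six = new int[6];                    // size given, elements default to 0
        Console.WriteLine(six[0]);                 // 0
    }
}
```

The compiler may warn that the two fields are never assigned; that is intentional here, since the point is to observe their default values.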
Example:
// Potentially large struct.
struct Foo
{
public int A;
public int B;
// etc.
}
Foo[] arr = new Foo[100];
If Foo is a 100 byte structure, how many bytes will be copied in memory during execution of the following statement:
int x = arr[0].A;
That is, is arr[0] evaluated to some temporary variable (a 100 byte copy of an instance of Foo), followed by the copying of .A into variable x (a 4 byte copy)?
Or is some combination of the compiler, JITer and CLR able to optimise this statement such that the 4 bytes of A are copied directly into x?
If an optimisation is performed, does it still hold when the items are held in a List<Foo> or when an array is passed as an IList<Foo> or an ArraySegment<Foo>?
Value types are copied by value -- hence the name. So then we must consider at what times a copy must be made of a value. This comes down to analyzing correctly when a particular entity refers to a variable, or a value. If it refers to a value then that value was copied from somewhere. If it refers to a variable then it's just a variable, and can be treated like any other variable.
Suppose we have
struct Foo { public int A; public int B; }
Ignore for the moment the design flaws here; public fields are a bad code smell, as are mutable structs.
If you say
Foo f = new Foo();
what happens? The spec says:
A new eight byte variable f is created.
A temporary eight byte storage location temp is created.
temp is filled in with eight bytes of zeros.
temp is copied to f.
But that is not what actually happens; the compiler and runtime are smart enough to notice that there is no observable difference between the required workflow and the workflow "create f and fill it with zeros", so that happens. This is a copy elision optimization.
EXERCISE: devise a program in which the compiler cannot copy-elide, and the output makes it clear that the compiler does not perform a copy elision when initializing a variable of struct type.
Now if you say
f.A = 123;
then f is evaluated to produce a variable -- not a value -- and then from that A is evaluated to produce a variable, and four bytes are written to that variable.
If you say
int x = f.A;
then f is evaluated as a variable, A is evaluated as a variable, and the value of A is written to x.
If you say
Foo[] fs = new Foo[1];
then variable fs is allocated, the array is allocated and initialized with zeros, and the reference to the array is copied to fs. When you say
fs[0].A = 123;
Same as before. fs[0] is evaluated as a variable, so A is a variable, so 123 is copied to that variable.
When you say
int x = fs[0].A;
same as before: we evaluate fs[0] as a variable, fetch from that variable the value of A, and copy it.
But if you say
List<Foo> list = new List<Foo>();
list.Add(new Foo());
list[0].A = 123;
then you will get a compiler error, because list[0] is a value, not a variable. You can't change it.
If you say
int x = list[0].A;
then list[0] is evaluated as a value -- a copy of the value stored in the list is made -- and then a copy of A is made in x. So there is an extra copy here.
EXERCISE: Write a program that illustrates that list[0] is a copy of the value stored in the list.
It is for this reason that you should (1) not make big structs, and (2) make them immutable. Structs get copied by value, which can be expensive, and values are not variables, so it is hard to mutate them.
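One possible way to approach the second exercise, as a sketch rather than the intended solution: the Foo struct is the one from the answer, while CopyDemo and its method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public struct Foo { public int A; public int B; }

public static class CopyDemo
{
    public static int MutateArrayElement()
    {
        Foo[] fs = new Foo[1];
        fs[0].A = 123;        // fs[0] is a variable, so this writes into the array itself
        return fs[0].A;
    }

    public static int MutateListCopy()
    {
        var list = new List<Foo> { new Foo() };
        // list[0].A = 123;   // compile error: list[0] is a value, not a variable
        Foo copy = list[0];   // the indexer's getter returns a copy of the stored struct
        copy.A = 123;         // changes only the local copy
        return list[0].A;     // the struct stored in the list is untouched
    }

    public static void Main()
    {
        Console.WriteLine(MutateArrayElement());  // 123
        Console.WriteLine(MutateListCopy());      // 0
    }
}
```

To actually change the element held by the list, you would have to write the modified copy back with list[0] = copy;.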
What makes array indexer return a variable but list indexer not? Is array treated in a special way?
Yes. Arrays are very special types that are built deeply into the runtime and have been since version 1.
The key feature here is that an array indexer logically produces an alias to the variable contained in the array; that alias can then be used as the variable itself.
All other indexers are actually pairs of get/set methods, where the get returns a value, not a variable.
Can I create my own class to behave the same as array in this regard
Before C# 7, not in C#. You could do it in IL, but of course then C# wouldn't know what to do with the returned alias.
C# 7 adds the ability for methods to return aliases to variables: ref returns. Remember, ref (and out) parameters take variables as their operands and cause the callee to have an alias to that variable. C# 7 adds the ability to do this to locals and returns as well.
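As a rough sketch of what ref returns allow, the wrapper below exposes an indexer that hands back an alias to an array element instead of a copy. FooBuffer is a hypothetical type invented for this example, not something from the framework, and it assumes a C# 7 or later compiler.

```csharp
using System;

public struct Foo { public int A; public int B; }

// Hypothetical wrapper type, for illustration only.
public sealed class FooBuffer
{
    private readonly Foo[] items;

    public FooBuffer(int length) { items = new Foo[length]; }

    // ref return: the caller gets an alias to the array element, not a copy.
    public ref Foo this[int index] => ref items[index];
}

public static class RefReturnDemo
{
    public static int WriteThroughIndexer()
    {
        var buffer = new FooBuffer(1);
        buffer[0].A = 123;               // legal: buffer[0] is a variable here
        ref Foo alias = ref buffer[0];   // ref local: another alias to the same element
        alias.B = 456;
        return buffer[0].A + buffer[0].B;
    }

    public static void Main()
    {
        Console.WriteLine(WriteThroughIndexer());  // 579
    }
}
```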
The entire struct is already in memory. When you access arr[0].A, you aren't copying anything, and no new memory is needed. You're looking up an object reference (that might be on the call stack, but a struct might be wrapped by a reference type on the heap, too) for the location of arr[0], adjusting for the offset for the A field, and then accessing only that integer. There will not be a need to read the full struct just to get A.
Neither List<Foo> nor ArraySegment<Foo> really changes anything important here so far.
However, if you were to pass arr[0] to a function or assign it to a new variable, that would result in copying the Foo object. This is one difference between a struct (value type) and a class (reference type) in .NET; a class would only copy the reference, and List<Foo> and ArraySegment<Foo> are both reference types.
In .NET, especially as a newcomer to the platform, you should strongly prefer class over struct most of the time, and it's not just about copying the full object vs copying the reference. There are some other subtle semantic differences that even I admittedly don't fully understand. Just remember that class > struct until you have a good empirical reason to change your mind.
Is it possible to fully remove Array in C# but not to fill it with 0's:
for(int i=0;i<a.Length;i++)
{
a[i]=0;
}
or Array.Clear(a,0,a.Length);
But to clear it in the way that List.Clear() does, so that its size will be 0 again like before filling.
I tried
a=new int[15]; but previous values were still there. Thanks!
Arrays in C# are fixed-length; you cannot change the size of an array. You can allocate an array of a different size and copy the elements in order to simulate resizing (this is exactly what List<T> does internally), but you cannot "clear an array" in the sense that you reduce it to zero elements.
I tried a=new int[15]; but previous values were still there.
The previous values cannot possibly still be there, because this allocates a new int array of 15 elements, where all elements are zero.
Note that this does not alter the array that a referenced; rather, it creates a new array and stores a reference to it in a. So if you initialized a from another array variable, they would have referred to the same array, but after assigning a new array to a the other variable would continue to point to the old array. Perhaps this is where the "previous values" are coming from.
var a = new int[] { 1, 2, 3 };
var b = a;
// a and b now reference the same array.
a = new int[] { 4, 5, 6 };
// a is now {4,5,6} but b remains {1,2,3}
As others have said, it depends on the type semantics that you're putting into the array.
Value types (such as int, bool, and float) are ... well, values. They represent a quantity, something tangible, a state. Thus, their size is known at compile time and they always have a default value.
By contrast, reference types (basically every class) don't actually hold any values themselves, but "group" data together by means of reference. Reference types will either point to other reference types, or eventually to a value type (which holds actual data).
This distinction is important to your question. List<T> is a dynamically sized collection that can grow or shrink without you creating a new list object because of how it is implemented. It wraps an internal array and allocates a larger one when it runs out of room, so its size does not need to be known ahead of time.
Arrays are fixed-size collections that are created with a specific size. The element type of the array determines how much memory is reserved by the system. For example, a byte[] of 100 elements will consume less memory than an Int64[] of 100 elements. Thus, the system needs to know ahead of time how many bytes to reserve in total, and it fills every element with a default value. Where T is a reference type/class, this is null. For value types, this is usually 0 (or default(T)).
If you wanted to remove all the values of an array, similar to how List.Clear() works, you can do int[] a = new int[0];, but note that you are creating an entirely new array and reallocating the memory for them (hence the keyword new). Other objects will need to reference this new array. By design, you can't simply resize an array. A list is a mutable collection and supports changing the number of elements. You could also try int[] a = null, but this sets it to no object at all, which is again, something different.
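A small sketch of the point about other references, assuming a second variable (called other here, a made-up name) was pointing at the same array before the reassignment. It also shows Array.Resize, which allocates a new array and copies the elements rather than resizing in place.

```csharp
using System;

public static class ResizeDemo
{
    public static int[] ClearByReplacing(out int[] other)
    {
        int[] a = { 1, 2, 3 };
        other = a;         // second reference to the same array
        a = new int[0];    // a now refers to a new, empty array
        return a;          // the original array is untouched
    }

    public static int[] Grow(int[] source, int newSize)
    {
        int[] copy = source;
        Array.Resize(ref copy, newSize);   // allocates a new array and copies the elements
        return copy;
    }

    public static void Main()
    {
        int[] a = ClearByReplacing(out int[] other);
        Console.WriteLine(a.Length);       // 0
        Console.WriteLine(other.Length);   // 3

        int[] original = { 1, 2, 3 };
        int[] grown = Grow(original, 5);
        Console.WriteLine(string.Join(",", grown));   // 1,2,3,0,0
        Console.WriteLine(original.Length);           // 3
    }
}
```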
It depends whether the array's elements are Value type or Reference type.
In your case it is a value type, so each element has to hold some value. You cannot assign null to it,
because value types always have a default value.
Following is the code I wrote
Calc[] calculators = new Calc[10];
calculators[0].AddToSum(10); (the corresponding classes and methods are written).
But I got an "Object reference not set to an instance of an object" exception. Then, with some research, I got the exception removed by doing the following.
for (int i = 0; i < 10; i++)
{
calculators[i] = new Calc();
}
Can somebody explain why we need to allocate memory again, unlike in C/C++?
This is how I did it in C++:
Calculator *calc = new Calculator[10]; // I know I need to check for std::bad_alloc exception
calc[0].AddToSum(10);
delete[] calc;
In C#, there are reference types, and there are value types. Classes are reference types. When you create a variable of a reference type, you are creating a reference, not an object. The default state of a reference is null. If you want it to refer to an object, you have to explicitly initialize it with new, or assign it from another initialized reference.
C++ does not have this distinction. Every type is a value type (though you can also create references to any type). When you create a variable of a value type, you are creating an object.
In new Calc[10] you are allocating and sizing the array. In new Calc() you are creating the actual Calc objects.
But you would get a similar error with these statements (a null reference if calc is a field; a compile error for an unassigned local variable):
Calc calc;
calc.AddToSum(10);
The reference is null until you assign a value.
Calc[] calculators = new Calc[10]; does not allocate any Calc objects, only the array of references.
Based on the answer from Benjamin (+1), it would work if Calc were a value type.
Can you just make Calc a struct?
I don't think you allocate the memory again, but you still need to instantiate some value for calculators[0].
In your first code segment, you are trying to call .AddToSum on a value that is null.
Ps: You could do the following instead, to initialize each Calc from the start:
Calc[] calculators = new Calc[10]{
new Calc(),
new Calc(),
...,
// Repeat 10 times to match array length
};
Update: In response to the comments below; Ok, try this then:
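// Requires using System.Linq. Enumerable.Repeat(new Calc(), 127) would put the same instance in every slot.
Calc[] calculators = Enumerable.Range(0, 127).Select(_ => new Calc()).ToArray();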
When you create an array of objects in C++ you allocate memory for all the fields of each object. So if your objects have two integer fields and you make an array of size two, enough memory is allocated to hold four integers.
On the other hand, in C# when you make an array of objects you are creating an array of references (pointers to objects). So you cannot store an instance unless you allocate memory for each reference (by using new).
The same thing in C++ would be making an array of pointers, and then you'll have to instantiate each element of your array.
Your C++ code is also wrong.
In C++, new Calculator[10] allocates storage for 10 Calculator objects and default-initializes each of them in place.
If Calculator has no user-provided default constructor, its fields of built-in type are left uninitialized, so the operation reads from that uninitialized memory, grabs a value, adds to it, and writes it back out.
In that case you've got an uninitialized object to start from.
If Calculator does declare a default constructor, it runs once for each of the 10 elements. Either way the call works in C++ because the objects already exist inside the array; the variable name is still inconsistent, though (calc is declared, calculators is used).
Anyway, to directly answer the question, this is the way C# works. Allocating an array creates space for the array, but all elements within the array (assuming reference types) are null until the objects themselves are allocated.
Think of it this way: I create an array to hold 10 objects of Class X. But X has a constructor that takes a string, and I want to call it with a different string for each of those objects. How would one do so without explicitly creating each of those 10 objects and passing the right string to each constructor?
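The thought experiment above can be sketched as follows. The Calc class here is a hypothetical stand-in (the question does not show its definition), given a constructor that takes a string so that each element needs its own argument.

```csharp
using System;

// Hypothetical stand-in for the Calc class from the question.
public sealed class Calc
{
    public string Name { get; }
    public int Sum { get; private set; }

    public Calc(string name) { Name = name; }

    public void AddToSum(int amount) { Sum += amount; }
}

public static class ArrayOfReferencesDemo
{
    public static Calc[] CreateCalculators(string[] names)
    {
        var calculators = new Calc[names.Length];   // allocates the slots only; every slot is null
        for (int i = 0; i < calculators.Length; i++)
        {
            calculators[i] = new Calc(names[i]);    // creates the object for this slot
        }
        return calculators;
    }

    public static void Main()
    {
        var empty = new Calc[2];
        Console.WriteLine(empty[0] == null);        // True

        Calc[] calculators = CreateCalculators(new[] { "first", "second" });
        calculators[0].AddToSum(10);
        Console.WriteLine(calculators[0].Name + ": " + calculators[0].Sum);   // first: 10
    }
}
```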
In C#, an array is a reference type, so creating a new one puts a pointer on the stack and the object itself on the heap. If you create the array with simple primitive types such as int, double, etc., the values are stored directly in the array's heap memory, rather than as pointers to other heap addresses where the contents are stored.
So can someone please explain how this happens in Java? Does Java use Integer (a reference type) in arrays all the time, or does it treat value types as C# does?
int[] hello = new int[5];
hello[0] = 2; // C# put this value directly in same slot and doesn't
//create a wrapping object.
I know Java has something called wrapper types, which C# doesn't have. C# has auto-boxing, but Int32, say, is not a reference type but a ValueType, whereas in Java Integer is an object as opposed to int. In C# you can box a value using Object o = 5; or, if the struct has a parent type, you can use a variable of that type to wrap it up on the heap (boxing).
Java is much the same as you describe.
int[] hello = new int[5]; // reference hello is on stack, the object is on the heap.
hello[0] = 2; // Java puts this value directly in same slot and doesn't
// create a wrapping object.
Java primitive arrays are stored in the heap as arrays of primitives, not of Integers etc. I do not believe that the actual implementation of how they are stored is specified, so a boolean[] may very well be implemented by an int[] in memory.
In Java, an array is considered an Object whether it holds primitives or object types, and it has one and only one instance variable, called length.
int[] arr = new int[5];
arr here is an object reference variable for the array. It is stored on the stack if it's used inside a method (i.e. as a local variable), but if it's used as an instance variable, it's stored inside the object on the heap.
The .NET 1.0 way of creating a collection of integers (for example) was:
ArrayList list = new ArrayList();
list.Add(i); /* boxing */
int j = (int)list[0]; /* unboxing */
The penalty of using this is the lack of type safety and performance due to boxing and unboxing.
The .NET 2.0 way is to use generics:
List<int> list = new List<int>();
list.Add(i);
int j = list[0];
The price of boxing (to my understanding) is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
How does the use of generics overcome this? Does the stack-allocated integer stay on the stack and get pointed to from the heap (I guess this is not the case because of what would happen when it goes out of scope)? It seems like there is still a need to copy it somewhere else out of the stack.
What is really going on?
When it comes to collections, generics make it possible to avoid boxing/unboxing by utilizing actual T[] arrays internally. List<T> for example uses a T[] array to store its contents.
The array, of course, is a reference type and is therefore (in the current version of the CLR, yada yada) stored on the heap. But since it's a T[] and not an object[], the array's elements can be stored "directly": that is, they're still on the heap, but they're on the heap in the array instead of being boxed and having the array contain references to the boxes.
So for a List<int>, for example, what you'd have in the array would "look" like this:
[ 1 2 3 ]
Compare this to an ArrayList, which uses an object[] and would therefore "look" something like this:
[ *a *b *c ]
...where *a, etc. are references to objects (boxed integers):
*a -> 1
*b -> 2
*c -> 3
Excuse those crude illustrations; hopefully you know what I mean.
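The two layouts can be observed indirectly from code. The following is a minimal sketch (BoxingDemo and its method names are made up): boxing the same int twice yields two distinct heap objects, and an ArrayList round trip needs a cast where List<int> does not.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    public static bool BoxesAreDistinctObjects()
    {
        int i = 42;
        object first = i;    // boxing: allocates an object on the heap and copies 42 into it
        object second = i;   // a second, separate box
        return !object.ReferenceEquals(first, second);
    }

    public static int RoundTripThroughArrayList(int value)
    {
        var list = new ArrayList();
        list.Add(value);       // value is boxed; the internal object[] stores a reference
        return (int)list[0];   // unboxing requires a cast
    }

    public static int RoundTripThroughGenericList(int value)
    {
        var list = new List<int>();
        list.Add(value);       // value is copied straight into the internal int[]
        return list[0];        // no cast, no box
    }

    public static void Main()
    {
        Console.WriteLine(BoxesAreDistinctObjects());        // True
        Console.WriteLine(RoundTripThroughArrayList(7));     // 7
        Console.WriteLine(RoundTripThroughGenericList(7));   // 7
    }
}
```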
Your confusion is a result of misunderstanding what the relationship is between the stack, the heap, and variables. Here's the correct way to think about it.
A variable is a storage location that has a type.
The lifetime of a variable can either be short or long. By "short" we mean "until the current function returns or throws" and by "long" we mean "possibly longer than that".
If the type of a variable is a reference type then the contents of the variable is a reference to a long-lived storage location. If the type of a variable is a value type then the contents of the variable is a value.
As an implementation detail, a storage location which is guaranteed to be short-lived can be allocated on the stack. A storage location which might be long-lived is allocated on the heap. Notice that this says nothing about "value types are always allocated on the stack." Value types are not always allocated on the stack:
int[] x = new int[10];
x[1] = 123;
x[1] is a storage location. It is long-lived; it might live longer than this method. Therefore it must be on the heap. The fact that it contains an int is irrelevant.
You correctly say why a boxed int is expensive:
The price of boxing is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
Where you go wrong is to say "the stack allocated integer". It doesn't matter where the integer was allocated. What matters was that its storage contained the integer, instead of containing a reference to a heap location. The price is the need to create the object and do the copy; that's the only cost that is relevant.
So why isn't a generic variable costly? If you have a variable of type T, and T is constructed to be int, then you have a variable of type int, period. A variable of type int is a storage location, and it contains an int. Whether that storage location is on the stack or the heap is completely irrelevant. What is relevant is that the storage location contains an int, instead of containing a reference to something on the heap. Since the storage location contains an int, you do not have to take on the costs of boxing and unboxing: allocating new storage on the heap and copying the int to the new storage.
Is that now clear?
Generics allows the list's internal array to be typed int[] instead of effectively object[], which would require boxing.
Here's what happens without generics:
You call Add(1).
The integer 1 is boxed into an object, which requires a new object to be constructed on the heap.
This object is passed to ArrayList.Add().
The boxed object is stuffed into an object[].
There are three levels of indirection here: ArrayList -> object[] -> object -> int.
With generics:
You call Add(1).
The int 1 is passed to List<int>.Add().
The int is stuffed into an int[].
So there are only two levels of indirection: List<int> -> int[] -> int.
A few other differences:
The non-generic method will require a sum of 8 or 12 bytes (one pointer, one int) to store the value, 4/8 in one allocation and 4 in the other. And this will probably be more due to alignment and padding. The generic method will require only 4 bytes of space in the array.
The non-generic method requires allocating a boxed int; the generic method does not. This is faster and reduces GC churn.
The non-generic method requires casts to extract values. This is not typesafe and it's a bit slower.
An ArrayList only handles the type object so to use this class requires casting to and from object. In the case of value types, this casting involves boxing and unboxing.
When you use a generic list, the JIT compiler generates specialized code for that value type so that the actual values are stored in the list rather than references to objects that contain the values. Therefore no boxing is required.
The price of boxing (to my understanding) is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
I think you are assuming that value types are always instantiated on the stack. This is not the case - they can be created either on the heap, on the stack or in registers. For more information about this please see Eric Lippert's article: The Truth About Value Types.
In .NET 1, when the Add method is called:
Space is allocated on the heap; a new reference is made
The contents of the i variable is copied into the reference
A copy of the reference is put at the end of the list
In .NET 2:
A copy of the variable i is passed to the Add method
A copy of that copy is put at the end of the list
Yes, the i variable is still copied (after all, it's a value type, and value types are always copied - even if they're just method parameters). But there's no redundant copy made on the heap.
Why are you thinking in terms of WHERE the values/objects are stored? In C#, value types can be stored on the stack as well as the heap, depending upon what the CLR chooses.
Where generics make a difference is WHAT is stored in the collection. In the case of ArrayList, the collection contains references to boxed objects, whereas the List<int> contains the int values themselves.