How are strings stored in an object array?

object[] objs = new object[]{"one","two","three"};
Are the strings stored in the array as references to the string objects
[#] - one
[#] - two
[#] - three
or are the string objects stored in the array elements?
[one][two][three]
Thanks.
Edit: Sorry, my fancy diagram failed miserably.

String objects can never be stored directly in an array, or as any other variable. It's always references, even in a simple case such as:
string x = "foo";
Here the value of x is a reference, not an object. No expression value is ever an object - it's always either a reference, a value type value, or a pointer.
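A minimal sketch of what this means for the array in the question. The class and method names are made up for illustration; only object.ReferenceEquals and the string constructor come from the base class library. Storing a string in an array slot copies the reference, so the slot and the local variable end up referring to the same object.

```csharp
using System;

static class StringArrayDemo
{
    // Returns true when the array element and the local variable
    // refer to the very same string object.
    public static bool ElementIsSameObject()
    {
        string s = new string('x', 3);       // a fresh string object on the heap
        object[] objs = new object[] { s };  // the slot receives a copy of the reference
        return object.ReferenceEquals(objs[0], s);
    }

    static void Main()
    {
        Console.WriteLine(ElementIsSameObject()); // True
    }
}
```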

Jon Skeet describes the actual implementation very well, but let's consider why it would be nonsensical for the CLR to store strings directly in an array.
The first reason is that storing strings directly in the array would harm performance. If strings were stored directly in an array, then to get to the element 1000 of the array the CLR would have to walk through the bytes of all the strings in the array until it reached element 1000, checking all the while for string boundaries. Since strings and any other reference types are stored in arrays as references, finding the right element of the array requires one multiplication, one addition, and following one pointer (the notion of a pointer here is at the implementation level, not the programmer-visible level). This produces much better performance.
The second reason that strings cannot reasonably be stored directly in an array is that C# arrays of reference type are covariant. Let's say that strings were stored directly in the array generated with
string[] strings = new string[] {"one", "two", "three"};
Then, you cast this to an object array, which is legal
object[] objs = (object[])strings;
How is the compiler supposed to generate code that takes this possibility into account? A method that takes an object array as a parameter can have a string array passed to it, so the CLR needs to know whether to index into the array as an object array, or a string array, or some other type of array. Somehow, at runtime every array would have to be marked with the type declaration of the array, and every array access would have to check the type declaration and then traverse the array differently depending on the type of the array. It's far simpler to stick with references, which allow a single implementation of array accesses and improve performance to boot.
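The covariance point can be tried out with a short sketch (CovarianceDemo and StoreThrows are hypothetical names). Because every slot holds a reference of the same size, reading through the object[] view needs no special handling. In the actual CLR the array does remember its element type, but that type is only consulted when a reference is stored, which is why the bad store below fails at run time rather than at compile time.

```csharp
using System;

static class CovarianceDemo
{
    // Tries to store a boxed int into a string[] that is viewed as object[].
    public static bool StoreThrows()
    {
        string[] strings = { "one", "two", "three" };
        object[] objs = strings;   // legal: arrays of reference types are covariant
        try
        {
            objs[0] = 42;          // the store is type-checked at run time
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(StoreThrows()); // True
    }
}
```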

They're stored internally as references. One copy of the string is stored, and everywhere that string is used there is a reference to the same stored string. (This is one of many reasons that strings are immutable; otherwise, modifying the string through one reference would modify it everywhere it appeared.)

Value types, including primitive types such as int, are stored directly in an array, but all reference types are stored as references. This is true for all objects, not only strings.

Related

C# memory handling for methods

I have a question about the way C# functions, or methods, handle memory when certain objects are used as input arguments. I have tried searching for an answer to this but haven't been able to find anything; I might not know what to look for, though.
The question: Say I have a really big integer array of size 10.000 by 10.000, called 'MyArray'. Let's say I also have some method called 'MyMethod' which takes several entries from two specified rows (this is the input) from MyArray and performs some operations on them, such as adding or multiplying these numbers, and then returns another integer.
To keep my code as short as possible I would prefer to make a method
MyMethod(int i, int j, int[][] MyArray)
rather than having to enter all the numbers from the array as separate arguments. However, does this mean the method creates a copy of MyArray when it is called, or does C# know that if this data is only read and not edited in any way, making a copy isn't needed?

In C#, arrays are actually objects, and not just addressable regions of contiguous memory as in C and C++. Thus, in our case, only the reference of the array is passed as an argument for the method.

C# does not create a copy as the array will be passed as a reference (like a C++ pointer) to the method. In general only struct types will be passed as a copy and normal class instances will be passed as a reference.
You can read more on the topic on MSDN

As you can read here : MSDN - Passing arrays as argument
Arrays can be passed as arguments to method parameters. Because arrays are reference types, the method can change the value of the elements.
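A small sketch of this behaviour, using a hypothetical MyMethod with the signature from the question (the body is invented for illustration). The write inside the method is visible to the caller, which shows that no copy of the array was made.

```csharp
using System;

static class PassArrayDemo
{
    // Hypothetical stand-in for MyMethod: writes one entry, then reads two rows.
    public static int MyMethod(int i, int j, int[][] myArray)
    {
        myArray[i][0] = 99;   // modifies the caller's array: same object, no copy
        return myArray[i][0] + myArray[j][0];
    }

    // Calls MyMethod and reports what the caller sees afterwards.
    public static int Run()
    {
        int[][] data = { new[] { 1, 2 }, new[] { 3, 4 } };
        MyMethod(0, 1, data);
        return data[0][0];
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 99
    }
}
```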

Arrays are classes, which is why array variables are just references, and when we pass an array into a method all we need to pass is an address (4 or 8 bytes). Proof:
Boolean isClass = typeof(int[][]).IsClass; // <- returns true
Structs are passed by value, e.g. int is a struct:
Boolean isClass = typeof(int).IsClass; // <- returns false

C# Clear Array but not fill with 0 alternative of List.Clear (get size back to 0)

Is it possible to fully clear an array in C# without filling it with 0's, as in:
for (int i = 0; i < a.Length; i++)
{
    a[i] = 0;
}
or Array.Clear(a,0,a.Length);
I want to clear it the way List.Clear() does, so that its size will be 0 again like before filling.
I tried
a=new int[15]; but previous values were still there. Thanks!

Arrays in C# are fixed-length; you cannot change the size of an array. You can allocate an array of a different size and copy the elements in order to simulate resizing (this is exactly what List<T> does internally), but you cannot "clear an array" in the sense that you reduce it to zero elements.
I tried a=new int[15]; but previous values were still there.
The previous values cannot possibly still be there, because this allocates a new int array of 15 elements, where all elements are zero.
Note that this does not alter the array that a referenced; rather, it creates a new array and stores a reference to it in a. So if you initialized a from another array variable, they would have referred to the same array, but after assigning a new array to a the other variable would continue to point to the old array. Perhaps this is where the "previous values" are coming from.
var a = new int[] { 1, 2, 3 };
var b = a;
// a and b now reference the same array.
a = new int[] { 4, 5, 6 };
// a is now {4,5,6} but b remains {1,2,3}

As others have said, it depends on the type semantics that you're putting into the array.
Value types (such as int, bool, and float) are ... well, values. They represent a quantity, something tangible, a state. Thus, they are required to be known at compile time and have a default value.
By contrast, reference types (basically every class) don't actually hold any values themselves, but "group" data together by means of reference. Reference types will either point to other reference types, or eventually to a value type (which holds actual data).
This distinction is important to your question. List<T> is a dynamically sized collection that can grow or shrink without the caller creating a new object because of how it is implemented. It wraps an internal array and replaces that array with a larger one when the capacity is exceeded, so its size does not need to be known ahead of time.
Arrays are fixed-size collections that are declared to be a specific size. The element type determines how much memory is reserved by the system. For example, a byte[] of 100 elements will consume less memory than an Int64[] of 100 elements. Thus, the system needs to know up front how many bytes to reserve in total, and every element needs a default value to "fall back" on. Where T is a reference type/class, this is null. For value types, this is usually 0 (or default(T)).
If you wanted to remove all the values of an array, similar to how List.Clear() works, you can do int[] a = new int[0];, but note that you are creating an entirely new array and reallocating the memory for them (hence the keyword new). Other objects will need to reference this new array. By design, you can't simply resize an array. A list is a mutable collection and supports changing the number of elements. You could also try int[] a = null, but this sets it to no object at all, which is again, something different.
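The options discussed in this thread can be compared side by side in a rough sketch. ClearDemo and its methods are hypothetical names; Array.Clear, Array.Empty<T> and Array.Resize are standard library calls. Only the first keeps the same array object; the other two make the variable refer to a different array.

```csharp
using System;

static class ClearDemo
{
    // Same array object, every element reset to 0; Length stays 3.
    public static int[] Zeroed()
    {
        int[] a = { 1, 2, 3 };
        Array.Clear(a, 0, a.Length);
        return a;
    }

    // The variable now refers to a zero-length array; the old array is untouched.
    public static int[] Emptied()
    {
        int[] a = { 1, 2, 3 };
        a = Array.Empty<int>();
        return a;
    }

    // Allocates a new array of length 5 and copies the old elements into it.
    public static int[] Resized()
    {
        int[] a = { 1, 2, 3 };
        Array.Resize(ref a, 5);
        return a;
    }

    static void Main()
    {
        Console.WriteLine(Zeroed().Length);  // 3
        Console.WriteLine(Emptied().Length); // 0
        Console.WriteLine(Resized().Length); // 5
    }
}
```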

It depends whether the array's elements are Value type or Reference type.
In your case it is a value type, so every element has to hold some value. You cannot assign null to it, because value types always have a default value.

Why should each element in array be allocated again in c#

Following is the code I wrote
Calc[] calculators = new Calc[10];
calculators[0].AddToSum(10); (the corresponding classes and methods are written).
But I got an "Object reference not set to an instance of an object" exception. Then, with some research, I got the exception removed by doing the following.
for (int i = 0; i < 10; i++)
{
calculators[i] = new Calc();
}
Can somebody explain why we need to allocate memory again unlike in c/c++.
This is how I did it in c++:
Calculator *calc = new Calculator[10]; // I know I need to check for std::bad_alloc exception
calc[0].AddToSum(10);
delete[] calc;

In C#, there are reference types, and there are value types. Classes are reference types. When you create a variable of a reference type, you are creating a reference, not an object. The default state of a reference is null. If you want it to refer to an object, you have to explicitly initialize it with new, or assign it from another initialized reference.
C++ does not have this distinction. Every type is a value type (though you can also create references to any type). When you create a variable of a value type, you are creating an object.

In new Calc[10] you are allocating and sizing the array. In new Calc() you are creating the actual Calc objects.
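A sketch of the difference, using a hypothetical Calc class and a hypothetical struct variant (neither is the asker's actual code). A new array of a class type starts out full of null references, while a new array of a struct type already contains usable, zero-initialized values.

```csharp
using System;

class Calc                      // hypothetical class standing in for the asker's Calc
{
    public int Sum;
    public void AddToSum(int n) { Sum += n; }
}

struct CalcStruct               // hypothetical value-type variant
{
    public int Sum;
    public void AddToSum(int n) { Sum += n; }
}

static class ArrayInitDemo
{
    // new Calc[10] creates ten null references, not ten Calc objects.
    public static bool ClassSlotsStartNull()
    {
        Calc[] calculators = new Calc[10];
        return calculators[0] == null;
    }

    // new CalcStruct[10] creates ten zeroed values stored in the array itself.
    public static int StructSlotsAreUsable()
    {
        CalcStruct[] calculators = new CalcStruct[10];
        calculators[0].AddToSum(10);   // operates on the element in place
        return calculators[0].Sum;
    }

    static void Main()
    {
        Console.WriteLine(ClassSlotsStartNull());  // True
        Console.WriteLine(StructSlotsAreUsable()); // 10
    }
}
```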

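But you would hit the same problem with this statement (for a local variable the compiler even rejects it as use of an unassigned variable)
Calc calc;
calc.AddToSum(10);
The variable refers to no object until you assign a value.
Calc[] calculators = new Calc[10]; does not allocate any Calc objects, only the array of references.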
Based on the answer from Benjamin (+1) it works if Calc is a reference type.
Can you just make Calc a struct?

I don't think you allocate the memory again, but you still need to instantiate some value for calculators[0].
In your first code segment, you are trying to call .AddToSum on a value that is null.
Ps: You could do the following instead, to initialize each Calc from the start:
Calc[] calculators = new Calc[10]{
new Calc(),
new Calc(),
...,
// Repeat 10 times to match array length
};
Update: In response to the comments below; Ok, try this then:
Calc[] calculators = Enumerable.Range(0, 127).Select(_ => new Calc()).ToArray(); // Enumerable.Repeat(new Calc(), 127) would store 127 references to a single Calc
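One thing worth checking with LINQ one-liners of this kind: Enumerable.Repeat evaluates its argument once, so every slot ends up referring to the same object, whereas Range plus Select runs the constructor once per slot. A rough sketch, with a hypothetical Counter class standing in for Calc:

```csharp
using System;
using System.Linq;

class Counter { public int Sum; }   // hypothetical stand-in for Calc

static class RepeatDemo
{
    // Repeat stores the same reference in every slot.
    public static bool RepeatSharesOneInstance()
    {
        Counter[] shared = Enumerable.Repeat(new Counter(), 3).ToArray();
        return object.ReferenceEquals(shared[0], shared[1]);
    }

    // Select runs the lambda once per element, so each slot gets its own object.
    public static bool SelectCreatesDistinctInstances()
    {
        Counter[] distinct = Enumerable.Range(0, 3).Select(_ => new Counter()).ToArray();
        return !object.ReferenceEquals(distinct[0], distinct[1]);
    }

    static void Main()
    {
        Console.WriteLine(RepeatSharesOneInstance());        // True
        Console.WriteLine(SelectCreatesDistinctInstances()); // True
    }
}
```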

When you create an array of objects in c++ you allocate memory for all the fields of each object. So if your objects have two integer fields and you make an array of size two, enough memory is allocated to hold four integers.
On the other hand, in C# when you make an array of objects you are creating an array of references (pointers to objects). So you cannot store an instance unless you allocate memory for each reference (by using new).
The same thing in c++ would be making an array of pointers, and then you'll have to instantiate each element of your array.

Your C++ code is also wrong.
In C++ you've allocated an array with space for 10 Calculator objects.
When you do the operation, it's reading from that (uninitialized) memory, grabbing a value, and adding to it, then writing that back out.
But you've got an uninitialized object to start from.
It likely works in C++ because you have an object (Calculator) that doesn't require the constructor to be called. If it had any initialization that required the constructor to be called, it wouldn't work. If you were to use a debugger and put a breakpoint in Calculator constructor, you'll see it's never called.
Anyway, to directly answer the question, this is the way C# works. Allocating an array creates space for the array, but all objects within the array (assuming object types) are null until themselves allocated.
Think of it this way: I create an array to hold 10 objects of Class X. But X has a constructor that takes a string, and I want to call it with a different string for each of those objects. How would one do so without explicitly creating each of those 10 objects and passing the right string to each constructor?

What exactly does an ArrayList store?

I know I can add an object of any type to an ArrayList instance. If I get it right, then reference types are cast to object (value types are boxed). Also, does an ArrayList actually store lists of objects of reference type object?

Internally the ArrayList class uses a fixed-size object[] (object array) for storage. When you add elements, they are stored at their respective indexes in the array. When the maximum size is reached, a new, larger array is created and the elements are copied over. So it's just a convenience wrapper around a fixed-size object array.

An ArrayList does not store objects, but merely the references to those objects.

An ArrayList is essentially a wrapper around an object[], with functionality to track space in the array and grow it (double it) as necessary. Note that usually List<T> is preferred, but to answer the question: yes, it just stores the references to the objects, which may well be boxed value-types.
A List<T> is pretty much the same, but around a T[], which means value types can be stored without boxing. Reference-types are still stored as references. You also get more type safety; i.e. you can't add the wrong thing nor cast a retrieved item improperly.

ArrayList is similar to List<object> and was created before .NET had generics.
It can store anything that derives from object, which is all reference and value types. So you could use it to store a list of objects.

The internal storage for an ArrayList is an object array (object[]).
When storing reference types in the ArrayList, the reference is just cast to object and stored in the array. The reference type instance itself contains information about its type, so it's possible to cast it back to the actual type when you get it from the ArrayList.
Value types are boxed inside an object, and the reference to that object is stored in the array. The object contains information about what type the value is, so that it can be unboxed correctly when you get it from the ArrayList.
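The type information carried by the box can be observed with a short sketch (BoxingDemo and its method names are hypothetical). Unboxing has to name the exact value type that was boxed; asking for a different one fails even when an ordinary numeric conversion between the two types exists.

```csharp
using System;
using System.Collections;

static class BoxingDemo
{
    // Unboxing to the type that was boxed works.
    public static int UnboxAsInt()
    {
        ArrayList list = new ArrayList();
        list.Add(42);            // boxes the int; the list stores a reference to the box
        return (int)list[0];
    }

    // Unboxing to a different value type throws, because the box knows it holds an int.
    public static bool UnboxAsLongThrows()
    {
        ArrayList list = new ArrayList();
        list.Add(42);
        try
        {
            long wrong = (long)list[0];
            return wrong != 42;  // not reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(UnboxAsInt());        // 42
        Console.WriteLine(UnboxAsLongThrows()); // True
    }
}
```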

They store any object.
Meaning it's possible to fill them with strings, integers, classes as long as they have been instantiated as an object.

Boxing and unboxing with generics

The .NET 1.0 way of creating collection of integers (for example) was:
ArrayList list = new ArrayList();
list.Add(i); /* boxing */
int j = (int)list[0]; /* unboxing */
The penalty of using this is the lack of type safety and performance due to boxing and unboxing.
The .NET 2.0 way is to use generics:
List<int> list = new List<int>();
list.Add(i);
int j = list[0];
The price of boxing (to my understanding) is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
How does the use of generics overcome this? Does the stack-allocated integer stay on the stack and get pointed to from the heap (I guess this is not the case because of what would happen when it goes out of scope)? It seems like there is still a need to copy it somewhere else, off the stack.
What is really going on?

When it comes to collections, generics make it possible to avoid boxing/unboxing by utilizing actual T[] arrays internally. List<T> for example uses a T[] array to store its contents.
The array, of course, is a reference type and is therefore (in the current version of the CLR, yada yada) stored on the heap. But since it's a T[] and not an object[], the array's elements can be stored "directly": that is, they're still on the heap, but they're on the heap in the array instead of being boxed and having the array contain references to the boxes.
So for a List<int>, for example, what you'd have in the array would "look" like this:
[ 1 2 3 ]
Compare this to an ArrayList, which uses an object[] and would therefore "look" something like this:
[ *a *b *c ]
...where *a, etc. are references to objects (boxed integers):
*a -> 1
*b -> 2
*c -> 3
Excuse those crude illustrations; hopefully you know what I mean.

Your confusion is a result of misunderstanding what the relationship is between the stack, the heap, and variables. Here's the correct way to think about it.
A variable is a storage location that has a type.
The lifetime of a variable can either be short or long. By "short" we mean "until the current function returns or throws" and by "long" we mean "possibly longer than that".
If the type of a variable is a reference type then the contents of the variable is a reference to a long-lived storage location. If the type of a variable is a value type then the contents of the variable is a value.
As an implementation detail, a storage location which is guaranteed to be short-lived can be allocated on the stack. A storage location which might be long-lived is allocated on the heap. Notice that this says nothing about "value types are always allocated on the stack." Value types are not always allocated on the stack:
int[] x = new int[10];
x[1] = 123;
x[1] is a storage location. It is long-lived; it might live longer than this method. Therefore it must be on the heap. The fact that it contains an int is irrelevant.
You correctly say why a boxed int is expensive:
The price of boxing is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
Where you go wrong is to say "the stack allocated integer". It doesn't matter where the integer was allocated. What matters was that its storage contained the integer, instead of containing a reference to a heap location. The price is the need to create the object and do the copy; that's the only cost that is relevant.
So why isn't a generic variable costly? If you have a variable of type T, and T is constructed to be int, then you have a variable of type int, period. A variable of type int is a storage location, and it contains an int. Whether that storage location is on the stack or the heap is completely irrelevant. What is relevant is that the storage location contains an int, instead of containing a reference to something on the heap. Since the storage location contains an int, you do not have to take on the costs of boxing and unboxing: allocating new storage on the heap and copying the int to the new storage.
Is that now clear?
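The cost described above, a new heap object plus a copy, can be made visible with a small sketch (BoxCopyDemo and its methods are hypothetical names). Each boxing conversion produces a separate object holding its own copy of the value, which is exactly the work a storage location of type int avoids.

```csharp
using System;

static class BoxCopyDemo
{
    // Boxing the same variable twice yields two distinct heap objects.
    public static bool EachBoxIsANewObject()
    {
        int i = 123;
        object a = i;   // allocates a box and copies 123 into it
        object b = i;   // allocates a second, separate box
        return !object.ReferenceEquals(a, b);
    }

    // The box holds a copy, so later changes to the variable do not affect it.
    public static int BoxKeepsItsOwnCopy()
    {
        int i = 123;
        object boxed = i;
        i = 456;
        return (int)boxed;
    }

    static void Main()
    {
        Console.WriteLine(EachBoxIsANewObject()); // True
        Console.WriteLine(BoxKeepsItsOwnCopy());  // 123
    }
}
```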

Generics allows the list's internal array to be typed int[] instead of effectively object[], which would require boxing.
Here's what happens without generics:
You call Add(1).
The integer 1 is boxed into an object, which requires a new object to be constructed on the heap.
This object is passed to ArrayList.Add().
The boxed object is stuffed into an object[].
There are three levels of indirection here: ArrayList -> object[] -> object -> int.
With generics:
You call Add(1).
The int 1 is passed to List<int>.Add().
The int is stuffed into an int[].
So there are only two levels of indirection: List<int> -> int[] -> int.
A few other differences:
The non-generic method will require a sum of 8 or 12 bytes (one pointer, one int) to store the value, 4/8 in one allocation and 4 in the other. And this will probably be more due to alignment and padding. The generic method will require only 4 bytes of space in the array.
The non-generic method requires allocating a boxed int; the generic method does not. This is faster and reduces GC churn.
The non-generic method requires casts to extract values. This is not typesafe and it's a bit slower.

An ArrayList only handles the type object so to use this class requires casting to and from object. In the case of value types, this casting involves boxing and unboxing.
When you use a generic list the compiler outputs specialized code for that value type so that the actual values are stored in the list rather than a reference to objects that contain the values. Therefore no boxing is required.
The price of boxing (to my understanding) is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
I think you are assuming that value types are always instantiated on the stack. This is not the case - they can be created either on the heap, on the stack or in registers. For more information about this please see Eric Lippert's article: The Truth About Value Types.

In .NET 1, when the Add method is called:
Space is allocated on the heap for a new object (the box)
The contents of the i variable are copied into that object
A reference to the object is put at the end of the list
In .NET 2:
A copy of the variable i is passed to the Add method
A copy of that copy is put at the end of the list
Yes, the i variable is still copied (after all, it's a value type, and value types are always copied - even if they're just method parameters). But there's no redundant copy made on the heap.

Why are you thinking in terms of WHERE the values/objects are stored? In C#, value types can be stored on the stack as well as the heap, depending upon what the CLR chooses.
Where generics make a difference is WHAT is stored in the collection. In the case of ArrayList the collection contains references to boxed objects, whereas the List<int> contains the int values themselves.
