How much memory does an array of objects in C# consume?

Suppose that we have previously instantiated three objects A, B, and C of class D,
and that an array is now defined as below:
D[] arr = new D[3];
arr[0]=A;
arr[1]=B;
arr[2]=C;
Does the array contain references to the objects, or does it hold separate copies?

C# distinguishes reference types and value types.
A reference type is declared using the word class. Variables of these types contain references, so an array of a reference type is an array of references to the objects. Each reference is 4 bytes (on a 32-bit system) or 8 bytes (on a 64-bit system).
A value type is declared using the word struct. Values of this type are copied every time you assign them. An array of a value type contains copies of the values, so the size of the array is the size of the struct times the number of elements.
Normally when we say “object”, we refer to instances of a reference type, so the answer to your question is “yes”, but remember the difference and make sure that you don’t accidentally create a large array of a large struct.
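For illustration, here is a minimal sketch (the class and struct names are made up for this example) showing the difference between an array of a reference type and an array of a value type:

using System;

class RefPoint { public int X; }     // reference type: the array stores references
struct ValPoint { public int X; }    // value type: the array stores the values inline

class Demo
{
    static void Main()
    {
        var refs = new RefPoint[3];   // three references (4 or 8 bytes each), all null initially
        var vals = new ValPoint[3];   // three ValPoint values stored inline

        var p = new RefPoint { X = 1 };
        refs[0] = p;                  // copies the reference, not the object
        p.X = 42;
        Console.WriteLine(refs[0].X); // 42 - both variables refer to the same object

        var v = new ValPoint { X = 1 };
        vals[0] = v;                  // copies the value into the array
        v.X = 42;
        Console.WriteLine(vals[0].X); // 1 - the array holds its own copy
    }
}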

An array of reference types only contains references.
In a 32-bit application references are 32 bits (4 bytes), and in a 64-bit application references are 64 bits (8 bytes). So you can calculate the approximate size by multiplying the array length by the reference size. (There are also a few extra bytes for internal variables of the array class, and some extra bytes are used for memory management.)
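As a rough sketch of that calculation (the header overhead figure here is an assumption; see the answers further down for exact numbers):

using System;

class ApproxArraySize
{
    static void Main()
    {
        const int length = 1000;
        int referenceSize = IntPtr.Size;              // 4 on 32-bit, 8 on 64-bit
        int assumedHeaderOverhead = 4 * IntPtr.Size;  // rough guess at the array's internal overhead

        long approxBytes = (long)length * referenceSize + assumedHeaderOverhead;
        Console.WriteLine($"~{approxBytes} bytes for a reference-type array of {length} elements");
    }
}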

You can look at the memory occupied by an array using WinDBG + SOS (or PSSCOR2). IIRC, an array of reference types is represented in memory by its length, followed by references to its elements, i.e. its approximate size is PLATFORM_POINTER_SIZE * (array.Length + 1).

The array is made up of pointers (32-bit or 64-bit) that point to the objects. An object is a reference type; only value types are copied into the array itself.

As @Yves said, it contains references to the objects. The array is a block of memory, as it is in C.
So its size is sizeof(element) * count, plus the memory the runtime needs for the array object itself.

Related

Real size in memory of an array [duplicate]

I was trying to determine the overhead of the header on a .NET array (in a 32-bit process) using this code:
long bytes1 = GC.GetTotalMemory(false);
object[] array = new object[10000];
for (int i = 0; i < 10000; i++)
    array[i] = new int[1];
long bytes2 = GC.GetTotalMemory(false);
array[0] = null; // ensure no garbage collection before this point
Console.WriteLine(bytes2 - bytes1);
// Calculate array overhead in bytes by subtracting the size of
// the array elements (40000 for object[10000] and 4 for each
// array), and dividing by the number of arrays (10001)
Console.WriteLine("Array overhead: {0:0.000}",
                  ((double)(bytes2 - bytes1) - 40000) / 10001 - 4);
Console.Write("Press any key to continue...");
Console.ReadKey();
The result was
204800
Array overhead: 12.478
In a 32-bit process, object[1] should be the same size as int[1], but in fact the overhead jumps by 3.28 bytes to
237568
Array overhead: 15.755
Anyone know why?
(By the way, if anyone's curious, the overhead for non-array objects, e.g. (object)i in the loop above, is about 8 bytes (8.384). I heard it's 16 bytes in 64-bit processes.)
Here's a slightly neater (IMO) short but complete program to demonstrate the same thing:
using System;

class Test
{
    const int Size = 100000;

    static void Main()
    {
        object[] array = new object[Size];
        long initialMemory = GC.GetTotalMemory(true);
        for (int i = 0; i < Size; i++)
        {
            array[i] = new string[0];
        }
        long finalMemory = GC.GetTotalMemory(true);
        GC.KeepAlive(array);

        long total = finalMemory - initialMemory;
        Console.WriteLine("Size of each element: {0:0.000} bytes",
                          ((double)total) / Size);
    }
}
But I get the same results - the overhead for any reference type array is 16 bytes, whereas the overhead for any value type array is 12 bytes. I'm still trying to work out why that is, with the help of the CLI spec. Don't forget that reference type arrays are covariant, which may be relevant...
EDIT: With the help of cordbg, I can confirm Brian's answer - the type pointer of a reference-type array is the same regardless of the actual element type. Presumably there's some funkiness in object.GetType() (which is non-virtual, remember) to account for this.
So, with code of:
object[] x = new object[1];
string[] y = new string[1];
int[] z = new int[1];
z[0] = 0x12345678;
lock(z) {}
We end up with something like the following:
Variables:
x=(0x1f228c8) <System.Object[]>
y=(0x1f228dc) <System.String[]>
z=(0x1f228f0) <System.Int32[]>
Memory:
0x1f228c4: 00000000 003284dc 00000001 00326d54 00000000 // Data for x
0x1f228d8: 00000000 003284dc 00000001 00329134 00000000 // Data for y
0x1f228ec: 00000000 00d443fc 00000001 12345678 // Data for z
Note that I've dumped the memory 1 word before the value of the variable itself.
For x and y, the values are:
The sync block, used for locking or the hash code (or a thin lock - see Brian's comment)
Type pointer
Size of array
Element type pointer
Null reference (first element)
For z, the values are:
Sync block
Type pointer
Size of array
0x12345678 (first element)
Different value type arrays (byte[], int[] etc) end up with different type pointers, whereas all reference type arrays use the same type pointer, but have a different element type pointer. The element type pointer is the same value as you'd find as the type pointer for an object of that type. So if we looked at a string object's memory in the above run, it would have a type pointer of 0x00329134.
The word before the type pointer certainly has something to do with either the monitor or the hash code: calling GetHashCode() populates that bit of memory, and I believe the default object.GetHashCode() obtains a sync block to ensure hash code uniqueness for the lifetime of the object. However, just doing lock(x){} didn't do anything, which surprised me...
All of this is only valid for "vector" types, by the way - in the CLR, a "vector" type is a single-dimensional array with a lower-bound of 0. Other arrays will have a different layout - for one thing, they'd need the lower bound stored...
So far this has been experimentation, but here's the guesswork - the reason for the system being implemented the way it has. From here on, I really am just guessing.
All object[] arrays can share the same JIT code. They're going to behave the same way in terms of memory allocation, array access, Length property and (importantly) the layout of references for the GC. Compare that with value type arrays, where different value types may have different GC "footprints" (e.g. one might have a byte and then a reference, others will have no references at all, etc).
Every time you assign a value within an object[] the runtime needs to check that it's valid. It needs to check that the type of the object whose reference you're using for the new element value is compatible with the element type of the array. For instance:
object[] x = new object[1];
object[] y = new string[1];
x[0] = new object(); // Valid
y[0] = new object(); // Invalid - will throw an exception
This is the covariance I mentioned earlier. Now given that this is going to happen for every single assignment, it makes sense to reduce the number of indirections. In particular, I suspect you don't really want to blow the cache by having to go to the type object for each assignment to get the element type. I suspect (and my x86 assembly isn't good enough to verify this) that the test is something like:
Is the value to be copied a null reference? If so, that's fine. (Done.)
Fetch the type pointer of the object the reference points at.
Is that type pointer the same as the element type pointer (simple binary equality check)? If so, that's fine. (Done.)
Is that type pointer assignment-compatible with the element type pointer? (Much more complicated check, with inheritance and interfaces involved.) If so, that's fine - otherwise, throw an exception.
If we can terminate the search in the first three steps, there's not a lot of indirection - which is good for something that's going to happen as often as array assignments. None of this needs to happen for value type assignments, because that's statically verifiable.
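As a purely conceptual sketch of that check order (the method name is invented; the real check happens in native code inside the runtime, not in C#):

using System;

static class CovarianceCheckSketch
{
    // Hypothetical illustration of the checks performed before a store
    // into a reference-type array is allowed to succeed.
    public static bool CanStore(object value, Type elementType)
    {
        // 1. A null reference is always allowed.
        if (value == null) return true;

        // 2. Cheap path: exact match against the array's element type (type pointer equality).
        if (value.GetType() == elementType) return true;

        // 3. Slow path: full assignment-compatibility check (inheritance, interfaces).
        return elementType.IsInstanceOfType(value);
    }
}

For example, CanStore(new object(), typeof(string)) returns false, mirroring the ArrayTypeMismatchException thrown by y[0] = new object() above.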
So, that's why I believe reference type arrays are slightly bigger than value type arrays.
Great question - really interesting to delve into it :)
Array is a reference type. All reference types carry two additional word-sized fields: the type reference and a SyncBlock index field, which among other things is used to implement locks in the CLR. So the type overhead on reference types is 8 bytes on 32-bit. On top of that, the array itself also stores the length, which is another 4 bytes. This brings the total overhead to 12 bytes.
And as I just learned from Jon Skeet's answer, arrays of reference types have an additional 4 bytes of overhead. This can be confirmed using WinDbg. It turns out that the additional word is another type reference for the type stored in the array. All arrays of reference types are stored internally as object[], with an additional reference to the type object of the actual type. So a string[] is really just an object[] with an additional type reference to the type string. For details please see below.
Values stored in arrays: Arrays of reference types hold references to objects, so each entry in the array is the size of a reference (i.e. 4 bytes on 32 bit). Arrays of value types store the values inline and thus each element will take up the size of the type in question.
This question may also be of interest: C# List<double> size vs double[] size
Gory Details
Consider the following code
var strings = new string[1];
var ints = new int[1];
strings[0] = "hello world";
ints[0] = 42;
Attaching WinDbg shows the following:
First let's take a look at the value type array.
0:000> !dumparray -details 017e2acc
Name: System.Int32[]
MethodTable: 63b9aa40
EEClass: 6395b4d4
Size: 16(0x10) bytes
Array: Rank 1, Number of elements 1, Type Int32
Element Methodtable: 63b9aaf0
[0] 017e2ad4
Name: System.Int32
MethodTable 63b9aaf0
EEClass: 6395b548
Size: 12(0xc) bytes
(C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
MT Field Offset Type VT Attr Value Name
63b9aaf0 40003f0 0 System.Int32 1 instance 42 m_value <=== Our value
0:000> !objsize 017e2acc
sizeof(017e2acc) = 16 ( 0x10) bytes (System.Int32[])
0:000> dd 017e2acc -0x4
017e2ac8 00000000 63b9aa40 00000001 0000002a <=== That's the value
First we dump the array and the one element with value of 42. As can be seen the size is 16 bytes. That is 4 bytes for the int32 value itself, 8 bytes for regular reference type overhead and another 4 bytes for the length of the array.
The raw dump shows the SyncBlock, the method table for int[], the length, and the value of 42 (2a in hex). Notice that the SyncBlock is located just in front of the object reference.
Next, let's look at the string[] to find out what the additional word is used for.
0:000> !dumparray -details 017e2ab8
Name: System.String[]
MethodTable: 63b74ed0
EEClass: 6395a8a0
Size: 20(0x14) bytes
Array: Rank 1, Number of elements 1, Type CLASS
Element Methodtable: 63b988a4
[0] 017e2a90
Name: System.String
MethodTable: 63b988a4
EEClass: 6395a498
Size: 40(0x28) bytes <=== Size of the string
(C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: hello world
Fields:
MT Field Offset Type VT Attr Value Name
63b9aaf0 4000096 4 System.Int32 1 instance 12 m_arrayLength
63b9aaf0 4000097 8 System.Int32 1 instance 11 m_stringLength
63b99584 4000098 c System.Char 1 instance 68 m_firstChar
63b988a4 4000099 10 System.String 0 shared static Empty
>> Domain:Value 00226438:017e1198 <<
63b994d4 400009a 14 System.Char[] 0 shared static WhitespaceChars
>> Domain:Value 00226438:017e1760 <<
0:000> !objsize 017e2ab8
sizeof(017e2ab8) = 60 ( 0x3c) bytes (System.Object[]) <=== Notice the underlying type of the string[]
0:000> dd 017e2ab8 -0x4
017e2ab4 00000000 63b74ed0 00000001 63b988a4 <=== Method table for string
017e2ac4 017e2a90 <=== Address of the string in memory
0:000> !dumpmt 63b988a4
EEClass: 6395a498
Module: 63931000
Name: System.String
mdToken: 02000024 (C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
BaseSize: 0x10
ComponentSize: 0x2
Number of IFaces in IFaceMap: 7
Slots in VTable: 196
First we dump the array and the string. Next we dump the size of the string[]. Notice that WinDbg lists the type as System.Object[] here. The object size in this case includes the string itself, so the total size is the 20 from the array plus the 40 for the string.
By dumping the raw bytes of the instance we can see the following: First we have the SyncBlock, then follows the method table for object[], then the length of the array. After that we find the additional 4 bytes with the reference to the method table for string. This can be verified by the dumpmt command as shown above. Finally we find the single reference to the actual string instance.
In conclusion
The overhead for arrays can be broken down as follows (on 32 bit that is)
4 bytes SyncBlock
4 bytes for Method table (type reference) for the array itself
4 bytes for Length of array
Arrays of reference types add another 4 bytes to hold the method table of the actual element type (reference-type arrays are object[] under the hood)
I.e. the overhead is 12 bytes for value type arrays and 16 bytes for reference type arrays.
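As a quick sanity check, plugging those 32-bit figures into the examples dumped above (these constants come from the breakdown, not from querying the runtime):

using System;

class OverheadSanityCheck
{
    static void Main()
    {
        const int valueTypeArrayOverhead = 12; // sync block + method table + length
        const int refTypeArrayOverhead = 16;   // the above + element-type method table
        const int referenceSize = 4;           // a reference on 32-bit

        Console.WriteLine(valueTypeArrayOverhead + 1 * sizeof(int)); // 16 - matches "Size: 16(0x10)" for int[1]
        Console.WriteLine(refTypeArrayOverhead + 1 * referenceSize); // 20 - matches "Size: 20(0x14)" for string[1]
    }
}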
I think you are making some faulty assumptions while measuring, as the memory allocation (via GetTotalMemory) during your loop may be different than the actual required memory for just the arrays - the memory may be allocated in larger blocks, there may be other objects in memory that are reclaimed during the loop, etc.
Here's some info for you on array overhead:
Arrays Undocumented
Article by Jeffrey Richter
.Net Type Internals
Because heap management (which is what GetTotalMemory reflects) can only allocate rather large blocks, which the CLR then subdivides into smaller chunks for the program's own allocations.
I'm sorry for going off topic, but I found some interesting information on memory overhead just this morning.
We have a project which operates on a huge amount of data (up to 2 GB). As the main storage we use Dictionary<T,T>; thousands of dictionaries are actually created. After changing it to a List<T> for keys and a List<T> for values (we implemented IDictionary<T,T> ourselves), memory usage decreased by about 30-40%.
Why?

C# Large object in medium size collection

I'm pretty new to memory issues. I hope you don't think this is a stupid question to ask.
I know that an object larger than 85,000 bytes is put on the LOH in C#,
i.e.
Byte[] hugeByteCollection = new Byte[85000];
I'm wondering whether a collection of size 10000 - 20000 of an object that contains 10 member variables (of type byte) will be put on the LOH or the SOH?
The size of an array of objects is the number of objects times the pointer size. This is because only value types are stored in the array itself; reference types (objects) are stored elsewhere and do not count towards the size of the array. So 85000/4 = 21250 objects and 85000/8 = 10625 objects can be stored in an array on the SOH in 32-bit and 64-bit mode, respectively.
Edit:
Thanks to Hans Passant for pointing out that this assumes the collection type used is an array and not a list. Lists resize themselves to be bigger than their content to avoid too many allocations. See this link for details.
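A small sketch of the same arithmetic at runtime (the 85,000-byte figure is the documented LOH threshold; the array header itself is ignored here):

using System;

class LohThresholdEstimate
{
    static void Main()
    {
        const int lohThresholdBytes = 85000; // documented LOH threshold
        int referenceSize = IntPtr.Size;     // 4 on 32-bit, 8 on 64-bit

        // Each element of an array of reference types is just a reference,
        // regardless of how big the referenced objects themselves are.
        int elementsBelowThreshold = lohThresholdBytes / referenceSize;
        Console.WriteLine($"Roughly {elementsBelowThreshold} references fit before the array itself goes to the LOH");
    }
}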

Are C# arrays guaranteed to be stored sequentially in memory?

According to many sources on the internet, in C# arrays are stored sequentially. That is if I have a pointer to the first element in the array, say int *start = &array[0], then I can access array[i] by doing *(start + i).
However, I was looking through the C# Language Specification which is stored in C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC#\Specifications\1033 and I cannot find anyplace that guarantees that this will be the case.
In practice this might not be an issue, if say Microsoft and Mono keep their implementations in sync, but I was wondering if there is an official source that guarantees that arrays are stored sequentially in memory.
Thanks!
From the ECMA specification for the CLR:
I.8.9.1 Array types
....
Array elements shall be laid out within the array object in row-major
order (i.e., the elements associated with the rightmost array
dimension shall be laid out contiguously from lowest to highest
index). The actual storage allocated for each array element can
include platform-specific padding. (The size of this storage, in
bytes, is returned by the sizeof instruction when it is applied to the
type of that array’s elements.)
So yes, in a compliant implementation of the ECMA-335 Common Language Infrastructure, elements in an array will be laid out sequentially.
But there may be platform-specific padding applied, so implementations on 64-bit platforms may choose to allocate 64 bits for each Int32.
Yes, one-dimensional, zero-based arrays (vectors) in .NET are stored sequentially. In fact, you can use unsafe code with pointers to access array elements one-by-one by incrementing the pointer.
ECMA-335 specification of CLI, section 1.8.9.1 says the following about arrays:
Array elements shall be laid out within the array object in row-major order (i.e., the elements associated with the rightmost array dimension shall be laid out contiguously from lowest to highest index). The actual storage allocated for each array element can include platform-specific
padding.
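For example, here is a minimal unsafe sketch (compile with /unsafe) that walks an int[] through a raw pointer, relying on the sequential layout described above:

using System;

class SequentialAccess
{
    static unsafe void Main()
    {
        int[] array = { 10, 20, 30, 40 };

        // "fixed" pins the array so the GC cannot move it while we hold a raw pointer.
        fixed (int* start = &array[0])
        {
            for (int i = 0; i < array.Length; i++)
            {
                // Elements are contiguous, so *(start + i) is array[i].
                Console.WriteLine(*(start + i));
            }
        }
    }
}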

What exactly are Large Objects in C#?

Can I call an array of 1 million integers a large object? Or does one instance of that object have to be > 85 KB to be considered a large object?
If I make an array like int[1000000], this whole object with all its members is treated as one object with size > 85 KB, right?
If I have a class X { int i; string j } and then a List<X> with count > 100000, will this be stored on the LOH?
Basically, what I mean is: if the size of an object like class X is, say, 8.6 KB and I make a data structure like List<X> myList, then if the list count is 9 it is not a large object, but if it has count 10 then it is?
I want one last answer (I've almost got all the answers):
Now I know that an array is a collection of pointers of 8 bytes each. So for an array to be a large object it should have 85000/8 elements or more. Is that correct?
Any object larger than 85,000 bytes is considered to be a large object and is treated differently during garbage collection. An array can itself be over 85,000 bytes large if all its references (i.e. pointers) together make up that amount.
In the case of arrays, the actual count is not made up of the size of the objects but of their references. Let's say you have an array of Customer, and let's assume that a Customer has 3 integers, each 8 bytes in size, and that each reference is also 8 bytes in size. Then an array of 10 Customers actually takes up 80 bytes, not 240. The 24 bytes of each Customer are separate objects.
See this article for further information, and refer also to the Large Object Heap Improvements in .NET 4.5.
If you are asking in the context of garbage collection, then yes, an 85 KB object would be a big object, since big objects are immediately marked as generation 2 objects.
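A quick way to observe this (results assume the usual CLR behavior where freshly allocated large objects report generation 2):

using System;

class LohGenerationDemo
{
    static void Main()
    {
        byte[] small = new byte[84000]; // below the 85,000-byte threshold -> small object heap
        byte[] large = new byte[85000]; // at the threshold -> large object heap

        Console.WriteLine(GC.GetGeneration(small)); // typically 0
        Console.WriteLine(GC.GetGeneration(large)); // typically 2 (the LOH is collected with gen 2)
    }
}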

How to calculate size of Nullable<T> datatypes [duplicate]

This question already has answers here:
What is the memory footprint of a Nullable<T>
Actually, I would like to know how much memory is consumed by the following data types:
int? = memory size?
double? = memory size?
bool? = memory size?
Can anybody give me information about their storage, or a method to calculate their size?
The answer, I believe, is here.
Basically, add the size of a bool to the size of the non-nullable type.
You can use the following code to get the actual size at runtime. The value returned will be the same as the element alignment of an array int?[], which is consistent with the value returned by the CLI's sizeof opcode (ECMA-335 Partition I, §8.9.1). Since the compiler treats Nullable<T> as a managed type, the C# sizeof operator cannot be used for this, even in an unsafe context. Instead, we use TypedReference and a 2-element array to calculate the same information.
// Note: requires compiling with /unsafe.
public static int SizeOf<T>()
{
    // Allocate two adjacent elements and measure the distance between their addresses.
    T[] array = new T[2];
    TypedReference elem1 = __makeref(array[0]);
    TypedReference elem2 = __makeref(array[1]);
    unsafe
    {
        // The first field of a TypedReference is the address of the referenced element.
        byte* address1 = (byte*)*(IntPtr*)(&elem1);
        byte* address2 = (byte*)*(IntPtr*)(&elem2);
        return (int)(address2 - address1);
    }
}
You can then use the following.
// This returns 8 on my test, but the runtime is free to change this to
// any value greater than sizeof(int) + sizeof(bool)
int nullableSize = SizeOf<int?>();
Do you want to know the memory consumption of, e.g., an int? x? MSDN says:
... The common language runtime assigns storage based on the
characteristics of the platform on which your application is
executing. In some circumstances it packs your declared elements as
closely together as possible; in other cases it aligns their memory
addresses to natural hardware boundaries. Also, storage assignment is
different on a 64-bit platform than it is on a 32-bit platform.
The same considerations apply to each member of a composite data type
such as a structure or an array. Furthermore, some composite types
have additional memory requirements. For example, an array uses extra
memory for the array itself and also for each dimension. On a 32-bit
platform, this overhead is currently 12 bytes plus 8 bytes for each
dimension. On a 64-bit platform the requirement is doubled. You cannot
rely on simply adding together the nominal storage allocations of the
components.
An Object referring to any elementary or composite data type uses 4
bytes in addition to the data contained in the data type.
