Getting the size of a field in bytes with C#

I have a class, and I want to inspect its fields and eventually report how many bytes each field takes. I assume all fields are of type Int32, byte, etc.
How can I easily find out how many bytes a field takes?
I need something like:
Int32 a;
// int a_size = a.GetSizeInBytes;
// a_size should be 4

You can't, basically. It will depend on padding, which may well be based on the CLR version you're using and the processor etc. It's easier to work out the total size of an object, assuming it has no references to other objects: create a big array, use GC.GetTotalMemory for a base point, fill the array with references to new instances of your type, and then call GetTotalMemory again. Take one value away from the other, and divide by the number of instances. You should probably create a single instance beforehand to make sure that no new JITted code contributes to the number. Yes, it's as hacky as it sounds - but I've used it to good effect before now.
Just yesterday I was thinking it would be a good idea to write a little helper class for this. Let me know if you'd be interested.
EDIT: There are two other suggestions, and I'd like to address them both.
Firstly, the sizeof operator: this only shows how much space the type takes up in the abstract, with no padding applied round it. (It includes padding within a structure, but not padding applied to a variable of that type within another type.)
Next, Marshal.SizeOf: this only shows the unmanaged size after marshalling, not the actual size in memory. As the documentation explicitly states:
The size returned is actually the size of the unmanaged type. The unmanaged and managed sizes of an object can differ. For character types, the size is affected by the CharSet value applied to that class.
And again, padding can make a difference.
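To make the two points above concrete, here is a small sketch comparing managed sizes with marshaled sizes. It uses `Unsafe.SizeOf<T>()` (System.Runtime.CompilerServices, built into .NET Core 2.0 and later, a NuGet package elsewhere) instead of `sizeof` for the struct, so that no unsafe context is needed. `Padded` and `SizeComparison` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical struct: one byte followed by one int.
public struct Padded
{
    public byte B;
    public int I;
}

public static class SizeComparison
{
    // sizeof on a built-in type is a compile-time constant and needs no unsafe context.
    public static int ManagedBool => sizeof(bool);

    // Marshal.SizeOf reports the size *after marshaling*; bool marshals as a 4-byte Win32 BOOL.
    public static int MarshaledBool => Marshal.SizeOf<bool>();

    // Same value sizeof(Padded) would give: 3 bytes of padding follow B so that I is 4-byte aligned.
    public static int ManagedPadded => Unsafe.SizeOf<Padded>();

    // Naive sum of the field sizes, ignoring padding.
    public static int FieldSum => sizeof(byte) + sizeof(int);
}
```

`ManagedBool` is 1 while `MarshaledBool` is 4, and `ManagedPadded` is 8 even though the fields only add up to 5 bytes: padding inside the struct is counted, exactly as described for sizeof above.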
Just to clarify what I mean about padding being relevant, consider these two classes:
class FourBytes { byte a, b, c, d; }
class FiveBytes { byte a, b, c, d, e; }
On my x86 box, an instance of FourBytes takes 12 bytes (including overhead). An instance of FiveBytes takes 16 bytes. The only difference is the "e" variable - so does that take 4 bytes? Well, sort of... and sort of not. Fairly obviously, you could remove any single variable from FiveBytes to get the size back down to 12 bytes, but that doesn't mean that each of the variables takes up 4 bytes (think about removing all of them!). The cost of a single variable just isn't a concept which makes a lot of sense here.
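The GC.GetTotalMemory recipe from the start of this answer can be sketched as follows. It is an approximation under a few assumptions: the factory returns a fresh instance on every call, nothing else allocates heavily at the same time, and forcing a full collection with `GC.GetTotalMemory(true)` on both readings gives a stable baseline (the answer does not say which argument it used). `GcSizeProbe` is a hypothetical name.

```csharp
using System;

public class FourBytes { public byte a, b, c, d; }
public class FiveBytes { public byte a, b, c, d, e; }

public static class GcSizeProbe
{
    // Approximate per-instance heap cost of the objects produced by `create`.
    public static long Measure<T>(Func<T> create, int count = 100_000) where T : class
    {
        var keep = new T[count];                 // allocate the holder array before the baseline
        create();                                // warm-up call so JIT work is not counted
        long before = GC.GetTotalMemory(true);   // full collection, then read the heap size
        for (int i = 0; i < count; i++)
            keep[i] = create();                  // keep every instance reachable
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(keep);                      // instances stay alive until after the second reading
        return (after - before) / count;
    }
}
```

Measuring `() => new FourBytes()` and `() => new FiveBytes()` this way should show the 12/16 split on a 32-bit runtime; on a 64-bit runtime both typically come out at 24 bytes, which again shows that a per-field cost is not a meaningful number.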

Depending on the needs of the asker, Marshal.SizeOf might or might not give you what you want. (Edited after Jon Skeet posted his answer.)
using System;
using System.Runtime.InteropServices;
public class MyClass
{
public static void Main()
{
Int32 a = 10;
Console.WriteLine(Marshal.SizeOf(a));
Console.ReadLine();
}
}
Note that, as jkersch says, sizeof can be used, but unfortunately only with value types. If you need the size of a class, Marshal.SizeOf is the way to go, provided the class has a [StructLayout] attribute with LayoutKind.Sequential or LayoutKind.Explicit.
Jon Skeet has laid out why neither sizeof nor Marshal.SizeOf is perfect. I guess the asker needs to decide whether either is acceptable for his problem.

From Jon Skeet's recipe in his answer, I tried to make the helper class he was referring to. Suggestions for improvements are welcome.
public class MeasureSize<T>
{
private readonly Func<T> _generator;
private const int NumberOfInstances = 10000;
private readonly T[] _memArray;
public MeasureSize(Func<T> generator)
{
_generator = generator;
_memArray = new T[NumberOfInstances];
}
public long GetByteSize()
{
//Make one to make sure it is jitted
_generator();
long oldSize = GC.GetTotalMemory(false);
for(int i=0; i < NumberOfInstances; i++)
{
_memArray[i] = _generator();
}
long newSize = GC.GetTotalMemory(false);
return (newSize - oldSize) / NumberOfInstances;
}
}
Usage:
It should be created with a Func that generates new instances of T. Make sure the same instance is not returned every time. For example, this would be fine:
public long SizeOfSomeObject()
{
var measure = new MeasureSize<SomeObject>(() => new SomeObject());
return measure.GetByteSize();
}

It can be done indirectly, without considering the alignment.
The number of bytes that a reference type instance takes is equal to the size of its service fields plus the size of its type's fields.
Service fields (4 bytes each on 32-bit, 8 bytes each on 64-bit):
Sync block index
Pointer to the method table
+ Optional (only for arrays): array length
So for a class without any fields, the header takes 8 bytes on a 32-bit machine (although the CLR rounds every object up to a minimum of 12 bytes there, and 24 bytes on 64-bit). If the class has one field, a reference to an instance of the same class, then on 64-bit it takes:
Sync block index + method table pointer + reference = 8 + 8 + 8 = 24 bytes
A value type does not have these service fields, so it takes only the size of its fields. For example, a struct with one int field takes only 4 bytes of memory on a 32-bit machine.
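The arithmetic above can be written down as a small estimating function. This is a rough model rather than the CLR's actual algorithm: it assumes two pointer-sized header words, rounds the object up to a multiple of the pointer size, applies a three-pointer minimum, and expects the caller to pass field bytes that already include any padding between fields. `EstimateInstanceSize` is a hypothetical helper.

```csharp
using System;

public static class ObjectSizeEstimate
{
    // Rough model of a reference-type instance (hypothetical helper, not a CLR API):
    // two pointer-sized header words (sync block index + method table pointer),
    // plus the field bytes, rounded up to a multiple of the pointer size,
    // with a minimum of three pointer sizes per object.
    // fieldBytes must already include any padding between fields.
    public static int EstimateInstanceSize(int fieldBytes, int pointerSize)
    {
        int raw = 2 * pointerSize + fieldBytes;
        int aligned = (raw + pointerSize - 1) / pointerSize * pointerSize;
        return Math.Max(aligned, 3 * pointerSize);
    }

    // Same estimate for the bitness of the current process.
    public static int EstimateForCurrentProcess(int fieldBytes) =>
        EstimateInstanceSize(fieldBytes, IntPtr.Size);
}
```

With a pointer size of 4 this gives 12 bytes for four one-byte fields and 16 bytes for five, matching the FourBytes/FiveBytes figures quoted for x86 earlier on this page; with a pointer size of 8 and a single reference field it gives the 8 + 8 + 8 = 24 bytes computed above.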

I had to boil this down all the way to IL level, but I finally got this functionality into C# with a very tiny library.
You can get it (BSD licensed) at bitbucket
Example code:
using Earlz.BareMetal;
...
Console.WriteLine(BareMetal.SizeOf<int>()); //returns 4 everywhere I've tested
Console.WriteLine(BareMetal.SizeOf<string>()); //returns 8 on 64-bit platforms and 4 on 32-bit
Console.WriteLine(BareMetal.SizeOf<Foo>()); //returns 16 in some places, 24 in others. Varies by platform and framework version
...
struct Foo
{
int a, b;
byte c;
object foo;
}
Basically, what I did was write a quick class-method wrapper around the sizeof IL instruction. This instruction gets the raw amount of memory that a value of the type uses (for a reference type, that is the size of a reference to an object). For instance, if you had an array of T, then the sizeof instruction would tell you how many bytes apart each array element is.
This is very different from C#'s sizeof operator. For one, C# only allows pure value types, because it's not really possible to get the size of anything else statically. In contrast, the sizeof instruction works at the runtime level, so it returns however much memory a value or reference of that type uses in the runtime that is currently executing.
You can see some more info and a bit more in-depth sample code at my blog
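The claim that the sizeof instruction gives the distance between neighbouring array elements can be checked without a custom IL library. This sketch assumes `System.Runtime.CompilerServices.Unsafe` is available (built into .NET Core 2.0 and later) and uses `Unsafe.ByteOffset` to measure the stride of a two-element array; `StrideDemo` is a hypothetical name, and `Foo` repeats the struct above with public fields.

```csharp
using System;
using System.Runtime.CompilerServices;

public struct Foo
{
    public int a, b;
    public byte c;
    public object foo;
}

public static class StrideDemo
{
    // Distance in bytes between two neighbouring elements of a T[].
    public static long Stride<T>()
    {
        var items = new T[2];
        return (long)Unsafe.ByteOffset(ref items[0], ref items[1]);
    }
}
```

For `int` the stride is 4 and for `string` it is `IntPtr.Size`; for `Foo` it matches `Unsafe.SizeOf<Foo>()`, whose value varies by platform as the comments in the example above describe.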

If you have the type, use the sizeof operator. It returns the type's size in bytes.
e.g.
Console.WriteLine(sizeof(int));
will output:
4

You can use method overloading as a trick to determine the field size:
public static int FieldSize(int Field) { return sizeof(int); }
public static int FieldSize(bool Field) { return sizeof(bool); }
public static unsafe int FieldSize(SomeStructType Field) { return sizeof(SomeStructType); } // sizeof of a user-defined struct requires an unsafe context
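A runnable version of the overloading trick, restricted to built-in types so that `sizeof` needs no unsafe context. `FieldSizes` and `Sample` are hypothetical names. One caveat: if a field's type has no exact overload, C# may silently bind to another overload through an implicit conversion (a `short` field would pick the `int` overload and report 4).

```csharp
using System;

public static class FieldSizes
{
    // One overload per field type of interest; overload resolution uses the
    // static type of the argument, so no reflection is involved.
    public static int FieldSize(int field) => sizeof(int);
    public static int FieldSize(bool field) => sizeof(bool);
    public static int FieldSize(double field) => sizeof(double);
    public static int FieldSize(byte field) => sizeof(byte);
}

// Hypothetical class whose fields we want to report on.
public class Sample
{
    public int Count;
    public bool Enabled;
    public double Ratio;
}
```

Usage: `FieldSizes.FieldSize(new Sample().Ratio)` returns 8. As with plain sizeof, this is the size of the type, not the space the field ends up occupying after padding.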

The simplest way is: int size = *((int*)type.TypeHandle.Value + 1)
I know this is an implementation detail, but the GC relies on it, and it needs to be close to the start of the method table for efficiency; given how complex the GC code is, nobody is likely to dare change it in the future. In fact, it works for every minor and major version of .NET Framework and .NET Core. (I'm currently unable to test 1.0.)
If you want a more reliable way, emit a struct in a dynamic assembly with [StructLayout(LayoutKind.Auto)] with exactly the same fields in the same order, and take its size with the sizeof IL instruction. You may want to emit a static method within the struct which simply returns this value. Then add 2*IntPtr.Size for the object header. This should give you the exact value.
But if your class derives from another class, you need to find the size of each base class separately, add them up, and then add 2*IntPtr.Size for the header as before. You can do this by getting the fields with the BindingFlags.DeclaredOnly flag.
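The pointer one-liner above can also be written without an `unsafe` block by reading the same 32-bit value through `Marshal.ReadInt32`, which reads at a byte offset from a pointer. This sketch depends entirely on the undocumented layout the answer describes (the base instance size stored 4 bytes into the method table), so treat it as a diagnostic aid rather than something to ship. `BaseSizeReader` and `OneInt` are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public static class BaseSizeReader
{
    // Reads the 32-bit word at offset 4 of the type's MethodTable, i.e. the same
    // value as *((int*)type.TypeHandle.Value + 1). Undocumented CLR internals.
    public static int BaseSize(Type type) =>
        Marshal.ReadInt32(type.TypeHandle.Value, 4);
}

public class OneInt { public int Value; }
```

On a 64-bit CoreCLR process `BaseSize(typeof(OneInt))` is expected to be 24 (header word, method table pointer, and the int padded up to 8 bytes). For arrays and strings it only covers the fixed part, not the elements.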

System.Runtime.CompilerServices.Unsafe
Use System.Runtime.CompilerServices.Unsafe.SizeOf<T>() where T: unmanaged
(when not running on .NET Core, you need to install the System.Runtime.CompilerServices.Unsafe NuGet package)
Documentation states:
Returns the size of an object of the given type parameter.
It seems to use the sizeof IL instruction, just as Earlz's solution does.
The unmanaged constraint was introduced in C# 7.3.
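Combining `Unsafe.SizeOf<T>()` with reflection gives a direct answer to the original question of reporting a size per field. This sketch calls `SizeOf<T>` through `MakeGenericMethod` because the field types are only known at run time and, as Jon Skeet's answer stresses, it ignores padding between fields and the object header. `FieldReport` and `Mixed` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class FieldReport
{
    private static readonly MethodInfo SizeOfMethod = typeof(Unsafe).GetMethod("SizeOf")!;

    // Maps each instance field name of `type` to sizeof(fieldType).
    // For reference-type fields this is the size of the reference, not of the object.
    public static Dictionary<string, int> SizesOf(Type type)
    {
        var result = new Dictionary<string, int>();
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in type.GetFields(flags))
        {
            // Equivalent to Unsafe.SizeOf<FieldType>() with the type chosen at run time.
            int size = (int)SizeOfMethod.MakeGenericMethod(field.FieldType).Invoke(null, null)!;
            result[field.Name] = size;
        }
        return result;
    }
}

public class Mixed
{
    public int A;
    public byte B;
    public long C;
    public string D = "";
}
```

For `Mixed` this reports A = 4, B = 1, C = 8 and D = `IntPtr.Size`. Auto-properties show up under their compiler-generated backing-field names.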

Related

How to get the elements of a System.Numerics.Vector in C#?

I want to access the elements of a System.Numerics.Vector<T> in C#.
I'm following the official documentation: https://learn.microsoft.com/en-us/dotnet/api/system.numerics.vector-1?view=netcore-2.2
I'm able to create different vectors with different datatypes.
For example: var test = new Vector<double>(new double[] { 1.0, 2.0, 1.0 });
But now I have the problem that I'm unable to call test.Count; it is not possible to call Count on an instance of type System.Numerics.Vector<T>.
I can access single elements with the indexer [], but I don't know how many elements are in the vector.
According to the documentation, there should be the public property:
public static int Count { get; }
But I cannot call it on my instance of System.Numerics.Vector<T>. Instead, I can only call it statically, like this:
Vector<double>.Count
This is equal to 2.
I can also call:
Vector<Int32>.Count
returning: 4 and
Vector<Int16>.Count
returning 8.
And now I'm really a bit confused about how to use this static property. At first, I thought that this property would return the number of elements stored in the vector (as stated in the documentation). Then I thought that it returns the size of the vector in memory, but this number increases from double to Int32 to Int16.
Interestingly, I cannot call this static property from my instance created by:
var test = new Vector<double>(new double[] { 1.0, 2.0, 1.0 });
I cannot call test.Count!
Do you know how to access the elements of System.Numerics.Vector<T>?
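There is no way to do that. Vector<T> is of a fixed size, since it's trying to optimize for hardware acceleration: every Vector<T> holds exactly Vector<T>.Count elements, which is why Count is static (and C# does not let you access a static member through an instance). The docs state: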
The count of a Vector<T> instance is fixed, but its upper limit is CPU-register dependent. It is intended to be used as a building block for vectorizing large algorithms.
Reading the source at https://source.dot.net/#System.Private.CoreLib/shared/System/Numerics/Vector.cs shows that the constructor throws if fewer elements are passed in than are required, and that it only takes the first Count items of what is passed in.
The Vector<T>.Count property only shows how many elements of type T fit in one vector. That's why its value increases from double to short. On that machine a vector holds 16 bytes, so it fits 2 double values, 4 int values, 8 short values and so on (on hardware with wider registers, such as AVX2, it is 32 bytes).
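A short sketch of how the static `Count` is meant to be used: size the input by `Vector<double>.Count` rather than by the data you happen to have, then read the elements through the indexer. `VectorDemo.RoundTrip` is a hypothetical helper; the explicit length check is there because the constructor does not accept fewer than `Count` values.

```csharp
using System;
using System.Numerics;

public static class VectorDemo
{
    // Loads the first Vector<double>.Count values into a vector and reads them back.
    public static double[] RoundTrip(double[] source)
    {
        int count = Vector<double>.Count;            // static: identical for every Vector<double>
        if (source.Length < count)
            throw new ArgumentException($"Need at least {count} values.", nameof(source));

        var v = new Vector<double>(source);          // copies source[0] .. source[count - 1]
        var result = new double[count];
        for (int i = 0; i < count; i++)
            result[i] = v[i];                        // element access via the indexer
        return result;
    }
}
```

`Vector<byte>.Count` equals `Vector<double>.Count * sizeof(double)`, which is the register width in bytes on the current machine.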

Difference between Marshal.SizeOf and sizeof, I just don't get it

Until now I have just taken for granted that Marshal.SizeOf is the right way to compute the memory size of a blittable struct on the unmanaged heap (which seems to be the consensus here on SO and almost everywhere else on the web).
But after having read some cautions against Marshal.SizeOf (this article after "But there's a problem...") I tried it out and now I am completely confused:
public struct TestStruct
{
public char x;
public char y;
}
class Program
{
public static unsafe void Main(string[] args)
{
TestStruct s;
s.x = (char)0xABCD;
s.y = (char)0x1234;
// this results in size 4 (two Unicode characters)
Console.WriteLine(sizeof(TestStruct));
TestStruct* ps = &s;
// shows how the struct is seen from the managed side... okay!
Console.WriteLine((int)s.x);
Console.WriteLine((int)s.y);
// shows the same as before (meaning that -> is based on
// the same memory layout as in the managed case?)... okay!
Console.WriteLine((int)ps->x);
Console.WriteLine((int)ps->y);
// let's try the same on the unmanaged heap
int marshalSize = Marshal.SizeOf(typeof(TestStruct));
// this results in size 2 (two single byte characters)
Console.WriteLine(marshalSize);
TestStruct* ps2 = (TestStruct*)Marshal.AllocHGlobal(marshalSize);
// hmmm, put two 16-bit numbers into only 2 allocated
// bytes, this must surely fail...
ps2->x = (char)0xABCD;
ps2->y = (char)0x1234;
// huh??? same result as before, storing two 16bit values in
// only two bytes??? next will be a perpetuum mobile...
// at least I'd expect an access violation
Console.WriteLine((int)ps2->x);
Console.WriteLine((int)ps2->y);
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}
}
What's going wrong here? What memory layout does the field dereferencing operator '->' assume? Is '->' even the right operator for addressing unmanaged structs? Or is Marshal.SizeOf the wrong size operator for unmanaged structs?
I have found nothing that explains this in a language I understand, except for "...struct layout is undiscoverable..." and "...in most cases..." wishy-washy kind of stuff.
The difference is: the sizeof operator takes a type name and tells you how many bytes of managed memory need to be allocated for an instance of that struct. This is not necessarily stack memory; structs are allocated off the heap when they are array elements, fields of a class, and so on. By contrast, Marshal.SizeOf takes either a type object or an instance of the type, and tells you how many bytes of unmanaged memory need to be allocated. These can be different for a variety of reasons. The name of the type gives you a clue: Marshal.SizeOf is intended to be used when marshaling a structure to unmanaged memory.
Another difference between the two is that the sizeof operator can only take the name of an unmanaged type; that is, a struct type whose fields are only integral types, Booleans, pointers and so on. (See the specification for an exact definition.) Marshal.SizeOf by contrast can take any class or struct type.
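The asker's `TestStruct` shows both numbers directly. This sketch puts `Unsafe.SizeOf<T>()` (same result as the sizeof operator here, but usable without `unsafe`) next to `Marshal.SizeOf<T>()`; `CharStructSizes` is a hypothetical name.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public struct TestStruct
{
    public char x;
    public char y;
}

public static class CharStructSizes
{
    // Managed layout: two 2-byte UTF-16 chars, the value sizeof(TestStruct) reports.
    public static int Managed => Unsafe.SizeOf<TestStruct>();

    // Marshaled layout: without an explicit CharSet, each char field becomes one ANSI byte.
    public static int Marshaled => Marshal.SizeOf<TestStruct>();
}
```

The managed size is 4 while the marshaled size is 2, which is exactly the mismatch the question ran into when it allocated `Marshal.SizeOf` bytes and then wrote through a `TestStruct*`.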
I think the one question you still don't have answered is what's going on in your particular situation:
&ps2->x
0x02ca4370 <------
*&ps2->x: 0xabcd 'ꯍ'
&ps2->y
0x02ca4372 <-------
*&ps2->y: 0x1234 'ሴ'
You are writing to and reading from (possibly) unallocated memory. Because of the memory area you're in, it's not detected.
This will reproduce the expected behavior (at least on my system, YMMV):
TestStruct* ps2 = (TestStruct*)Marshal.AllocHGlobal(marshalSize*10000);
// hmmm, put two 16-bit numbers into only 2 allocated
// bytes, this must surely fail...
for (int i = 0; i < 10000; i++)
{
ps2->x = (char)0xABCD;
ps2->y = (char)0x1234;
ps2++;
}
What memory layout does the field dereferencing operator '->' assume?
Whatever the CLI decides
Is '->' even the right operator for addressing unmanaged structs?
That is an ambiguous concept. There are structs in unmanaged memory accessed via the CLI: these follow CLI rules. And there are structs that are merely notional monikers for unmanaged code (perhaps C/C++) accessing the same memory. This follows the rules of that framework. Marshalling usually refers to P/Invoke, but that isn't necessarily applicable here.
Or is Marshal.SizeOf the wrong size operator for unmanaged structs?
I'd default to Unsafe.SizeOf<T>, which is essentially sizeof(T). That is perfectly well-defined for the CLI/IL (including padding rules etc.), but it isn't possible to write in C# for an arbitrary T.
A char marshals, by default, to an ANSI byte. This allows interoperability with most C libraries and is fundamental to the operation of the .NET runtime.
I believe the correct solution is to change TestStruct to:
public struct TestStruct
{
[System.Runtime.InteropServices.MarshalAs(UnmanagedType.U2)]
public char x;
[System.Runtime.InteropServices.MarshalAs(UnmanagedType.U2)]
public char y;
}
UnmanagedType.U2 means unsigned 'integer' 2 bytes long, which makes it equivalent to the wchar_t type in a C header.
Seamless porting of C structures to .NET is possible with attention to detail and opens many doors for interop with native libraries.
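A sketch that checks the suggested fix: with `[MarshalAs(UnmanagedType.U2)]` on each field, `Marshal.SizeOf` reports 4 bytes, and a 16-bit value survives a round trip through unmanaged memory via `Marshal.StructureToPtr` and `Marshal.PtrToStructure`. `WideTestStruct` and `WideRoundTrip` are hypothetical names. Declaring the struct with `[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]` is another common way to get 2-byte char marshaling.

```csharp
using System;
using System.Runtime.InteropServices;

public struct WideTestStruct
{
    [MarshalAs(UnmanagedType.U2)] public char x;
    [MarshalAs(UnmanagedType.U2)] public char y;
}

public static class WideRoundTrip
{
    public static int MarshaledSize => Marshal.SizeOf<WideTestStruct>();

    // Marshals the struct into freshly allocated unmanaged memory and back again.
    public static WideTestStruct RoundTrip(WideTestStruct value)
    {
        IntPtr buffer = Marshal.AllocHGlobal(Marshal.SizeOf<WideTestStruct>());
        try
        {
            Marshal.StructureToPtr(value, buffer, false);
            return Marshal.PtrToStructure<WideTestStruct>(buffer);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);   // always release the unmanaged block
        }
    }
}
```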

Declaring a field as a bit (as a single bit, as opposed to a byte multiple) in C#

C# 6.0 in a Nutshell by Joseph Albahari and Ben Albahari (O’Reilly), copyright 2016 Joseph Albahari and Ben Albahari, ISBN 978-1-491-92706-9, introduces on page 312 BitArray as one of the collection types .NET provides:
BitArray
A BitArray is a dynamically sized collection of compacted bool values. It is more memory-efficient than both a simple array of bool and a generic List of bool, because it uses only one bit for each value, whereas the bool type otherwise occupies one byte for each value.
It's nice to have the possibility of declaring a collection of bits instead of working with bytes when you are interested in binary values only, but what about declaring a single bit field?
like:
public class X
{
public [bit-type] MyBit {get; set;}
}
Does .NET not support it?
The existing posts that touch on this topic talk about setting individual bits within, ultimately, a byte variable. I am asking whether, given that .NET supports working with bit variables in a collection, it also supports declaring such a variable on its own, outside a collection.
So your question is whether .NET supports this or not. The answer is no.
Why? It's fundamentally possible to have such a feature. But the demand is really low. It's better to invest the developer time elsewhere.
If you want to make use of memory below the byte granularity you will need to build this yourself. BitArray is not intrinsic to the runtime. It manipulates the bits of some bigger type (I think it's int-based). You can do the same thing.
BitVector32 is a built-in struct that you can use to individually address 32 bits.
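If you only need a handful of named flags, `BitVector32` (System.Collections.Specialized) lets you emulate bit fields inside a single 4-byte struct. `FeatureFlags`, `IsReady` and `IsDirty` are hypothetical names; each property reads or writes one bit selected by a mask from `BitVector32.CreateMask`.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical class storing up to 32 boolean flags in one BitVector32 (4 bytes).
public class FeatureFlags
{
    private static readonly int IsReadyMask = BitVector32.CreateMask();            // 1
    private static readonly int IsDirtyMask = BitVector32.CreateMask(IsReadyMask); // 2

    private BitVector32 _bits;   // zero-initialized struct: all flags start false

    public bool IsReady
    {
        get => _bits[IsReadyMask];
        set => _bits[IsReadyMask] = value;
    }

    public bool IsDirty
    {
        get => _bits[IsDirtyMask];
        set => _bits[IsDirtyMask] = value;
    }
}
```

Each flag still lives inside a 32-bit word, so this only saves memory once you have several flags per object, in line with the answers here.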
As you can see in the .NET reference source, BitArray internally stores the values within an array of int:
public BitArray(int length, bool defaultValue) {
...
m_array = new int[GetArrayLength(length, BitsPerInt32)];
m_length = length;
int fillValue = defaultValue ? unchecked(((int)0xffffffff)) : 0;
for (int i = 0; i < m_array.Length; i++) {
m_array[i] = fillValue;
}
_version = 0;
}
So the smallest thing that gets allocated with a BitArray is already an entire int (on top of the reference itself), and even more if you store data in it. This also makes sense, since the memory used for addressing anything is already organized in data words, which, depending on the architecture, are at least 4 bytes long.
You can of course define your own type to store a single bit, but it will also take at least a byte (or even a complete word plus a byte, if it is a reference type). Memory is allocated to a program by the OS in terms of memory addresses, which usually address bytes, so anything less is not entirely useful.
It takes a lot of binary values to even make up for the space already lost by using the type in the first place, so the only really useful application of this technique of storing bits is when you have lots of them, so that you can profit from the 8:1 memory ratio.

Fast array copy in C#

I have a C# class that contains an int[] array (and a couple of other fields, but the array is the main thing). The code often creates copies of this class and profiling shows that the Array.Copy() call to copy this array takes a lot of time. What can I do to make it faster?
The array size is very small and constant: 12 elements. So ideally I'd like something like a C-style array: a single block of memory that's inside the class itself (not a pointer). Is this possible in C#? (I can use unsafe code if needed.)
I've already tried:
1) Using a UInt64 and bit-shifting instead of the array. (The values of each element are also very small.) This does make the copy fast, but slows down the program overall.
2) Using separate fields for each array element: int element0, int element1, int element2, etc. Again, this is slower overall when I have to access the element at a given index.
I would check out System.Buffer.BlockCopy if you are really concerned about speed.
http://msdn.microsoft.com/en-us/library/system.buffer.blockcopy.aspx
Simple Example:
int[] a = new int[] {1,2,3,4,5,6,7,8};
int[] b = new int[a.Length];
int size = sizeof(int);
int length = a.Length * size;
System.Buffer.BlockCopy(a, 0, b, 0, length);
Great discussion on it over here: Array.Copy vs Buffer.BlockCopy
This post is old, but anyone in a similar situation as the OP should have a look at fixed size buffers in structs. They are exactly what OP was asking for: an array of primitive types with a constant size stored directly in the class.
You can create a struct to represent your collection, which will contain the fixed size buffer. The data will be stored directly within the struct, which will be stored directly within your class. You can copy through simple assignment.
They come with a few caveats:
They can only be used with primitive types.
They require the "unsafe" keyword on your struct.
Size must be known at compile time.
It used to be that you had to use the fixed keyword and pointers to access them, but recent changes to C# catering to performance programming have made that unnecessary. You can now work with them just like arrays.
public unsafe struct MyIntContainer
{
private fixed int myIntegers[12];
public int this[int index]
{
get => this.myIntegers[index];
set => this.myIntegers[index] = value;
}
}
There is no built-in bounds checking, so it would be best to include that yourself in such a property, encapsulating any functionality that skips bounds checks inside a method. I am on mobile, or I would have worked that into my example.
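A version of `MyIntContainer` with the bounds checking mentioned above. It is a sketch: it still needs unsafe code enabled (`<AllowUnsafeBlocks>true</AllowUnsafeBlocks>` in the project file) and C# 7.3 or later for indexing a fixed buffer without pinning; the public `Length` constant is a hypothetical addition.

```csharp
using System;

// Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file.
public unsafe struct MyIntContainer
{
    public const int Length = 12;
    private fixed int myIntegers[Length];   // stored inline in the struct, no separate array

    public int this[int index]
    {
        get
        {
            // The uint cast also rejects negative indices in a single comparison.
            if ((uint)index >= Length) throw new ArgumentOutOfRangeException(nameof(index));
            return myIntegers[index];
        }
        set
        {
            if ((uint)index >= Length) throw new ArgumentOutOfRangeException(nameof(index));
            myIntegers[index] = value;
        }
    }
}
```

Because the buffer lives inside the struct, `var copy = original;` copies all 12 ints, and changing `copy` afterwards leaves `original` untouched.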
You asked about managed arrays. If you are content to use fixed / unsafe, this can be very fast.
A struct is assignable, like any primitive. This is almost certainly faster than Buffer.BlockCopy() or any other method, due to the lack of method call overhead:
public unsafe struct MyStruct //the actual struct used, contains all
{
public int a;
public unsafe fixed byte buffer[16];
public ulong b;
//etc.
}
public unsafe struct FixedSizeBufferWrapper //contains _only_ the buffer
{
public unsafe fixed byte buffer[16];
}
unsafe
{
fixed (byte* bufferA = myStructA.buffer, bufferB = myStructB.buffer)
{
*((FixedSizeBufferWrapper*)bufferA) =
*((FixedSizeBufferWrapper*)bufferB);
}
}
We cast fixed-size byte buffers from each of your original structs to the wrapper pointer type and dereference each pointer SO THAT we can assign one to the other by value; assigning fixed buffers directly is not possible, hence the wrapper, which is basically zero overhead (it just affects values used in pointer arithmetic that is done anyway). That wrapper is only ever used for casting.
We have to cast because (at least in my version of C#) we cannot assign anything other than a primitive type (usually byte[]) as the buffer, and we aren't allowed to cast inside fixed(...).
EDIT: This appears to get translated into a call to Buffer.Memcpy() (specifically Buffer.memcpy4() in my case, in Unity / Mono) under the hood to do the copy.

Size of a simple class containing a single integer property in .NET

I watched a session by Mario Hewardt which mentions that a class containing an integer property takes 16 bytes of space. I'd like to know how the size of the following simple class can come out as 16.
[StructLayout(LayoutKind.Sequential)]
public class MyClass
{
public int Age;
}
The problem is that the integer takes 4 bytes of space (right?), so where the heck do those 12 other bytes come from? I've also used Marshal.SizeOf to get the class size, which returned 4:
int n = Marshal.SizeOf(typeof(MyClass));
//n == 4
I've read this, and it seems that the above class holds 8 bytes of internal data (what is this data anyway?), 4 bytes for the int value, and 4 bytes of unused space. So if it takes 16 bytes, why does Marshal.SizeOf return 4? And if it takes 4, where have those 8 bytes gone? I'm truly confused.
Marshal.SizeOf(Type) returns the size of the equivalent unmanaged type, i.e. how many bytes would the equivalent unmanaged type (e.g. a C++ class) take if it had the same field layout and packing. Note that this function is designed to work only with classes with the [StructLayout] attribute with a LayoutKind of either Explicit or Sequential.
The memory used up when the CLR allocates an object on the managed heap depends on the internals of the CLR in question. Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects is an article about the CLRv2 implementation of object allocation. Essentially, every object has two hidden fields - a sync block index and a type handle. The sync block is an internal structure used when an object is used with a lock(obj) {} statement. The type handle provides runtime type information about the given instance - it contains the object's method table, etc.
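The Marshal.SizeOf restriction described above is easy to observe. In this sketch `SequentialAge` mirrors the asker's class, `AutoAge` is the same class without the attribute, and `ClassMarshalSize.TrySizeOf` is a hypothetical helper that turns the marshaler's `ArgumentException` into `null`.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public class SequentialAge { public int Age; }

// No [StructLayout]: C# classes default to LayoutKind.Auto.
public class AutoAge { public int Age; }

public static class ClassMarshalSize
{
    // Returns the unmanaged size, or null when no meaningful size can be computed.
    public static int? TrySizeOf(Type type)
    {
        try { return Marshal.SizeOf(type); }
        catch (ArgumentException) { return null; }
    }
}
```

`SequentialAge` reports 4 (just the int, with no object header), while `AutoAge` cannot be measured at all, because an auto-layout class has no defined unmanaged form.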
Indeed, a class in C# is not just the fields it contains.
There has to be a virtual function table so you can use the virtual and override keywords, and certainly a few other things, like maybe a pointer to the type metadata...
The fact is you should not care... What's important is "how many bytes do I need to be able to rebuild the instance?". The answer is what Marshal.SizeOf gives you: 4.
Marshalling only the 4 bytes of the integer (yes you're right, it's 4 bytes), and knowing you want to deserialize a MyClass, then you have enough information to do it.
