How do variables address RAM? - C#

I'm pretty new to this, so if the question doesn't make sense, I apologize ahead of time.
An int in C# is 4 bytes, if I am correct. If I have the statement:
int x;
I would assume this is taking up 4 bytes of memory. If each memory address refers to 1 byte, does this take up four address slots? If so, how does x map to the four address locations?

If I have the statement int x; I would assume this is taking up 4 bytes of memory. How does x map to the address of the four bytes?
First off, Mike is correct. C# has been designed specifically so that you do not need to worry about this stuff. Let the memory manager take care of it for you; it does a good job.
Assuming you do want to see how the sausage is made for your own edification: your assumption is not warranted. This statement does not need to cause any memory to be consumed. If it does cause memory to be consumed, the int consumes four bytes of memory.
There are two ways in which the local variable (*) can consume no memory. The first is that it is never used:
void M()
{
    int x;
}
The compiler can be smart enough to know that x is never written to or read from, and it can be legally elided entirely. Obviously it then takes up no memory.
The second way that it can take up no memory is if the jitter chooses to enregister the local. It may assign a machine register specifically to that local variable. The variable then has no address associated with it because obviously registers do not have an address. (**)
Assuming that the local does take up memory, the jitter is responsible for keeping track of the location of that memory.
If the local is a perfectly normal local then the jitter will bump the stack pointer by four bytes, thereby reserving four bytes on the stack. It will then associate those four bytes with the local.
If the local is a closed-over outer local of an anonymous function, a local of an iterator block, or a local of an async method then the C# compiler will generate the local as a field of a class; the jitter asks the garbage collector to allocate the class instance and the jitter associates the local with a particular offset from the beginning of the memory buffer associated with that instance by the garbage collector.
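As a rough illustration of the last case, here is a sketch of what the compiler does with a closed-over local; the class and member names are invented for this example (the real compiler-generated names cannot be written in C#):
// What you write:
Func<int> M()
{
    int x = 42;                 // local captured by the lambda
    return () => x;
}
// Roughly what the compiler generates instead (illustrative names only):
class DisplayClass
{
    public int x;               // the "local" is now a field on the GC heap
    public int GetX() { return x; }
}
Func<int> M()
{
    var closure = new DisplayClass();
    closure.x = 42;
    return closure.GetX;        // delegate over the closure instance
}
The jitter then tracks x as an offset into the DisplayClass instance that the garbage collector allocated.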
All of this is implementation detail subject to change at any time; do not rely upon it.
(*) We know it is a local variable because you said it was a statement. A field declaration is not a statement.
(**) If unsafe code takes the address of a local, obviously it cannot be enregistered.

There's a lot (and I mean a LOT) that can be said about this. Various topics you're hitting on are things like the stack, the symbol table, memory management, the memory hierarchy, ... I could go on.
BUT, since you're new, I'll try to give an easier answer:
When you create a variable in a program (such as an int), you are telling the compiler to reserve a space in memory for that data. An int is 4 bytes, so 4 consecutive bytes are reserved. The memory location you were referring to points only to the beginning; the 4-byte length is known from the type.
Now, that memory location (in the case you provided) is not really saved in the same way a variable's value would be. Every time there is an instruction that needs x, the instruction is emitted with that memory location baked into it. In other words, the address lives in the "code" section of your program, not the "data" section.
This is just a really, REALLY high overview. Hopefully it helps.

You really should not need to worry about these things, since there is no way in C# that you could write code that would make use of this information.
But if you must know, at the machine-code level when we instruct the CPU to access the contents of x, it will be referred to using the address of the first one of those four bytes. The machine instruction that will do this will also contain information about how many bytes to be accessed, in this case four.
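If you want to see this for yourself, unsafe code lets you observe the address of a local; a minimal sketch (compile with /unsafe, purely for illustration):
unsafe
{
    int x = 42;
    int* p = &x;                      // address of the first of x's four bytes
    Console.WriteLine((ulong)p);      // prints some stack address
    Console.WriteLine(sizeof(int));   // 4 -- the type tells the instruction how many bytes to access
}
Note that taking the address like this also prevents the variable from being enregistered, as mentioned above.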

If the int x; is declared within a function, then the variable will be allocated on the stack, rather than on the heap or in global memory. The address of x in the compiler's symbol table will refer to the first byte of the four-byte integer. However, since it is on the stack, what is remembered is an offset into the stack frame rather than a physical address. The variable will then be referenced via an instruction using that offset from the current stack pointer.
Assuming a 32-bit run-time, the offset on the stack will be aligned so the address is a multiple of 4 bytes, i.e. the offset will end in either 0, 4, 8 or 0x0c.
Furthermore because the 80x86 family is little-endian, the first byte of the integer will be the least significant, and the fourth byte will be the most significant, e.g. the decimal value 1,000,000 would be stored as the four bytes 0x40 0x42 0x0f 0x00.
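You can check this byte order from C# itself; a small sketch (the output shown assumes a little-endian machine such as x86/x64):
byte[] bytes = BitConverter.GetBytes(1000000);    // 1,000,000 == 0x000F4240
Console.WriteLine(BitConverter.IsLittleEndian);   // True on x86/x64
Console.WriteLine(BitConverter.ToString(bytes));  // 40-42-0F-00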

Related

Immutable structs are thread safe they say [duplicate]

I am a tinkerer—no doubt about that. For this reason (and very little beyond that), I recently did a little experiment to confirm my suspicion that writing to a struct is not an atomic operation, which means that a so-called "immutable" value type which attempts to enforce certain constraints could hypothetically fail at its goal.
I wrote a blog post about this using the following type as an illustration:
struct SolidStruct
{
    public SolidStruct(int value)
    {
        X = Y = Z = value;
    }

    public readonly int X;
    public readonly int Y;
    public readonly int Z;
}
While the above looks like a type for which it could never be true that X != Y or Y != Z, in fact this can happen if a value is "mid-assignment" at the same time it is copied to another location by a separate thread.
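The blog post itself isn't reproduced here, but the experiment is roughly of the following shape; this is a hedged sketch that assumes the SolidStruct type above (names are illustrative, and whether a torn read is actually observed depends on your hardware, runtime and build settings):
using System;
using System.Threading;

class TearingDemo
{
    static SolidStruct shared;                     // written by one thread, copied by another

    static void Main()
    {
        var writer = new Thread(() =>
        {
            for (int i = 1; ; i++)
                shared = new SolidStruct(i);       // 12-byte struct assignment: not atomic
        });
        writer.IsBackground = true;
        writer.Start();

        while (true)
        {
            SolidStruct copy = shared;             // this copy can interleave with a write
            if (copy.X != copy.Y || copy.Y != copy.Z)
            {
                Console.WriteLine("Torn read: {0} {1} {2}", copy.X, copy.Y, copy.Z);
                return;
            }
        }
    }
}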
OK, big deal. A curiosity and little more. But then I had this hunch: my 64-bit CPU should actually be able to copy 64 bits atomically, right? So what if I got rid of Z and just stuck with X and Y? That's only 64 bits; it should be possible to overwrite those in one step.
Sure enough, it worked. (I realize some of you are probably furrowing your brows right now, thinking, Yeah, duh. How is this even interesting? Humor me.) Granted, I have no idea whether this is guaranteed or not given my system. I know next to nothing about registers, cache misses, etc. (I am literally just regurgitating terms I've heard without understanding their meaning); so this is all a black box to me at the moment.
The next thing I tried—again, just on a hunch—was a struct consisting of 32 bits using 2 short fields. This seemed to exhibit "atomic assignability" as well. But then I tried a 24-bit struct, using 3 byte fields: no go.
Suddenly the struct appeared to be susceptible to "mid-assignment" copies once again.
Down to 16 bits with 2 byte fields: atomic again!
Could someone explain to me why this is? I've heard of "bit packing", "cache line straddling", "alignment", etc.—but again, I don't really know what all that means, nor whether it's even relevant here. But I feel like I see a pattern, without being able to say exactly what it is; clarity would be greatly appreciated.
The pattern you're looking for is the native word size of the CPU.
Historically, the x86 family worked natively with 16-bit values (and before that, 8-bit values). For that reason, your CPU can handle these atomically: it's a single instruction to set these values.
As time progressed, the native element size increased to 32 bits, and later to 64 bits. In every case, an instruction was added to handle this specific amount of bits. However, for backwards compatibility, the old instructions were still kept around, so your 64-bit processor can work with all of the previous native sizes.
Since your struct elements are stored in contiguous memory (without padding, i.e. empty space), the runtime can exploit this knowledge to only execute that single instruction for elements of these sizes. Put simply, that creates the effect you're seeing, because the CPU can only execute one instruction at a time (although I'm not sure if true atomicity can be guaranteed on multi-core systems).
However, the native element size was never 24 bits. Consequently, there is no single instruction to write 24 bits, so multiple instructions are required for that, and you lose the atomicity.
The C# standard (ISO/IEC 23270:2006, ECMA-334) has this to say regarding atomicity:
12.5 Atomicity of variable references
Reads and writes of the following data types shall be atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list shall also be atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, need not be atomic. Aside from the library functions designed for that purpose, there is no guarantee of atomic read-modify-write, such as in the case of increment or decrement. (emphasis mine)
Your example X = Y = Z = value is shorthand for 3 separate assignment operations, each of which is defined to be atomic by 12.5. The sequence of 3 operations (assign value to Z, assign Z to Y, assign Y to X) is not guaranteed to be atomic.
Since the language specification doesn't mandate it, X = Y = Z = value; might happen to execute atomically, but whether it does depends on a whole bunch of factors:
the whims of the compiler writers
what code generation optimizations options, if any, were selected at build time
the details of the JIT compiler responsible for turning the assembly's IL into machine language. Identical IL run under Mono, say, might exhibit different behaviour than when run under .Net 4.0 (and that might even differ from earlier versions of .Net).
the particular CPU on which the assembly is running.
One might also note that even a single machine instruction is not necessarily guaranteed to be an atomic operation; many are interruptible.
Further, visiting the CLI standard (ISO/IEC 23271:2006, ECMA-335), we find section 12.6.6:
12.6.6 Atomic reads and writes
A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size (the size of type native int) is atomic (see §12.6.2) when all the write accesses to a location are the same size. Atomic writes shall alter no bits other than those written. Unless explicit layout control (see Partition II (Controlling Instance Layout)) is used to alter the default behavior, data elements no larger than the natural word size (the size of a native int) shall be properly aligned. Object references shall be treated as though they are stored in the native word size.
[Note: There is no guarantee about atomic update (read-modify-write) of memory, except for methods provided for that purpose as part of the class library (see Partition IV). An atomic write of a “small data item” (an item no larger than the native word size) is required to do an atomic read/modify/write on hardware that does not support direct writes to small data items. end note]
[Note: There is no guaranteed atomic access to 8-byte data when the size of a native int is 32 bits even though some implementations might perform atomic operations when the data is aligned on an 8-byte boundary. end note]
(emphasis mine)
x86 CPU operations take place in 8, 16, 32, or 64 bits; manipulating other sizes requires multiple operations.
The compiler and x86 CPU are going to be careful to move only exactly as many bytes as the structure defines. There are no x86 instructions that can move 24 bits in one operation, but there are single instruction moves for 8, 16, 32, and 64 bit data.
If you add another byte field to your 24 bit struct (making it a 32 bit struct), you should see your atomicity return.
Some compilers allow you to define padding on structs to make them behave like native register-sized data. If you pad your 24-bit struct, the compiler will add another byte to "round up" the size to 32 bits so that the whole structure can be moved in one atomic instruction. The downside is that your structure will always occupy about 33% more space in memory.
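In C#, the padding can be requested explicitly through layout attributes; a sketch (the struct and field names are made up, and whether this actually restores atomic copying is still up to the runtime and hardware):
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Size = 4)]   // ask for the 3-byte struct to occupy 4 bytes
struct PaddedBytes
{
    public byte A;
    public byte B;
    public byte C;
    // the fourth byte is padding added by the requested Size
}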
Note that alignment of the structure in memory is also critical to atomicity. If a multibyte structure does not begin at an aligned address, it may span multiple cache lines in the CPU cache. Reading or writing this data will require multiple clock cycles and multiple read/writes even though the opcode is a single move instruction. So, even single instruction moves may not be atomic if the data is misaligned. x86 does guarantee atomicity for native sized read/writes on aligned boundaries, even in multicore systems.
It is possible to achieve atomicity for multi-step moves using the x86 LOCK prefix. However, this should be avoided, as it can be very expensive on multicore systems (LOCK not only blocks other cores from accessing memory, it also locks the system bus for the duration of the operation, which can impact disk I/O and video operations; LOCK may also force the other cores to purge their local caches).

C# byte vs int for short integer values [duplicate]

This question is about the physical memory use of a C# program. As we know, a byte variable consumes 1 byte of memory, while an int (32-bit) variable consumes 4 bytes. So, when we need a variable that only ever holds small values (such as a counter i used to iterate a loop 100 times), which one should we use in the for loop below: byte or int?
for (byte i = 0; i < 100; ++i)
Kindly give your opinion with reasons and share your knowledge. I shall be glad and thankful to you :-)
Note: I use byte instead of int in such cases, but I have seen that many experienced programmers use int even when the expected values are less than 255. Please let me know if I am wrong. :-)
In most cases, you won't get any benefit from using byte instead of int. The reason is:
If the loop variable is stored in a CPU register: since modern CPUs have registers that are at least 32 bits wide, and you cannot use just one quarter of a register, the resulting code would be pretty much the same either way.
If the loop variable is not stored in a CPU register, it will most likely be stored on the stack. Compilers try to align memory locations at addresses that are multiples of 4 for performance reasons, so the compiler would typically reserve 4 bytes for your byte variable on the stack anyway.
Depending on the details of your code, the compiler may even have to emit extra instructions to make sure the value (on the stack or in a register) never exceeds 255, which makes the code larger and slower rather than faster.
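If you want to convince yourself, here is a rough micro-benchmark sketch; treat the numbers as illustrative only, since a loop this trivial is easily transformed by the JIT and results vary by runtime and CPU:
using System;
using System.Diagnostics;

class LoopCounterBenchmark
{
    static void Main()
    {
        long sink = 0;

        var sw = Stopwatch.StartNew();
        for (int outer = 0; outer < 10000000; outer++)
            for (byte i = 0; i < 100; ++i)
                sink += i;                               // some work so the loop isn't removed
        sw.Stop();
        Console.WriteLine("byte counter: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int outer = 0; outer < 10000000; outer++)
            for (int i = 0; i < 100; ++i)
                sink += i;
        sw.Stop();
        Console.WriteLine("int counter:  {0} ms", sw.ElapsedMilliseconds);

        Console.WriteLine(sink);                         // keep sink observable
    }
}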
It's a totally different story with 8 bit microcontrollers like those from Atmel and Microchip, there your approach would make sense.

Why did object header size doubled in 64 bit architecture?

I can't really understand why the object header got twice as big in 64-bit applications.
The object header was 8 bytes, and in 64 bit it is 16. What are these additional bytes used for?
The object header is made up of two fields, the syncblk and the method table pointer (aka "type handle"). The second field is easy to understand: it is a pointer, so it must grow from 4 to 8 bytes in 64-bit mode.
The syncblk is the much less obvious case. It is a mix of flags and values (lock owner thread id, hash code, sync block index), so there is no inherent reason to make it bigger in 64-bit mode. What matters is what happens after the object is collected by the GC. If the free space was not eliminated by compacting the heap, then the object's space participates in the free block list, which works like a doubly-linked list. The second field becomes the forward pointer to the next free block. The object data space is used to store the size of the free block, which is the basic reason an object is never smaller than 12 bytes. And the syncblk stores the back pointer to the previous free block, so it now must be big enough to store a pointer and therefore needs to grow to 8 bytes. So it is 8 + 8 = 16 bytes.
Fwiw, the minimum object size in 64-bit mode is 24 bytes, even though 8 + 8 + 4 = 20 bytes would do just fine, simply to ensure that everything is aligned to 8. Alignment matters a great deal; you'd never want a pointer value to straddle an L1 cache line, since that makes accessing it roughly three times slower. The <gcAllowVeryLargeObjects> option is another reason, added later.
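A rough way to see the per-object cost for yourself; this is only a sketch, since GC.GetTotalMemory is an approximation and the exact numbers depend on the runtime:
// Allocate many tiny objects and divide the growth in managed memory by the count.
const int Count = 1000000;
object[] keep = new object[Count];            // keeps the objects alive during measurement

long before = GC.GetTotalMemory(true);
for (int i = 0; i < Count; i++)
    keep[i] = new object();
long after = GC.GetTotalMemory(true);

Console.WriteLine((after - before) / Count);  // roughly 24 on 64-bit, 12 on 32-bit
GC.KeepAlive(keep);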

Address where an element just past the end of an array would be stored

According to ECMA-335:
II.14.4.2 Managed pointers
Managed pointers (&) can point to an instance of a value type, a field of an object, a field of a value
type, an element of an array, or the address where an element just past the end of an array would be
stored (for pointer indexes into managed arrays).
The last part interests me. Does it mean that a reference beyond the end of an array is valid? How is such a reference obtained (possibly with IL)? How does the CLR handle reading and writing there?
It means the pointer is valid, but it does not mean that dereferencing it is valid.
For instance, if you have an array containing 10 Int32 values, which means 10 * 4 = 40 bytes, a pointer 40 bytes past the start of the array (just past the last element) is valid.
Dereferencing it isn't.
Which means reading or writing that location is not valid.
Think about segmented 'protected' memory.
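In C# you can form (but must not dereference) such a pointer with unsafe code; a minimal sketch using unmanaged pointers rather than CLI managed pointers, but it illustrates the same valid-to-form, invalid-to-dereference distinction (compile with /unsafe):
unsafe
{
    int[] a = new int[10];
    fixed (int* start = a)
    {
        int* pastEnd = start + a.Length;       // one element past the end: legal to form
        Console.WriteLine(pastEnd - start);    // pointer arithmetic and comparison are fine
        // *pastEnd = 0;                       // but dereferencing it is not valid
    }
}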
The statement you quoted is echoing the C++ standard, which goes into some detail about how the pointer-past-the-end must be valid to form and safe to compare against.
A large part of the C++ library (the algorithms) uses this kind of pointer as a 'sentinel', much like EOF.
With protected memory, just loading a pointer value outside your process space into a register could lead to a protection fault. It won't wait for a dereference.
What is actually being said here is that the last byte of your data may not be the last byte of an allocated segment. The memory manager will have to pad with 1 or more bytes. Thus allowing the compiler/optimizer to always use an address register for pointers.

GC.GetTotalMemory use and its return value

I have a binary file saved on disk that is 15 KB in size, but why does its memory size always come out as only 4 bytes?
long mem1 = GC.GetTotalMemory(false);
Object[] o = new Object[1000000];
o[1] = obj; // obj is the object content of the file before it is saved on disk
long mem2 = GC.GetTotalMemory(false);
long sizeOfOneElementInArray = (mem2 - mem1) / 1000000;
I must be wrong about something somewhere. I think the result is incorrect because 4 bytes is not enough to store even a "hello world" string, but why is it incorrect?
Thanks for any help.
Is the assumption that assigning obj to index [1] of the array o should consume a substantial number of bytes? All you are doing is assigning a reference. And all that new Object[1000000] did was create the array itself (space for 1,000,000 references plus the overhead of Object[]); it did not allocate 1,000,000 Objects. I am sure someone can elaborate even more about the internal data structures being used and why 4 bytes shows up.
The key thing to realize is that assigning obj to o[1] does not allocate additional memory for obj. If you are trying to determine an approximation, call GC.GetTotalMemory before obj is allocated, then after. In your test, obj is already allocated before you call the first GC.GetTotalMemory.
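A sketch of that suggestion; the file name here is made up, and GC.GetTotalMemory only gives an approximation:
// Measure the allocation attributable to obj itself, not to storing a reference to it.
long before = GC.GetTotalMemory(true);           // force a collection for a steadier baseline
object obj = File.ReadAllBytes("file.bin");      // however the object is actually created
long after = GC.GetTotalMemory(true);
Console.WriteLine(after - before);               // rough size of obj in managed memory
GC.KeepAlive(obj);                               // keep obj alive past the second measurement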
In general, when MSDN documentation says things like A number that is the best available approximation of the number of bytes currently allocated in managed memory it is a bad idea to rely on it for accurate values :-)
All kidding aside, if your object 'o' is not used in the function after line #3 in your example, it is possible that it is being collected between lines 3 and 4. Just a guess.
First, if I use the code as written, sizeOfOneElementInArray becomes 0 for me, because o is not used after the assignment, so the GC can collect it. The following assumes o is not collected (e.g. by using GC.KeepAlive(o) at the end of the method).
Let's look carefully what each of the two important lines of your code do (assuming 32-bit architecture):
Object[] o = new Object[1000000];
This line allocates 1000000 * 4 + 16 bytes. Each element in the array takes 4 bytes, because it's a reference (pointer) to an object. 16 bytes is the overhead of an array.
o[1] = obj;
This line changes one of the references in o to reference to obj. This line allocates exactly 0 bytes.
I'm not sure why you are confused about the result. It has to be 4, unless there are some unreferenced objects from an earlier part of the code. In that case, it could be less than 4 (even a negative number). Of course, if this were a multi-threaded application, the result could be pretty much anything, depending on what the other threads do.
And this all assumes that GetTotalMemory() is precise, which it doesn't have to be.
In my experience, Marshal.SizeOf() is generally a better method of getting an object's true size than asking the garbage collector.
http://msdn.microsoft.com/en-us/library/y3ybkfb3.aspx
