New Reference When Concatenating A String - c#

A couple of weeks ago, I was asked a C# question in a job interview. The question was exactly this:
string a = "Hello, ";
for(int i = 0; i < 99999999; i++)
{
a += "world!";
}
I was asked, exactly, "Why is this a bad way to concatenate strings?". My response was something along the lines of "readability, an append should be chosen instead", etc.
But apparently this is not the point, according to the person interviewing me. According to him, every time we concatenate a string, because of the way the CLR is structured, a new reference is created in memory. So, at the end of the code above, we would have 99999999 instances of the string variable "a" in memory.
I thought objects were created just once on the stack, as soon as a value is assigned to them (I'm not talking about the heap). The way I understood it, memory is allocated on the stack once for each primitive data type, the values are modified as needed, and they are disposed of when execution of the scope is finished. Is that wrong? Or are new references to the variable "a" actually created on the stack every single time it is concatenated?
Can someone please explain how it works for stack? Many thanks.

First remember these two facts:
string is an immutable type (existing instances are never modified)
string is a reference type (the "value" of a string expression is a reference to the location where the instance is)
Therefore, a statement like:
a += "world!";
will work much like a = a + "world!";. It will first follow the reference to the "old" a and concatenate that old string with the string "world!". This involves copying the contents of both old strings into a new memory location. That is the "+" part. It will then move the reference a from pointing to the old location to pointing to the new location (the newly concatenated string). That is the "=" assignment part of the statement.
Now it follows that the old string instance is left with no references to it. So at some point, the garbage collector will remove it (and possibly move memory around to avoid "holes").
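A minimal sketch (not from the original answer) that makes the "new location" part visible: keep a second reference to the old instance and compare it with the result of +=.

using System;

class StringReferenceDemo
{
    static void Main()
    {
        string a = "Hello, ";
        string before = a;      // keep a reference to the old instance

        a += "world!";          // compiles to roughly a = string.Concat(a, "world!")

        // The old instance is untouched; 'a' now refers to a brand new object.
        Console.WriteLine(before);                            // Hello,
        Console.WriteLine(a);                                 // Hello, world!
        Console.WriteLine(object.ReferenceEquals(before, a)); // False
    }
}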
So I guess your job interviewer was absolutely right there. The loop in your question will create a bunch of (mostly very long!) strings in memory (on the heap, since you want to be technical).
A simpler approach could be:
string a = "Hello, "
+ string.Concat(Enumerable.Repeat("world!", 999...));
Here we use string.Concat. That method knows it needs to concatenate a bunch of strings into one long string, and it can use some sort of expandable buffer (such as a StringBuilder, or even a pointer type char*) internally to make sure it does not create a myriad of "dead" object instances in memory.
(Do not use ToArray() or similar, as in string.Concat(Enumerable.Repeat("world!", 999...).ToArray()), of course!)

.NET distinguishes between reference and value types. string is a reference type. It is allocated on the heap without exception. Its lifetime is controlled by the GC.
So, at the end of the code above, we would have 99999999 instances of the string variable "a" in memory.
99999999 strings will have been allocated. Of course, some of them might already have been GC'ed.
the values are modified as needed, and they are disposed of when execution of the scope is finished
String is not a primitive or a value type. Value types are allocated "inline" inside of something else, such as the stack, an array, or a heap object. They can also be boxed and become true heap objects. None of that applies here.
The problem with this code is not the allocation but the quadratic runtime complexity. I don't think this loop would ever finish in practice.
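As a rough back-of-the-envelope illustration of that quadratic cost (my numbers, not the answerer's): after i iterations the string is about 7 + 6i characters long, and each pass through the loop copies the whole thing, so the total work is roughly the sum of 6i for i from 1 to 10^8, i.e. about 3 × 10^16 characters (6 × 10^16 bytes at two bytes per char) copied, quite apart from the allocations themselves.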

Reference types (classes, including strings) are always created on the heap. Value types (such as structs) are created on the stack and are lost when a function finishes executing.
However, stating that after the loop you will have N objects in memory is not entirely true. In each evaluation of the
a += "world!";
statement you do create a new string. What happens to the previously created string is more complicated: the garbage collector now owns it, since there is no other reference to it in your code, and will release it at some point, although you don't know exactly when that will happen.
Finally, the ultimate problem with this code is that you believe you are modifying an object, but strings are immutable, meaning you cannot really change their value once created. You can only create new ones, and this is what the += operator is doing. This would be far more efficient with a StringBuilder, which was made to be mutable.
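For reference, a hedged sketch of what the StringBuilder version of the interview loop might look like (purely illustrative; the resulting string is still about 600 million characters):

using System;
using System.Text;

class Program
{
    static void Main()
    {
        var sb = new StringBuilder("Hello, ");
        for (int i = 0; i < 99999999; i++)
        {
            sb.Append("world!"); // grows an internal buffer; no new string per iteration
        }
        string a = sb.ToString(); // one final string is materialised here
        Console.WriteLine(a.Length);
    }
}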
EDIT
As requested, here's a stack/heap-related clarification. Value types are not always on the stack. They are on the stack when you declare them inside a function body:
void method()
{
    int a = 1; // goes on the stack
}
But they go onto the heap when they are part of other objects, for example when an integer is a member of a class (since the whole class instance is on the heap).
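A small illustrative sketch of that distinction (the Counter type is invented for the example):

class Counter
{
    public int Value;          // value-type field, but stored inside the Counter instance on the heap
}

class Demo
{
    static void Method()
    {
        int local = 1;         // lives on the stack (or in a register) for this call
        var c = new Counter(); // the Counter instance lives on the heap
        c.Value = 2;           // ...so this int lives on the heap too, inside that instance
    }                          // 'local' and the reference 'c' vanish here; the Counter is left to the GC
}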

Related

How are reference types cleared from memory?

As objects are reference types, they are stored on the heap, while primitive data types are stored on the stack.
But an object is a collection of primitive data types as well as reference types, i.e. an object may have an integer data member and/or may have another object within it.
When the scope ends, the primitive data's memory is released from the stack, but the heap memory is handled by the garbage collector.
Now my question is: if an object also has a primitive data member, when is that removed?
As objects are reference types, they are stored on the heap, while primitive data types are stored on the stack.
Not quite. Value types, which include the primitives but also struct types, are stored on the stack when they are locals. They can also be stored on the heap if boxed, or in an array, or, as you note, as a field of a reference type.
Reference types have one or more references which might also be stored on the stack—the local(s) you address it through—and the representation of the object itself on the heap.
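A quick sketch of those heap cases for a value type (the types here are invented for illustration):

class Holder { public int Field; }         // value type as a field of a reference type

class Examples
{
    static void Method()
    {
        int local = 42;                    // on the stack: a local value type
        object boxed = local;              // boxed copy: lives on the heap
        int[] array = new int[] { 42 };    // array elements: live on the heap inside the array
        var h = new Holder { Field = 42 }; // stored inside the Holder instance on the heap
    }
}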
When the scope ends, the primitive data's memory is released from the stack, but the heap memory is handled by the garbage collector.
Not quite.
First, there isn't really a "releasing" operation. Say we were using 4 slots on the stack to store the values 1-4*:
[1][2][3][4][ ][ ][ ][ ]
          ^
          Using up to here.
(I'm going to completely ignore the matter of what happens between function calls for the sake of simplicity).
Now say we stop using the last 2 slots. There's no need to "release" anything:
[1][2][3][4][ ][ ][ ][ ]
    ^
    Using up to here.
Only if we go on to, e.g., use 1 new slot to store the value 5 do we need to overwrite anything:
[1][2][5][4][ ][ ][ ][ ]
       ^
       Using up to here.
The "releasing" just changed which memory was considered in use and which considered available.
Now consider the following C# code:
public void WriteOneMore(int num)
{
    int result = num + 1;
    Console.WriteLine(result);
}
Say you call that with the value 42. The relevant portion of the stack is:
[42]
 ^
 Using up to here.
Now, after int result = num + 1; there are two variables in scope: result and num. As such the stack might be:
[42][43]
     ^
     Using up to here.
However, num is never used again. The compiler and jitter know this, so they might have reused the same slot:
[43]
 ^
 Using up to here.
Because "in scope" refers to the source code, and what variables can be used in particular places, but the stack is used according to what variables actually are used in particular places, so it can often use less stack space than the source may suggest. Conversely, sometimes you find the same variable becoming more than one slot, if it makes things easier for the compiler in some way. This is no big deal here, but becomes important when we come to reference types.
the heap memory is handled by the garbage collector.
Let's consider what that actually means.
If an application needs heap memory for new objects, it takes that memory from a free part of a heap. If there isn't enough heap memory available it could ask the OS for more, but before that it may try garbage collecting.
When this happens, first the garbage collector makes a note of what heap-stored (reference types including boxed value types) objects it can't get rid of.
One set of such objects are those that are in a static variable.
Another is those that are in reachable parts of the stack. So if the stack is like:
["a"]["b"]["c"]["d"]["e"]
^
Using up to here.
Then the values "a", "b" and "c" cannot be collected.
The next set is any object that can be reached via a field of one of the objects that it already knows can't be collected, or through a field in one of those, and so on.
(A final step: any object that isn't ineligible due to the above but which needs to be finalised gets put on the finalisation queue here, so it will become eligible after the finaliser thread has dealt with it.)
Now. On the heap, the object looks a bit like:
[Sync][RTTI][Field0][Field1] … [FieldN]
Here "Sync" marks the sync block used if you lock on the object. "RTTI" marks a pointer to type information, used to obtain the type and to enable virtual methods to work. The rest is fields, whether value-types contained directly or references to other reference types.
Okay. Let's say this object is one that the collector decides it can collect.
It simply changes that block of memory from being considered not available to use, to being available to use. That's it.
In a subsequent step all in-use objects get moved together to compact the used memory into one block and the free into another. Our old object might be overwritten at this point, or it might not be overwritten for some time to come. We don't really care, because the corpse of that dead object is just a bunch of 1s and 0s sitting there doing nothing now, waiting for the palimpsest of volatile memory to be written to once more.
So the primitive fields are released at the point where the object's memory is considered available to use, but again, they may still be present in RAM for some time, or not, they're just ignored.
It's worth remembering that, just as the values on the stack may not correspond to what is "in scope" in the source code, an object can be collected while it's still in scope; garbage collection depends on the real use of the stack, not the source. This mostly doesn't affect anything, because most attempts to use something in the code mean it is now part of the real use of the stack and therefore won't be collected. Of the very few cases where it can affect something, probably the most common is an attempt to use a Timer that is only referenced through a local; the main thread doesn't use it any more, so its stack space can be reused, and then the timer thread finds no such timer. This is where GC.KeepAlive() comes in.
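A hedged sketch of that Timer situation (the timings are arbitrary): without the final GC.KeepAlive, the only reference to the timer is a local that is never read again, so the collector is free to reclaim it while ticks are still expected.

using System;
using System.Threading;

class TimerDemo
{
    static void Main()
    {
        var timer = new Timer(_ => Console.WriteLine("tick"), null, 0, 1000);

        // ... lots of work that never touches 'timer' again ...
        Thread.Sleep(10000);

        // Without this, the timer may be collected (and stop ticking) before the sleep ends,
        // because the local is no longer part of the *real* use of the stack.
        GC.KeepAlive(timer);
    }
}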
*When it comes to the running code, locals might be stored in registers and never actually be in stack memory. At the level of considering how the .NET code works, it's generally easiest to consider them also "on the stack". At the level of considering how the machine code works, that's not true. When the garbage collector looks at what is "on the stack" to see what it can't delete, it also looks at what references are in registers.
It is very hard to explain such fundamental, but not always easy to understand, things. However, in the last 15 years many good explanations have been written.
In case you do not want to read them (obviously...), here is a very short (and consequently incomplete) wrap-up. (Note: I still strongly recommend digging further into the literature.)
Note: The following part has been edited slightly based on a comment conversation about "primitive type" terminology:
(edit)
In this question's context it is more appropriate to talk about "value types" instead of "primitive types". Whether the type is primitive or not does not matter here; what matters is whether it is a value type or a reference type.
(end edit)
Now the point:
Reference types have a reference (stored anywhere, e.g. on the heap or the stack) which points to the instance, which is always allocated on the heap. Value types are stored (anywhere, e.g. on the heap or the stack) embedded directly in that place, so there is no indirection.
Samples:
Local variable of a value type: on the stack.
Local variable of a reference type: the instance itself is on the heap, and the reference to it is on the stack.
Member variable (value type): embedded in the allocated space of the instance whose member it is.
Member variable (reference type): its reference is embedded in the allocated space of the instance whose member it is, and its own instance is on the heap.
Now my question is: if an object also has a primitive data member, when is that removed?
Answer: when the containing object is removed. (Hopefully this is clear from the four samples above: the containing object can be on the heap or on the stack, so the "containing object removal" could be a GC collection or a simple stack-pointer adjustment when returning from a method.)
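An illustrative sketch of the four cases above (type names are invented for the example):

struct Point { public int X, Y; }  // a value type
class Node { }                     // a reference type

class Container
{
    public Point Location;         // value-type member: embedded in the Container instance
    public Node Next;              // reference-type member: the reference is embedded, the Node instance is on the heap
}

class Demo
{
    static void Method()
    {
        Point p = new Point();     // local value type: on the stack
        Node n = new Node();       // local reference type: reference on the stack, instance on the heap
        var c = new Container();   // Container on the heap, carrying Location and the Next reference with it
    }                              // p, n and c go away with the stack frame; the heap instances await the GC
}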

What is the most efficient way to reassign a struct?

I have in my program a struct type called Square which is used to represent the location (int Rank, int File) of a square on a chess board.
If I declare a Square, say Square sq = new Square();, and then I want to reassign it, is it better to do so by
sq = new Square(rank, file);
or by writing an internal Set method and calling Set thus
sq.Set(rank, file);
What I am asking is: when you use new on a struct, does the runtime reallocate new memory and call the constructor, or does it reuse the existing memory? If it does the former, then it would be better to write a Set method to avoid overheads, would it not? Cheers.
The traditional thinking these days is that value types should be immutable, so you would not want to have a Set method unless it returns a new Square object rather than mutating the original. As such,
sq = new Square(rank, file);
And
sq = sq.GenerateSquare(rank, file); // renamed Set method from original question to appease comments
Should ultimately perform the same operation.
But given this approach, GenerateSquare would also possibly be better as a static method of Square rather than something depending upon any given instance. (An instance method would be more useful if something about the existing instance was used in the creation of a new instance.)
Structures are value types, so a simple assignment will do the job:
Square sq = new Square(rank, file);
Square anotherSq = sq;
Worrying about the weight of garbage collection or memory use is something you should not be concerned with until you have profiled your application and know it will be an issue. A simple structure like this is not going be taking up much space and likely not the cause of problems if your program does hit a bottleneck.
For structs... space for new structs is created on the stack (see NOTE), not the heap, and is not subject to garbage collection. If the assignment target is an already existing copy of the struct, then it is overwritten. No additional memory is used.
NOTE: If you create a new struct and assign it to a variable that is a property of a reference type, then yes, the reference type is on the heap, but the memory slot the struct is copied to is the already existing memory slot inside that already existing reference type; no new heap memory is allocated. And the struct is not independently subject to garbage collection.
But others' comments about your design are correct: structs should generally only be used for immutable domain objects, things that are simple and easy to create (small footprint) and have no identity (i.e., one telephone number object set to (802) 123-4567 is equivalent to, and can be used anywhere else you need, a telephone number object set to (802) 123-4567).
So in general, these objects should not have constructors or property setters; they should have static factory methods that create instances of them.
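A hedged sketch of what that advice might look like for the Square type from the question (the readonly struct syntax needs C# 7.2+, and the At factory name is invented):

public readonly struct Square
{
    public int Rank { get; }
    public int File { get; }

    private Square(int rank, int file)
    {
        Rank = rank;
        File = file;
    }

    // Static factory: callers get a fresh value instead of mutating an existing one.
    public static Square At(int rank, int file) => new Square(rank, file);
}

// Usage: "reassignment" is just copying a new value over the old one.
// Square sq = Square.At(1, 1);
// sq = Square.At(2, 3);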

Why .NET String is immutable? [duplicate]

As we all know, String is immutable. What are the reasons for String being immutable and the introduction of StringBuilder class as mutable?
Instances of immutable types are inherently thread-safe: since no thread can modify one, the risk of a thread modifying it in a way that interferes with another is removed (the reference itself is a different matter).
Similarly, the fact that aliasing can't produce changes (if x and y both refer to the same object a change to x entails a change to y) allows for considerable compiler optimisations.
Memory-saving optimisations are also possible. Interning and atomising being the most obvious examples, though we can do other versions of the same principle. I once produced a memory saving of about half a GB by comparing immutable objects and replacing references to duplicates so that they all pointed to the same instance (time-consuming, but a minute's extra start-up to save a massive amount of memory was a performance win in the case in question). With mutable objects that can't be done.
No side-effects can come from passing an immutable type as a parameter to a method, unless it is an out or ref parameter (since that changes the reference, not the object). A programmer therefore knows that if string x = "abc" at the start of a method, and x isn't reassigned in the body of the method, then x == "abc" at the end of the method.
Conceptually, the semantics are more like those of value types; in particular, equality is based on state rather than identity, which means that "abc" == "ab" + "c". While this doesn't require immutability, the fact that a reference to such a string will always equal "abc" throughout its lifetime (which does require immutability) makes uses as keys, where maintaining equality to previous values is vital, much easier to get right (strings are indeed commonly used as keys).
Conceptually, it can make more sense to be immutable. If we add a month onto Christmas, we haven't changed Christmas, we have produced a new date in late January. It makes sense therefore that Christmas.AddMonths(1) produces a new DateTime rather than changing a mutable one. (Another example: if I, as a mutable object, change my name, what has changed is which name I am using; "Jon" itself remains immutable and other Jons will be unaffected.)
Copying is fast and simple: to create a clone, just return this. Since the copy can't be changed anyway, pretending something is its own copy is safe.
[Edit, I'd forgotten this one]. Internal state can be safely shared between objects. For example, if you were implementing a list which was backed by an array, a start index and a count, then the most expensive part of creating a sub-range would be copying the objects. However, if it were immutable, the sub-range object could reference the same array, with only the start index and count having to change, giving a very considerable saving in construction time.
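A rough sketch of that shared-backing-array idea (simplified, with type names entirely of my own invention; not production code):

public sealed class ImmutableSlice<T>
{
    private readonly T[] _items;  // shared, never mutated
    private readonly int _start;
    private readonly int _count;

    public ImmutableSlice(T[] items)
        : this((T[])items.Clone(), 0, items.Length) { } // defensive copy only at the boundary

    private ImmutableSlice(T[] items, int start, int count)
    {
        _items = items;
        _start = start;
        _count = count;
    }

    public T this[int index] => _items[_start + index];
    public int Count => _count;

    // Because nobody can mutate _items, the sub-range can share it:
    // no element copying, just new start/count values.
    public ImmutableSlice<T> SubRange(int start, int count)
        => new ImmutableSlice<T>(_items, _start + start, count);
}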
All in all, for objects whose purpose doesn't include undergoing change, there can be many advantages in being immutable. The main disadvantage is in requiring extra constructions, though even here it's often overstated (remember, you have to do several appends before StringBuilder becomes more efficient than the equivalent series of concatenations, with their inherent construction).
It would be a disadvantage if mutability were part of the purpose of an object (who'd want to be modelled by an Employee object whose salary could never ever change?), though sometimes even then it can be useful (in many web and other stateless applications, code doing read operations is separate from that doing updates, and using different objects may be natural - I wouldn't make an object immutable and then force that pattern, but if I already had that pattern I might make my "read" objects immutable for the performance and correctness-guarantee gain).
Copy-on-write is a middle ground. Here the "real" class holds a reference to a "state" class. State classes are shared on copy operations, but if you change the state, a new copy of the state class is created. This is more often used with C++ than C#, which is why its std::string enjoys some, but not all, of the advantages of immutable types, while remaining mutable.
Making strings immutable has many advantages. It provides automatic thread safety, and makes strings behave like an intrinsic type in a simple, effective manner. It also allows for extra efficiencies at runtime (such as allowing effective string interning to reduce resource usage), and has huge security advantages, since it's impossible for an third party API call to change your strings.
StringBuilder was added in order to address the one major disadvantage of immutable strings - runtime construction of immutable types causes a lot of GC pressure and is inherently slow. By making an explicit, mutable class to handle this, this issue is addressed without adding unneeded complication to the string class.
Strings are not really immutable. They are just publicly immutable.
That means you cannot modify them through their public interface. But on the inside they are actually mutable.
If you don't believe me look at the String.Concat definition using reflector.
The last lines are...
int length = str0.Length;
string dest = FastAllocateString(length + str1.Length);
FillStringChecked(dest, 0, str0);
FillStringChecked(dest, length, str1);
return dest;
As you can see, FastAllocateString returns an empty but allocated string, which is then modified by FillStringChecked.
Actually, FastAllocateString is an extern method, and FillStringChecked is unsafe, so it uses pointers to copy the bytes.
Maybe there are better examples but this is the one I have found so far.
String management is an expensive process. Keeping strings immutable allows repeated strings to be reused rather than re-created.
Why are string types immutable in C#
String is a reference type, so it is never copied, but passed by reference. Compare this to the C++ std::string object (which is not immutable), which is passed by value. This means that if you want to use a String as a key in a Hashtable, you're fine in C++, because C++ will copy the string to store the key in the hashtable (actually std::hash_map, but still) for later comparison. So even if you later modify the std::string instance, you're fine. But in .Net, when you use a String in a Hashtable, it will store a reference to that instance. Now assume for a moment that strings aren't immutable, and see what happens:
1. Somebody inserts a value x with key "hello" into a Hashtable.
2. The Hashtable computes the hash value for the String, and places a reference to the string and the value x in the appropriate bucket.
3. The user modifies the String instance to be "bye".
4. Now somebody wants the value in the hashtable associated with "hello". It ends up looking in the correct bucket, but when comparing the strings it says "bye" != "hello", so no value is returned.
5. Maybe somebody wants the value "bye"? "bye" probably has a different hash, so the hashtable would look in a different bucket. No "bye" keys in that bucket, so our entry still isn't found.
Making strings immutable means that step 3 is impossible. If somebody modifies the string he's creating a new string object, leaving the old one alone. Which means the key in the hashtable is still "hello", and thus still correct.
So, probably among other things, immutable strings are a way to enable strings that are passed by reference to be used as keys in a hashtable or similar dictionary object.
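Since strings themselves can't be mutated, here is a hedged sketch of the same failure mode using a deliberately mutable key class (all the names here are invented for illustration):

using System;
using System.Collections.Generic;

class MutableKey
{
    public string Text;
    public override int GetHashCode() => Text.GetHashCode();
    public override bool Equals(object obj) => obj is MutableKey k && k.Text == Text;
}

class Demo
{
    static void Main()
    {
        var key = new MutableKey { Text = "hello" };
        var table = new Dictionary<MutableKey, int> { [key] = 42 };

        key.Text = "bye"; // step 3 from the quote: the key changes under the table's feet

        // The entry was bucketed under hash("hello"), so neither lookup finds it now.
        Console.WriteLine(table.ContainsKey(new MutableKey { Text = "hello" })); // False (equality check fails)
        Console.WriteLine(table.ContainsKey(key));                               // False (almost certainly the wrong bucket)
    }
}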
Just to throw this in, an often forgotten view is of security, picture this scenario if strings were mutable:
string dir = "C:\SomePlainFolder";
//Kick off another thread
GetDirectoryContents(dir);
void GetDirectoryContents(string directory)
{
if(HasAccess(directory) {
//Here the other thread changed the string to "C:\AllYourPasswords\"
return Contents(directory);
}
return null;
}
You see how it could be very, very bad if you were allowed to mutate strings once they were passed.
You never have to defensively copy immutable data. Despite the fact that you need to copy it to mutate it, often the ability to freely alias and never have to worry about unintended consequences of this aliasing can lead to better performance because of the lack of defensive copying.
Strings are passed as reference types in .NET.
Reference types place a pointer on the stack to the actual instance, which resides on the managed heap. This is different from value types, which hold their entire instance on the stack.
When a value type is passed as a parameter, the runtime creates a copy of the value on the stack and passes that value into a method. This is why integers must be passed with a 'ref' keyword to return an updated value.
When a reference type is passed, the runtime creates a copy of the pointer on the stack. That copied pointer still points to the original instance of the reference type.
The string type has an overloaded == operator which compares values rather than references, making it behave more like a value type. However, since assignment only copies the pointer, if strings were mutable a second string operation could accidentally overwrite the value of a private member of another class, causing some pretty nasty results.
As other posts have mentioned, the StringBuilder class allows for the creation of strings without the GC overhead.
Strings and other concrete objects are typically expressed as immutable objects to improve readability and runtime efficiency. Security is another reason: a process can't change your string and inject code into it.
Imagine you pass a mutable string to a function but don't expect it to be changed. What if the function then changes that string? In C++, for instance, you could simply use call-by-value (the difference between a std::string and a std::string& parameter), but in C# it's all about references, so if you passed mutable strings around, every function could change them and trigger unexpected side effects.
This is just one of various reasons. Performance is another one (interned strings, for example).
There are five common ways by which a class can store data that cannot be modified outside the storing class' control:
As value-type primitives
By holding a freely-shareable reference to a class object whose properties of interest are all immutable
By holding a reference to a mutable class object that will never be exposed to anything that might mutate any properties of interest
As a struct, whether "mutable" or "immutable", all of whose fields are of types #1-#4 (not #5).
By holding the only extant copy of a reference to an object whose properties can only be mutated via that reference.
Because strings are of variable length, they cannot be value-type primitives, nor can their character data be stored in a struct. Among the remaining choices, the only one which wouldn't require that strings' character data be stored in some kind of immutable object would be #5. While it would be possible to design a framework around option #5, that choice would require that any code which wanted a copy of a string that couldn't be changed outside its control would have to make a private copy for itself. While that would hardly be impossible, the amount of extra code required, and the amount of extra run-time processing necessary to make defensive copies of everything, would far outweigh the slight benefits of having string be mutable, especially given that there is a mutable string type (System.Text.StringBuilder) which accomplishes 99% of what could be accomplished with a mutable string.
Immutable Strings also prevent concurrency-related issues.
Imagine being an OS working with a string that some other thread was modifying behind your back. How could you validate anything without making a copy?

C# parameters by reference and .net garbage collection

I have been trying to figure out the intricacies of the .NET garbage collection system and I have a question related to C# reference parameters. If I understand correctly, variables defined in a method are stored on the stack and are not affected by garbage collection. So, in this example:
public class Test
{
    public Test()
    {
    }

    public int DoIt()
    {
        int t = 7;
        Increment(ref t);
        return t;
    }

    private void Increment(ref int p)
    {
        p++;
    }
}
the return value of DoIt() will be 8. Since the location of t is on the stack, then that memory cannot be garbage collected or compacted and the reference variable in Increment() will always point to the proper contents of t.
However, suppose we have:
public class Test
{
    private int t = 7;

    public Test()
    {
    }

    public int DoIt()
    {
        Increment(ref t);
        return t;
    }

    private void Increment(ref int p)
    {
        p++;
    }
}
Now, t is stored on the heap as it is a value of a specific instance of my class. Isn't this possibly a problem if I pass this value as a reference parameter? If I pass t as a reference parameter, p will point to the current location of t. However, if the garbage collector moves this object during a compact, won't that mess up the reference to t in Increment()? Or does the garbage collector update even references created by passing reference parameters? Do I have to worry about this at all? The only mention of worrying about memory being compacted on MSDN (that I can find) is in relation to passing managed references to unmanaged code. Hopefully that's because I don't have to worry about any managed references in managed code. :)
If I understand correctly, variables defined in a method are stored on the stack and are not affected by garbage collection.
It depends on what you mean by "affected". The variables on the stack are the roots of the garbage collector, so they surely affect garbage collection.
Since the location of t is on the stack, then that memory cannot be garbage collected or compacted and the reference variable in Increment() will always point to the proper contents of t.
"Cannot" is a strange word to use here. The point of using the stack in the first place is because the stack is only used for data which never needs to be compacted and whose lifetime is always known so it never needs to be garbage collected. That why we use the stack in the first place. You seem to be putting the cart before the horse here. Let me repeat that to make sure it is clear: the reason we store this stuff on the stack is because it does not need to be collected or compacted because its lifetime is known. If its lifetime were not known then it would go on the heap. For example, local variables in iterator blocks go on the heap for that reason.
Now, t is stored on the heap as it is a value of a specific instance of my class.
Correct.
Isn't this possibly a problem if I pass this value as a reference parameter?
Nope. That's fine.
If I pass t as a reference parameter, p will point to the current location of t.
Yep. Though the way I prefer to think of it is that p is an alias for the variable t.
However, if the garbage collector moves this object during a compact, won't that mess up the reference to t in Increment()?
Nope. The garbage collector knows about managed references; that's why they're called managed references. If the gc moves the thing around, the managed reference is still valid.
If you had passed an actual pointer to t using unsafe code then you would be required to pin the container of t in place so that the garbage collector would know to not move it. You can do that using the fixed statement in C#, or by creating a GCHandle to the object you want to pin.
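An illustrative sketch of those two pinning mechanisms (purely to show the shape of the code; the unsafe block needs the project's AllowUnsafeBlocks option):

using System;
using System.Runtime.InteropServices;

class PinningDemo
{
    static unsafe void FixedExample(int[] data)
    {
        fixed (int* p = data) // the array cannot be moved while inside this block
        {
            p[0] = 123;
        }                     // unpinned again here
    }

    static void HandleExample(int[] data)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject(); // stable address while pinned
            // ... hand 'address' to unmanaged code ...
        }
        finally
        {
            handle.Free();    // always release the handle, or the object stays pinned forever
        }
    }
}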
does the garbage collector update even references created by passing reference parameters?
Yep. It would be rather fragile if it didn't.
Do I have to worry about this at all?
Nope. You're thinking about this like an unmanaged C++ programmer -- C++ makes you do this work, but C# does not. Remember, the whole point of the managed memory model is to free you from having to think about this stuff.
Of course, if you enjoy worrying about this stuff you can always use the "unsafe" feature to turn these safety systems off, and then you can write heap and stack corrupting bugs to your heart's content.
No, you don't need to worry about it. Basically the calling method (DoIt) has a "live" reference to the instance of Test, which will prevent it from being garbage collected. I'm not sure whether it can be compacted - but I suspect it can, with the GC able to spot which variable references are part of objects being moved.
In other words - don't worry. Whether it can be compacted or not, it shouldn't cause you a problem.
It is exactly how you mention it in the last sentence. The GC will move all needed references when it compacts the heap (except for references to unmanaged memory).
Note that whether the stack or the heap is used is related to an instance variable being of a value or a reference type. Value types (structs and 'simple' types like int, double, etc.) are always on the stack; classes are always on the heap (what is on the stack is the reference, i.e. the pointer, to the allocated memory for the instance).
Edit: as correctly noted below in the comment, the second paragraph was written much too quickly. If a value type instance is a member of a class, it will not be stored in the stack, it will be in the heap like the rest of the members.

Why are structs stored on the stack while classes get stored on the heap(.NET)?

I know that one of the differences between classes and structs is that struct instances get stored on the stack while class instances (objects) are stored on the heap.
Since classes and structs are very similar, does anybody know the reason for this particular distinction?
(edited to cover points in comments)
To emphasise: there are differences and similarities between value-types and reference-types, but those differences have nothing to do with stack vs heap, and everything to do with copy-semantics vs reference-semantics. In particular, if we do:
Foo first = new Foo { Bar = 123 };
Foo second = first;
Then are "first" and "second" talking about the same copy of Foo? or different copies? It just so happens that the stack is a convenient and efficient way of handling value-types as variables. But that is an implementation detail.
(end edit)
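A hedged sketch of that copy-semantics vs reference-semantics point (the types below are stand-ins, not the Foo from the answer):

struct FooStruct { public int Bar; }
class FooClass { public int Bar; }

class Demo
{
    static void Main()
    {
        var s1 = new FooStruct { Bar = 123 };
        var s2 = s1;            // copy semantics: s2 is an independent copy
        s2.Bar = 456;
        System.Console.WriteLine(s1.Bar); // 123 - the original is untouched

        var c1 = new FooClass { Bar = 123 };
        var c2 = c1;            // reference semantics: both talk about the same instance
        c2.Bar = 456;
        System.Console.WriteLine(c1.Bar); // 456 - "first" and "second" were the same copy
    }
}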
Re the whole "value types go on the stack" thing... - value types don't always go on the stack;
if they are fields on a class
if they are boxed
if they are "captured variables"
if they are in an iterator block
then they go on the heap (the last two are actually just exotic examples of the first)
i.e.
class Foo {
    int i; // on the heap
}

static void Foo() {
    int i = 0; // on the heap due to capture
    // ...
    Action act = delegate { Console.WriteLine(i); };
}

static IEnumerable<int> Foo() {
    int i = 0; // on the heap due to the iterator block
    // ...
    yield return i;
}
Additionally, Eric Lippert (as already noted) has an excellent blog entry on this subject
It's useful in practice to be able to allocate memory on the stack for some purposes, since those allocations are very fast.
However, it's worth noting that there's no fundamental guarantee that all structs will be placed on the stack. Eric Lippert recently wrote an interesting blog entry on this topic.
Every process has a data block consisting of two different allocatable memory segments: the stack and the heap. The stack mostly serves as the program-flow manager and stores local variables, parameters and return pointers (for returning from the currently executing function).
Classes are complex and usually large types compared to value types like structs (or basic types - ints, chars, etc.). Since stack allocation is specialised for efficient program flow, it does not provide an optimal environment for keeping large objects.
Therefore, to meet both of these expectations, this separated architecture came along.
How compilers and run-time environments handle memory management has grown up over a long period of time. The stack memory vs. heap memory allocation decision had a lot to do with what could be known at compile time and what could be known at runtime. This was before managed runtimes.
In general, the compiler has very good control of what's on the stack, it gets to decide what is cleaned up and when based on calling conventions. The heap on the other hand, was more like the wild west. The compiler did not have good control of when things came and went. By placing function arguments on the stack, the compiler is able to make a scope -- that scope can be controlled over the lifetime of the call. This is a natural place to put value types, because they are easy to control as opposed to reference types that can hand out memory locations (pointers) to just about anyone they want.
Modern memory management changes a lot of this. The .NET runtime can take control of reference types and the managed heap through complex garbage collection and memory management algorithms. This is also a very, very deep subject.
I recommend you check out some texts on compilers -- I grew up on Aho, so I recommend that. You can also learn a lot about the subject by reading Gosling.
In some languages, like C++, objects are also value types.
Finding an example of the opposite is harder, but under classic Pascal, union structs could only be instantiated on the heap (normal structs could be static).
In short: this situation is a choice, not a hard law. Since C# (and Java before it) lack procedural underpinnings, one can ask why they need structs at all.
The reason they are there is probably a combination of needing them for external interfaces and of wanting a performant, tight composite (container) type - one that is faster than a class. And then it is better to make it a value type.
Marc Gravell already explained wonderfully the difference regarding how value and reference types are copied which is the main differentiation between them.
As to why value types are usually created on the stack, that's because the way they are copied allows it. The stack has some definite advantages over the heap in terms of performance, particularly because the compiler can calculate the exact position of a variable created in a certain block of code, which makes access faster.
When you create a reference type you receive a reference to the actual object which exists in the heap. There is a small level of indirection whenever you interact with the object itself. These reference types cannot be created on the stack because the lifetime of values in the stack is determined, in great part, by the structure of your code. The function frame of a method call will be popped off the stack when the function returns, for example.
With value types, however, their copy semantics allows the compiler, depending on where it was created, to place it in the stack. If you create a local variable that holds an instance of a struct in a method and then return it, a copy of it will be created, as Marc explained above. This means that the value can be safely placed in the stack, since the lifetime of the actual instance is tied to the method's function frame. Anytime you send it somewhere outside the current function a copy of it will be created, so it doesn't matter if you tie the existence of the original instance to the scope of the function. Along these lines, you can also see why value types that are captured by closures need to go in the heap: They outlive their scope because they must also be accessible from within the closure, which can be passed around freely.
If it were a reference type, then you wouldn't be returning a copy of the object, but rather a reference, which means the actual value must be stored somewhere else, otherwise, if you returned the reference and the object's lifetime was tied to the scope in which it was created, it would end up pointing to an empty space in memory.
The distinction isn't really that "Value types go on the stack, reference types on the heap". The real point is that it's usually more efficient to access objects that live in the stack, so the compiler will try and place those values it can there. It simply turns out that value types, because of their copy semantics, fit the bill better than reference types.
I believe that whether or not to use stack or heap space is the main distinction between the two, perhaps this article will shed some light on your question: Csharp classes vs structs
The main difference is that the heap may hold objects that live forever, while something on the stack is temporary in that it will disappear when the enclosing call site is exited. This is because when one enters a method, the stack grows to hold its local variables on top of the caller's. When the method exits, normally or abnormally (e.g. via return or because of an exception), its frame must be popped off the stack. Eventually the frame of interest is popped and everything on it is lost.
The whole point about using the stack is that it automatically implements and honours scope. A variable stored on the stack exists until the function that created it exits and that function's stack frame is popped. Things that have local scope are natural for stack storage; things that have bigger scope are more difficult to manage on the stack. Objects on the heap can have lifetimes that are controlled in more complex ways.
Compilers always use the stack for variables - value or reference, it makes little difference. A reference variable doesn't have to have its value stored on the stack - it can be anywhere, and the heap is more efficient if the object referenced is big and if there are multiple references to it. The point is that the scope of a reference variable isn't the same as the lifetime of the object it references, i.e. a variable may be destroyed by being popped off the stack but the object (on the heap) it references might live on.
If a value type is small enough you might as well store it on the stack in place of a reference to it on the heap - its lifetime is tied to the scope of the variable. If the value type is part of a larger reference type then it too could have multiple references to it and hence it is more natural to store it on the heap and dissociate its lifetime from any single reference variable.
Stack and heap are about lifetimes and the value v reference semantics is almost a by product.
Have a look at Value and Reference
Value types go on the stack, reference types go on the heap. A struct is a value type.
There is no guarantee about this in the specification though, so it might change in future releases :)
