This may just be unclear to me, but when I read the MSDN docs I try to understand struct behaviour in depth.
From MSDN:
Dealing with Stack :
This will yield performance gains.
and :
Whenever you have a need for a type that will be used often and is
mostly just a piece of data, structs might be a good option.
I don't understand, because I would guess that when I pass a struct as a parameter to a method, the "copy value" process must be slower than the "copy reference" process?
The cost of passing a struct is proportional to its size. If the struct is smaller than a reference or the same size as a reference then passing its value will have the same cost as passing a reference.
If not, then you are correct; copying the struct might be more expensive than copying the reference. That's why the design guidelines say to keep a struct small.
(Note that when you call a method on a struct, the "this" is actually passed as a reference to the variable that contains the struct value; that's how you can write a mutable struct.)
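As a small illustration of that point (the Counter type here is just an example, not anything from the documentation), a struct method can mutate the variable it was called on:

public struct Counter
{
    private int _count;

    // Inside a struct method, "this" refers to the variable the method was
    // called on, so the increment is applied to that variable, not to a copy.
    public void Increment() => _count++;

    public int Count => _count;
}

After c.Increment() on a local variable c, c.Count is 1, because c itself was modified in place; this is also one of the reasons mutable structs are easy to misuse.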
There are potential performance gains when using structs, but as you correctly point out, there are potential performance losses as well. Structs are cheap (in both memory and time) to allocate and cheap to deallocate (in time), and cheap to copy if they are small. References are slightly more expensive in both memory and time to allocate, more expensive to deallocate, and cheap to copy. If you have a large number of small structs -- say, a million Point structs -- then it will be cheaper to allocate and deallocate an array with a million structs in it than an array with a million references to a million instances of a Point class.
But if the struct is big, then all that additional copying might be more expensive than the benefit you get from the more efficient allocation and deallocation. You have to look at the whole picture when doing performance analysis; don't make the "struct vs class" decision on the basis of performance without empirical data to back up that decision.
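As a rough sketch of that million-Point scenario (PointStruct and PointClass are illustrative names, not types from the guidelines):

public struct PointStruct { public int X, Y; }   // 8 bytes of data, stored inline
public class PointClass { public int X, Y; }     // data lives in a separate heap object

public static class AllocationComparison
{
    public static void Run()
    {
        // One allocation; a million 8-byte values sit inline in the array.
        var structPoints = new PointStruct[1_000_000];
        structPoints[0].X = 1;

        // One allocation for the array of references, plus a million separate
        // object allocations that the GC must track and eventually collect.
        var classPoints = new PointClass[1_000_000];
        for (int i = 0; i < classPoints.Length; i++)
            classPoints[i] = new PointClass();
    }
}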
There is much misinformation on the internet, in our own documentation, and in many books, about how memory management works behind the scenes in C#. If you are interested in learning what is myth and what is reality, I recommend reading my series of articles on the subject. Start from the bottom:
http://blogs.msdn.com/b/ericlippert/archive/tags/memory+management/
Another recommendation for structs is that they should be small; not larger than 16 bytes. That way they can be copied with a single instruction, or just a few instructions.
Copying a reasonably small amount of data will be almost as fast as copying a reference, and then it will be faster for the method to access the data as there is no redirection needed.
If the struct is smaller than a pointer (i.e. 32 or 64 bits), it will even be faster to copy the value than to copy a reference.
Even if a structure is a bit larger than a reference, there is still some overhead involved with creating objects. Each object has some overhead and has to be allocated as a separate memory block. A byte as a value type takes up just a single byte, but if you box the byte as an object, it will take up 16 or 24 bytes on the heap, plus another 4 or 8 bytes for the reference.
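A minimal sketch of that boxing cost (the sizes in the comments are the typical figures quoted above, not guarantees):

public static class BoxingDemo
{
    public static void Run()
    {
        byte b = 42;                 // one byte of data in a local variable
        object boxed = b;            // boxing: allocates a heap object (header + value,
                                     // on the order of 16-24 bytes) and the variable
                                     // holds a 4- or 8-byte reference to it
        byte unboxed = (byte)boxed;  // unboxing copies the value back out
        System.Console.WriteLine(unboxed);
    }
}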
Anyhow, a decision to use a struct or a class should normally be about what kind of data they represent, and not just about performance. A structure works well for data that represent a single entity, so that you can treat it as a single value.
What you said about the copy process is true: copying a reference takes less time than copying a struct, since both the struct and the reference are stored on the stack. But the reason MSDN suggests that using a struct can give a performance gain is the time it takes to access the stack versus the heap.
If you need a type that contains mostly static data and is not huge (meaning it does not contain huge arrays, multi-dimensional or otherwise, of value types), then it is wiser to use a struct rather than a reference type, as access to the stack is much cheaper than access to the managed heap.
Along with that, the time taken for allocation and deallocation (in short, the management of the heap) is somewhat time consuming compared to the stack.
You can have a better understanding of this topic here and here as this has been explained in detail.
Hope it helps.
Related
I've just seen a very interesting talk by Herb Sutter at the Build Conference 2014, called "Modern C++: What You Need to Know". Here's a link to the video of the talk: http://channel9.msdn.com/Events/Build/2014/2-661
One of the topics of the talk was on how std::vector is very cache-friendly, mainly because it ensures that its elements are adjacent in the heap, which has a great impact on spatial locality, or at least that's what I think I've understood; this is the case even for inserting and removing items. Clever use of std::vector can bring dramatic performance improvements by exploiting caching.
I'd like to try something like that with C#/.NET, but how can I ensure that the objects in my collections are all adjacent in memory?
Any other pointer to resources on cache-friendliness on C# and .Net is also appreciated. :)
For GC-managed languages, you tend to lose that explicit control over where every object is stored in memory. You can control memory access patterns, but that becomes a bit futile if you can't control which memory addresses you are accessing. On top of that, every object tends to come with the unavoidable overhead of an indirection and something resembling a virtual pointer (under the hood) to allow dynamic dispatch, reflection, etc. It's the analogical equivalent of having to store pointers (references) to objects and work with them indirectly, with each object instance also storing the analogical pointer needed for runtime type information (reflection) and for virtual dispatch.
As a result, even if you store an array of object references contiguously, that's only making the analogical pointers to the objects cache-friendly to access sequentially. The contents of each object could still be scattered in memory, incurring cache misses galore as you load memory regions into cache lines only to use one object's worth of data before that data gets evicted.
It's actually the number one appeal of a language like C++ to me: it lets you still use OOP while controlling where everything is going to be allocated in memory as well as exactly how much memory everything uses and it lets you opt out (actually by default) from the overhead associated with virtual dispatch, RTTI, etc. while still using objects and generics and so forth. Meanwhile the biggest appeal to me personally of languages like C# and Java is what you can get with each object like reflection, which comes with overhead per object, but is a justified cost if your code has plenty of use for it.
Using plain old data types (struct included in C#):
That said, I've seen very efficient code written in C# and Java on par with C and C++ but the key difference is that they avoided objects for the tiny fraction of code that is really performance-critical. For example, I saw an interactive Java raytracer using single-path tracing with very impressive speed given the brute force nature of what it was doing. However, the key difference is that while most of the raytracer was written using nice object-oriented code, for the performance-critical parts (BVH, mesh representation, and triangles stored in the leaves) it avoided objects and just used big arrays of int[] and float[]. That performance-critical code was pretty "raw", maybe even more "raw" than equally-optimized C++ code (it looked closer to C or Fortran), but it was only necessary for a few key areas of the raytracer.
When you use arrays of plain old data types for the performance-critical areas, you gain sufficient control over memory to make all the difference, because it doesn't matter so much if the array is GC-managed and occasionally moved from one memory location to another after a GC cycle (e.g. out of Eden space after a first GC cycle). It doesn't matter because the array is moved as a whole, so the element at index 1 is still right next to elements 0 and 2. That's all that matters for cache-friendly sequential processing of an array: each element is right next to the others in memory. Even in Java and C#, as long as you're using arrays of PODs (which include structs in C#, the last time I checked), you have that level of control.
So when you're writing performance-critical code, make sure your designs leave enough breathing room to change the representation of how things are stored, and possibly away from objects if the design becomes bottlenecky in the future. As a basic example, for an Image object (which is effectively a collection of pixels), you might avoid storing pixels as individual objects and you definitely don't want to expose an abstract Pixel object for clients to use directly. Instead you might represent a collection of pixels as an array of plain old integers or floats behind an Image interface, and a single image might represent a million pixels. That will allow cache-friendly sequential processing of the image.
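A minimal sketch of that idea, assuming a hypothetical Image type that packs its pixels into one flat int array rather than exposing per-pixel objects:

public sealed class Image
{
    private readonly int[] _pixels;   // one allocation for the whole image (packed ARGB)

    public int Width { get; }
    public int Height { get; }

    public Image(int width, int height)
    {
        Width = width;
        Height = height;
        _pixels = new int[width * height];
    }

    public int GetPixel(int x, int y) => _pixels[y * Width + x];
    public void SetPixel(int x, int y, int argb) => _pixels[y * Width + x] = argb;

    // Bulk, cache-friendly sequential processing over contiguous memory.
    public void Fill(int argb)
    {
        for (int i = 0; i < _pixels.Length; i++)
            _pixels[i] = argb;
    }
}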
Avoid using new for a boatload of teeny things.
To put it simply, don't use new excessively for teeny things. Allocate in bulk for performance-critical areas: one new for an entire array of a million integers representing a million pixels in an image, for example, not a million calls to new allocating one pixel at a time at locations in memory outside of your control. Besides cache-friendliness, if you allocate each pixel as a separate object in C#, the memory overhead required to store the analogical pointer for dynamic dispatch, reflection, and so forth will often be bigger than the entire pixel itself, doubling or tripling the memory use of a single pixel.
Design for bulk, homogeneous processing in performance-critical areas.
If you're writing a video game revolving around OOP and inheritance rather than ECS and duck typing, the classic inheritance example is often too granular: Dog inherits Mammal, Cat inherits Mammal. If you are going to be dealing with a boatload of mammals in your game loops every frame, I recommend instead making Cats inherit Mammals and Dogs inherit Mammals, where Mammals becomes an abstract container rather than something that represents just one mammal at a time. This gives your designs enough breathing room to process, say, many dogs' worth of contiguous primitive data at one time very efficiently when you work with dogs indirectly, through polymorphism, via the Mammals interface.
This advice actually applies regardless of whether you're using C or C++ or Java or C# or anything else. To write cache-friendly code, you have to start with designs that leave enough breathing room to optimize their data representations and access patterns as needed in the future, and ideally with a profiler in hand. The worst-case scenario is to end up with a design that has accumulated many dependencies which is bottlenecky, like many dependencies to a Pixel object or IPixel interface, which is too granular to optimize further without rewriting and redesigning the entire software. So avoid mass dependencies to overly granular designs that leave no breathing room to optimize further. Redirect the dependencies away from the analogical Pixel to the analogical Image, from the analogical Mammal to the analogical Mammals, and you'll be able to optimize to your heart's content without costly redesigns.
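A small sketch of that "containers of mammals" idea (Mammals, Dogs and the fields are all illustrative; a real game would pick its own hot data):

public abstract class Mammals
{
    // One virtual call processes every animal the container holds.
    public abstract void Update(float deltaTime);
}

public sealed class Dogs : Mammals
{
    // Hot data stored as contiguous arrays of plain old data,
    // not as one heap object per dog.
    private readonly float[] _x = new float[1024];
    private readonly float[] _y = new float[1024];
    private readonly float[] _speed = new float[1024];
    private int _count;

    public override void Update(float deltaTime)
    {
        // One indirection for the container, then a tight, cache-friendly loop.
        for (int i = 0; i < _count; i++)
        {
            _x[i] += _speed[i] * deltaTime;
            _y[i] += _speed[i] * deltaTime;
        }
    }
}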
It seems like the only way to achieve this in C# is to use value types, that means using structs instead of classes. Using List<T> will then store your objects in contiguous memory. You can read some more info on structs vs classes here: Choosing Between Class and Struct
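A brief sketch of why that works (PixelColor is an illustrative struct): a List<T> of a value type keeps the values themselves in its internal backing array, so they sit back to back in memory:

using System.Collections.Generic;

public struct PixelColor { public byte R, G, B, A; }

public static class ContiguousListDemo
{
    public static void Run()
    {
        // The backing array holds the 4-byte values contiguously. Had PixelColor
        // been a class, the array would hold references and the actual data
        // could be scattered across the heap.
        var colors = new List<PixelColor>(capacity: 1_000_000);
        for (int i = 0; i < 1_000_000; i++)
            colors.Add(new PixelColor { R = 255, A = 255 });
    }
}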
You can use List<T>, which stores items contiguously (like a std::vector).
See this answer for more info.
I have a large application which averages about 30 mb/sec in memory allocations (per performance monitor bytes allocated/sec measurement). I am trying to cut this down substantially, and the source of the allocations is not obvious.
To instrument things I have recorded ETW traces for the CLR / GC, and have exported the AllocationTick event, which fires every time an additional 100 kilobytes is allocated and records the object type that was most recently allocated. This produces a nice-sized sample set. Three object types account for 70% of my allocations, but they are a bit of a mystery.
System.Int64 30%
System.Int32 28%
System.Runtime.CompilerServices.CallSite`1[System.Func`3[System.Runtime.CompilerServices.CallSite,System.Object,System.Object]] 12%
The dataset was ~70 minutes and a million events, so I am quite confident in the numbers.
I am guessing this is somehow indicating that I am creating a lot of pointers on the heap in some unexpected way? (this is an x64 application)
I use some linq and foreach loops, but these should only be creating increment variables on the stack, not the heap.
I am running everything on top of the TPL / Dataflow library as well, which could be generating these.
I am looking for any advice on what may be causing so many heap allocations of int32/64, and perhaps some techniques to isolate these allocations (call stacks would be great, but may be performance prohibitive).
I am guessing this is somehow indicating that I am creating a lot of pointers on the heap in some unexpected way?
To me it sounds more likely that you're boxing a lot of int and long values.
The CallSite part sounds like you're using dynamic a lot (or in one very heavily-used part of the code), which can easily lead to more boxing than statically typed code.
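For instance, here is a hedged illustration of how dynamic can produce both the CallSite machinery and the Int32 boxing seen in the trace (the exact call-site signatures the compiler emits may differ):

public static class DynamicBoxingDemo
{
    public static void Run()
    {
        dynamic d = 5;        // the int is boxed, because dynamic is object at runtime
        dynamic sum = d + 10; // the compiler emits hidden CallSite<...> fields and binder
                              // calls that operate on object, so operands and the result
                              // are boxed again

        // Statically typed equivalent: no call sites, no boxing.
        int x = 5;
        int y = x + 10;
        System.Console.WriteLine((int)sum + y);
    }
}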
I would try to isolate particular areas of the code which allocate a lot of your objects - if you can exercise just specific code paths, for example, that would give you a much clearer idea of which of those paths generates more garbage than you'd expect. Have a look at anywhere that uses dynamic and see whether you actually need to - although you shouldn't feel you have to remove all uses of dynamic by any means; there may well be one particular "hot spot" that could be micro-optimized.
The other thing to consider is how much this allocation is actually costing you. You say you're trying to cut down on it substantially - do you really need to? If all of these objects are very short-lived, you may well find that you don't improve performance much by reducing the allocation rate. You should measure time spent in garbage collection to work out how effective this is likely to be.
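As a first approximation, here is a rough sketch for gauging how much collection a particular code path triggers, using only GC.CollectionCount and a Stopwatch (it counts collections rather than time spent in GC, so for real numbers use the "% Time in GC" counter or ETW):

using System;
using System.Diagnostics;

public static class GcCostProbe
{
    public static void Measure(Action codePathUnderTest)
    {
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);
        var sw = Stopwatch.StartNew();

        codePathUnderTest();   // the workload you want to exercise

        sw.Stop();
        Console.WriteLine("Elapsed: {0}, Gen0: {1}, Gen1: {2}, Gen2: {3}",
            sw.Elapsed,
            GC.CollectionCount(0) - gen0,
            GC.CollectionCount(1) - gen1,
            GC.CollectionCount(2) - gen2);
    }
}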
An int has a size of 4 bytes, so if I create a new int in my program it will increase its memory consumption by 4 bytes. Right?
But if I have this class
public class Dummy {
    private int a;    // the field needs a name to compile; 'a' is just a placeholder
}
How much memory will my new class use? Will the memory consumption be lower if it were a struct? I think that the reference itself will also consume some memory.
A single reference takes either 4 bytes in 32-bit processes or 8 bytes in 64-bit processes. A reference is a standard overhead on classes (as they are reference types). Structs do not incur references (well, ignoring any potential boxing) and are usually the size of their content. I cannot remember whether classes have any more overhead; I don't think so.
This question delves into class vs struct (also provided in the question comments):
Does using "new" on a struct allocate it on the heap or stack?
As stated in the comments, only instances of a class will consume this reference overhead and only when there is a reference somewhere. When there are no references, the item becomes eligible for GC - I'm not sure what the size of a class is on the heap without any references, I would presume it is the size of its content.
Really, classes don't have a true "size" that you can rely on. And most importantly this shouldn't really be the deciding factor on using classes or structs (but you tend to find guidelines stating that types at or below roughly 16 bytes can be suitable structs, and above tends towards classes). For me the deciding factor is intended usage.
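As a hedged sketch of how that plays out for the example in the question (the byte counts are typical CLR figures, not guarantees):

// Heap object: object header plus the int field plus padding, roughly 12-16 bytes
// in a 32-bit process and about 24 bytes in a 64-bit process, plus a 4- or 8-byte
// reference wherever an instance is stored.
public class DummyClass
{
    private int a;
    public int Value => a;
}

// 4 bytes, stored inline in whatever variable, field or array element holds it.
public struct DummyStruct
{
    private int a;
    public int Value => a;
}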
When talking about structs, I feel obliged to provide the following link: Why are mutable structs “evil”?
A class is a reference type and is located on the heap (and will be removed from there by the garbage collector). A struct is a value type and is stored on the stack.
In the case of your example, Microsoft recommends a value type (a struct), because a reference type causes too much overhead.
If you're interested in this topic then take a look into the book "CLR via C#" from Jeffrey Richter.
I've been doing a lot of research on C# optimization for a game that I'm building with XNA, and I still don't quite understand whether local variables or instance variables give better performance when constantly being updated and used.
According to http://www.dotnetperls.com/optimization , you should avoid parameters and local variables, meaning instance variables are the best option in terms of performance.
But a while ago, I read on another StackOverflow post (I can't seem to find where it was) that local variables are stored in a part of memory that is far quicker to access, and that every time an instance variable is set, the previous value has to be erased as a tedious extra step before a new value can be assigned.
I know that design-wise, it might break encapsulation to use instance variables in that kind of situation, but I'm strictly curious about performance. Currently in my game, I pass around local variables to 3 out of 7 methods in a class, but I could easily promote the variables to instance variables and be able to entirely avoid parameter passing and local variables.
So which would be better?
Are your variables reference (class, or string) or value (struct) types?
For reference types there's no meaningful difference between passing them as a method argument and holding them on an object instance. In the first case when entering the function the argument will (for functions with a small argument count) end up in a register. In the second case the reference exists as an offset of the data pointed to in memory by 'this'. Either scenario is a quick grab of a memory address and then fetching the associated data out of memory (this is the expensive part).
For value types the above is true for certain types (integers or floats that can fit in your CPU's registers). For those specific things it's probably a little cheaper to pass by value than to extract them off 'this'. For other value types (DateTime, or structs you might make yourself, or any struct with multiple members) the data is often too large to pass in through a register, so this no longer matters.
It's pretty unlikely, though, that any of this matters for the performance of your application (even a game). Most common .NET performance problems (that are not simply inefficient algorithms) are going to, in some form, come from garbage generation. This can manifest itself through accidental boxing, poor use of string building, or poor object lifetime management (your objects have lifespans that are neither very short nor very long/permanent).
Personally, I wouldn't be looking at this as the culprit for performance issues (unless you are constantly passing large structs). My naive understanding is that GC pressure is the usual consideration with XNA games, so basically be frugal with your object instances.
If the variable is method-local, the value itself or the reference (when a reference type) will be located on the stack. If you promote those to class member variables they will be located in the class's memory on the heap.
Method calls would technically become faster as you are no longer copying references or values on the call (because presumably you can remove the parameters from the method if the method is also local to the class).
I'm not sure about the relative performance, but to me it seems that if you need to persist the value, then it makes some sense for the value to live in the class...
To me it seems like any potential gains from doing one in favour of the other are outweighed by the subtle differences between the two - making them roughly equivalent, or so small a difference as to not care.
Of course, all this stands to be corrected in the face of hard numbers from performance profiling.
Not passing arguments will be slightly faster, not initialising local objects (if they are objects) will be faster.
What you read in both articles is not contradictory: one mentions that passing arguments costs time, and the other mentions that initialising (local) objects can cost time as well.
About allocating new objects, one thing you can do is reuse objects rather than discard them. For example, some time ago I had to write code for traders that computed the price of a few products in real time, in C/C++ and C#. I obtained a major performance boost by not re-creating the system of equations from scratch, but merely updating the system incrementally from the previous one.
This avoided allocating memory for new objects and initialising them, and often the system would be nearly the same, so I only had to modify tiny bits to update it.
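A very simplified sketch of that reuse idea (PriceModel and its members are purely illustrative, not the actual trading code):

public sealed class PriceModel
{
    // Allocated once and reused for the lifetime of the model.
    private readonly double[] _coefficients = new double[256];

    // Apply an incoming tick by touching only the affected entry,
    // instead of rebuilding the whole system from scratch.
    public void ApplyTick(int index, double newValue)
    {
        _coefficients[index] = newValue;
    }

    public double Get(int index) => _coefficients[index];
}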
Usually, before doing any optimisation, you want to make sure that you are optimising something that will significantly impact the overall performance.
I have another active question HERE regarding some hopeless memory issues that possibly involve LOH fragmentation, among possibly other unknowns.
What my question now is, what is the accepted way of doing things?
If my app needs to be done in Visual C#, and needs to deal with large arrays to the tune of int[4000000], how can I not be doomed by the garbage collector's refusal to deal with the LOH?
It would seem that I am forced to make any large arrays global, and never use the word "new" around any of them. So, I'm left with ungraceful global arrays with "maxindex" variables instead of neatly sized arrays that get passed around by functions.
I've always been told that this was bad practice. What alternative is there?
Is there some kind of function to the tune of System.GC.CollectLOH("Seriously") ?
Is there possibly some way to outsource garbage collection to something other than System.GC?
Anyway, what are the generally accepted rules for dealing with large (>85Kb) variables?
Firstly, the garbage collector does collect the LOH, so do not be immediately scared by its presence. The LOH gets collected when generation 2 gets collected.
The difference is that the LOH does not get compacted, which means that if you have an object in there that has a long lifetime then you will effectively be splitting the LOH into two sections — the area before and the area after this object. If this behaviour continues to happen then you could end up with the situation where the space between long-lived objects is not sufficiently large for subsequent assignments and .NET has to allocate more and more memory in order to place your large objects, i.e. the LOH gets fragmented.
Now, having said that, the LOH can shrink in size if the area at its end is completely free of live objects, so the only problem is if you leave objects in there for a long time (e.g. the duration of the application).
Starting from .NET 4.5.1, the LOH can be compacted; see the GCSettings.LargeObjectHeapCompactionMode property.
Strategies to avoid LOH fragmentation are:
Avoid creating large objects that hang around. Basically this just means large arrays, or objects which wrap large arrays (such as the MemoryStream which wraps a byte array), as nothing else is that big (components of complex objects are stored separately on the heap so are rarely very big). Also watch out for large dictionaries and lists as these use an array internally.
Watch out for double arrays: the threshold for these going onto the LOH is much, much smaller. I can't remember the exact figure, but it's only a few thousand.
If you need a MemoryStream, consider making a chunked version that backs onto a number of smaller arrays rather than one huge array. You could also make custom versions of IList and IDictionary which use chunking to avoid stuff ending up on the LOH in the first place.
Avoid very long Remoting calls, as Remoting makes heavy use of MemoryStreams which can fragment the LOH during the length of the call.
Watch out for string interning — for some reason these are stored as pages on the LOH and can cause serious fragmentation if your application continues to encounter new strings to intern, i.e. avoid using string.Intern unless the set of strings is known to be finite and the full set is encountered early on in the application's life. (See my earlier question.)
Use Son of Strike to see what exactly is using the LOH memory. Again see this question for details on how to do this.
Consider pooling large arrays.
Edit: the LOH threshold for double arrays appears to be 8k.
It's an old question, but I figure it doesn't hurt to update answers with changes introduced in .NET. It is now possible to defragment the Large Object Heap. Clearly the first choice should be to make sure the best design choices were made, but it is nice to have this option now.
https://msdn.microsoft.com/en-us/library/xe0c2357(v=vs.110).aspx
"Starting with the .NET Framework 4.5.1, you can compact the large object heap (LOH) by setting the GCSettings.LargeObjectHeapCompactionMode property to GCLargeObjectHeapCompactionMode.CompactOnce before calling the Collect method, as the following example illustrates."
GCSettings can be found in the System.Runtime namespace
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
The first thing that comes to mind is to split the array up into smaller ones, so they don't reach the size that makes the GC put them on the LOH. You could split the arrays into smaller ones of, say, 10,000 elements, and build an object which would know which array to look in based on the indexer you pass.
Now I haven't seen the code, but I would also question why you need an array that large. I would potentially look at refactoring the code so all of that information doesn't need to be stored in memory at once.
You've got it wrong. You do NOT need an array of size 4000000 and you definitely do not need to call the garbage collector.
Write your own IList implementation, like a "PagedList" (a rough sketch follows at the end of this answer):
Store items in arrays of 65536 elements.
Create an array of arrays to hold the pages.
This allows you to access basically all your elements with only ONE redirection. And, as the individual arrays are smaller, fragmentation is not an issue...
...and if it is, then REUSE pages. Don't throw them away on dispose; put them on a static "PageList" and pull them from there first. All this can be done transparently within your class.
The really good thing is that this list is pretty dynamic in its memory usage. You may want to resize the holder array (the redirector). Even if you don't, it is only about 512 kb of data per page.
Second-level arrays hold 65536 elements each, which is 8 bytes per element for class references (512 kb per page; 256 kb on 32 bit), or 64 kb per page for a one-byte struct.
Technically:
Turn
int[]
into
int[][]
Decide whether 32 or 64 bit is better, as you wish ;) Both have advantages and disadvantages.
Dealing with ONE large array like that is unwieldy in any language - if you have to, then... basically... allocate it at program start and never recreate it. That is the only solution.
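A minimal sketch of such a paged list (the names and the fixed page size are illustrative; a real version would also implement IList<T> and reuse discarded pages as described above):

public sealed class PagedList<T>
{
    private const int PageSize = 65536;      // elements per page; pick a smaller size if each
                                             // page must stay under the ~85,000-byte LOH
                                             // threshold for your element type
    private T[][] _pages = new T[16][];      // the "redirector" array of pages
    private int _count;

    public int Count => _count;

    public T this[int index]
    {
        get => _pages[index / PageSize][index % PageSize];
        set => _pages[index / PageSize][index % PageSize] = value;
    }

    public void Add(T item)
    {
        int page = _count / PageSize;
        if (page == _pages.Length)
            System.Array.Resize(ref _pages, _pages.Length * 2);   // grow only the redirector
        if (_pages[page] == null)
            _pages[page] = new T[PageSize];                       // uniform page sizes limit fragmentation
        _pages[page][_count % PageSize] = item;
        _count++;
    }
}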
This is an old question, but with .NET Standard 1.1 (.NET Core, .NET Framework 4.5.1+) there is another possible solution:
Using ArrayPool<T> in the System.Buffers package, we can pool arrays to avoid this problem.
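A brief sketch of renting from the shared pool (note that Rent may hand back an array larger than requested, so track the length you actually use):

using System.Buffers;

public static class PooledBufferDemo
{
    public static void Process()
    {
        // Rent a large buffer instead of newing up a fresh multi-million-element
        // array (which would land on the LOH) on every call.
        int[] buffer = ArrayPool<int>.Shared.Rent(4_000_000);
        try
        {
            for (int i = 0; i < 4_000_000; i++)
                buffer[i] = i;
            // ... work with the buffer ...
        }
        finally
        {
            // Return it so subsequent callers reuse the same memory.
            ArrayPool<int>.Shared.Return(buffer);
        }
    }
}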
I am adding an elaboration to the answer above, in terms of how the issue can arise. Fragmentation of the LOH is not only dependent on the objects being long-lived. If you have multiple threads and each of them is creating big lists that go onto the LOH, you could end up in the situation where the first thread needs to grow its list, but the next contiguous bit of memory is already taken up by a list from a second thread, so the runtime has to allocate new memory for the first thread's list, leaving behind a rather big hole. This is what's happening currently on one project I've inherited: even though the LOH is approx 4.5 MB, the runtime has a total of 117 MB of free memory, but the largest free memory segment is 28 MB.
Another way this could happen without multiple threads is if you have more than one list being added to in some kind of loop: as each expands beyond the memory initially allocated to it, each leapfrogs the other as they grow beyond their allocated spaces.
A useful link is: https://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
Still looking for a solution to this, one option may be to use some kind of pooled objects and request them from the pool when doing the work. If you're dealing with large arrays, another option is to develop a custom collection, e.g. a collection of collections, so that you don't have just one huge list but break it up into smaller lists, each of which avoids the LOH.