Is c# compiler deciding to use stackalloc by itself? - c#

I found a blog entry which suggests that the C# compiler may sometimes decide to put an array on the stack instead of the heap:
Improving Performance Through Stack Allocation (.NET Memory Management: Part 2)
This guy claims that:
The compiler will also sometimes decide to put things on the stack on its own. I did an experiment with TestStruct2 in which I allocated it both an unsafe and normal context. In the unsafe context the array was put on the heap, but in the normal context when I looked into memory the array had actually been allocated on the stack.
Can someone confirm that?
I was trying to repeat his example, but every time I tried, the array was allocated on the heap.
If the C# compiler can do such a trick without the 'unsafe' keyword, I'm especially interested in it. I have code that works on many small byte arrays (8-10 bytes long), so using the heap for each new byte[...] is a waste of time and memory (especially since each object on the heap has 8 bytes of overhead needed for the garbage collector).
EDIT: I just want to describe why it's important to me:
I'm writing a library that communicates with a Gemalto.NET smart card, which can run .NET code inside it. When I call a method that returns something, the smart card returns 8 bytes that describe the exact Type of the return value. These 8 bytes are calculated using an MD5 hash and some byte-array concatenations.
The problem is that when I receive an array that is not known to me, I must scan all types in all assemblies loaded in the application, and for each one calculate those 8 bytes until I find a match.
I don't know another way to find the type, so I'm trying to speed it up as much as possible.

Author of the linked-to article here.
It seems impossible to force stack allocation outside of an unsafe context. This is likely to prevent certain classes of stack overflow conditions.
Instead, I recommend using a memory-recycler class which allocates byte arrays as needed but also lets you "turn them in" afterward for reuse. It's as simple as keeping a stack of unused byte arrays and allocating new ones only when that stack is empty.
Stack<Byte[]> _byteStack = new Stack<Byte[]>();

Byte[] AllocateArray()
{
    Byte[] outArray;
    if (_byteStack.Count > 0)
        outArray = _byteStack.Pop();
    else
        outArray = new Byte[8];
    return outArray;
}

void RecycleArray(Byte[] inArray)
{
    _byteStack.Push(inArray);
}
If you are trying to match a hash with a type, it seems the best idea would be to use a Dictionary for fast lookups. In this case you could load all relevant types at startup; if this makes program startup too slow, you might want to consider caching each type the first time it is used.
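The Dictionary-based lookup could be sketched like this. The ulong key (packing the 8 hash bytes into one integer), the TypeResolver name and the Register/Resolve methods are all illustrative assumptions; the card's actual MD5-based derivation is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the lookup-table idea: compute the 8-byte hash for every
// candidate type once, then resolve incoming hashes in O(1) instead of
// rescanning all loaded assemblies on every call.
static class TypeResolver
{
    static readonly Dictionary<ulong, Type> _byHash = new Dictionary<ulong, Type>();

    // Register a type under its precomputed 8-byte hash (packed into a ulong).
    public static void Register(Type type, ulong hash) => _byHash[hash] = type;

    // Returns the matching type, or null when the hash is unknown.
    public static Type Resolve(ulong hash) =>
        _byHash.TryGetValue(hash, out var t) ? t : null;
}
```

Populating the table once at startup (or lazily, on first miss) turns the per-call cost from a full assembly scan into a single hash lookup.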

From your line:
I have a code that is working on many small byte arrays (8-10 bytes long)
Personally, I'd be more interested in allocating a spare buffer somewhere that different parts of your code can reuse (while processing the same block). Then you don't have any creation/GC to worry about. In most cases (where the buffer is used for very discrete operations) with a scratch buffer, you can even always assume that it is "all yours" - i.e. every method that needs it can assume it can start writing at zero.
I use this single-buffer approach in some binary serialization code (while encoding data); it is a big boost to performance. In my case, I pass a "context" object between the layers of serialization (that encapsulates the scratch-buffer, the output-stream (with some additional local buffering), and a few other oddities).
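A minimal sketch of that context-object idea, assuming a hypothetical SerializationContext that owns one reusable scratch buffer; the class and method names are illustrative, not taken from any real serializer.

```csharp
using System;

// Sketch of the "context object" idea: one scratch buffer owned by the
// context and reused by every encoding step, so there is no per-call
// allocation and nothing for the GC to collect.
class SerializationContext
{
    // 8 bytes covers the widest primitive encoded in this sketch.
    public byte[] Scratch { get; } = new byte[8];

    // Encodes an Int32 into the scratch buffer and returns how many
    // scratch bytes are now valid. Each caller may assume the buffer
    // is "all theirs" and start writing at offset zero.
    public int EncodeInt32(int value)
    {
        BitConverter.TryWriteBytes(Scratch, value);
        return sizeof(int);
    }
}
```

The context would then be threaded through the serialization layers, so every layer writes into the same buffer rather than allocating its own.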

System.Array (the class representing an array) is a reference type and lives on the heap. You can only have an array on the stack if you use unsafe code.
I can't see where it says otherwise in the article that you refer to. If you want to have a stack allocated array, you can do something like this:
decimal* stackAllocatedDecimals = stackalloc decimal[4];
Personally I wouldn't bother; how much performance do you think you will gain by this approach?
This CodeProject article might be useful to you though.

Related

Rapid stack allocations vs accessing a single heap allocation

I'm in a situation where I have an array T[] which must be copied in an instant and passed to a function accepting a ReadOnlySpan<T>. I found two solutions to this problem, and I'm interested in which one gives the better performance.
Considerations:
Array number of elements range between 1 to 3 (so it's extremely small);
T is a readonly struct of 16 bytes managed size
I create another array T[] globally (which will be heap allocated), use the .CopyTo() extension method on the first array, and then pass down the second array.
I create a Span<T> locally using stackalloc (which will be stack allocated) and then use the .CopyTo<T>() extension just like the previous version.
The difference is that the second approach requires me to do this every time the function is called, whereas with the first approach the array is already initialized before the function is called even the first time.
Which approach do you think is the better one?
OK, so I ran some BenchmarkDotNet tests. The results were almost identical in speed, with a small edge in favor of the stackalloc version, but only when I combined it with the [SkipLocalsInit] attribute. Total allocations were also higher for this version. So to summarize, I came to the conclusion that this is just a micro-optimization and both variants are fine as long as you keep an eye on the stack size; otherwise the first version is 10x better.
And lastly, if you thought ReadOnlySpan<T> is always passed by value, think again when looking at my example.
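The stackalloc approach from the question can be sketched as below. The int element type (instead of the 16-byte struct) and the Sum consumer (standing in for the real ReadOnlySpan<T>-accepting function) are simplifications for illustration.

```csharp
using System;

// Sketch of approach 2: copy the tiny array into a stackalloc'd span.
// No heap allocation, but the buffer is re-created on every call.
static class SpanDemo
{
    public static int SumViaStackalloc(int[] source)
    {
        // The question states the element count is 1..3, so a fixed
        // 3-element stack buffer is always large enough here.
        Span<int> tmp = stackalloc int[3];
        source.AsSpan().CopyTo(tmp);
        return Sum(tmp.Slice(0, source.Length));
    }

    // Stand-in for the real function accepting a ReadOnlySpan<T>.
    static int Sum(ReadOnlySpan<int> span)
    {
        int total = 0;
        foreach (var v in span) total += v;
        return total;
    }
}
```

For buffers this small, the stackalloc itself is essentially free; as the benchmark above suggests, the difference between the two approaches is a micro-optimization either way.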

C# Array access vs C++ PInvoke pointer access

I've got an idea for optimising a large jagged array. Let's say I've got this in C#:
struct BlockData
{
    internal short type;
    internal short health;
    internal short x;
    internal short y;
    internal short z;
    internal byte connection;
}

BlockData[][][] blocks = null;

byte[] GetBlockTypes()
{
    if (blocks == null)
        blocks = InitializeJaggedArray<BlockData[][][]>(256, 64, 256);
    // BlockData is a struct
    MemoryStream stream = new MemoryStream();
    for (int x = 0; x < blocks.Length; x++)
    {
        for (int y = 0; y < blocks[x].Length; y++)
        {
            for (int z = 0; z < blocks[x][y].Length; z++)
            {
                // type is a short; WriteByte takes a byte, so a cast is needed
                stream.WriteByte((byte)blocks[x][y][z].type);
            }
        }
    }
    return stream.ToArray();
}
Would storing the blocks as a BlockData*** in a C++ DLL and then using P/Invoke to read/write them be more efficient than storing them in C# arrays?
Note. I'm unable to perform tests right now because my computer is right now at service.
This sounds like a question where you should first read the speed rant, starting at part 2: https://ericlippert.com/2012/12/17/performance-rant/
This is such a minuscule difference that if it matters, you are probably in a realtime scenario - and .NET is the wrong choice for realtime scenarios to begin with. If you are in a realtime scenario, this is not going to be the only thing you have to work around; GC memory management and security checks will be in your way too.
It is true that accessing an array in native C++ is faster than accessing it in .NET. .NET implements indexers as proper function calls, similar to properties, and it does verify that the index is valid. However, it is not as bad as you might think. The optimisations are pretty good: function calls can be inlined, repeated array accesses can be hoisted into a temporary variable, and even the bounds check is not safe from sensible removal. So it is not as big an advantage as you might think.
As others pointed out, P/Invoke will consume any gains there might be, with its overhead. But actually going into a different environment is unnecessary:
The thing is, you can also use naked pointers in .NET. You have to enable it with unsafe code, but it is there. You can then acquire a piece of unmanaged memory and treat it like an array in native C++. Of course that subjects you to mistakes like messing up the pointer arithmetic or overflowing the buffer - the exact reasons those checks exist in the first place!
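The naked-pointer idea above can be illustrated with a small sketch: pinning a managed array and walking it with pointer arithmetic. This requires compiling with unsafe enabled, and it is only a demonstration of the mechanism, not a claim that it beats the JIT's own bounds-check elimination.

```csharp
// Minimal illustration of raw pointer access over a managed array.
// Compile with /unsafe (AllowUnsafeBlocks).
static class PointerDemo
{
    public static unsafe int SumUnsafe(short[] data)
    {
        int total = 0;
        fixed (short* p = data)          // pin the array so the GC cannot move it
        {
            short* end = p + data.Length;
            for (short* cur = p; cur < end; cur++)
                total += *cur;           // raw dereference: no bounds check here
        }
        return total;
    }
}
```

The fixed block is what makes this safe to combine with the GC: the array cannot be relocated while the pointer is live, and the pin is released when the block exits.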
Would storing the blocks as a BlockData*** in a C++ DLL and then using P/Invoke to read/write them be more efficient than storing them in C# arrays?
No, because P/Invoke has a significant overhead, whereas array access in C# is compiled at runtime by the JIT to fairly efficient code with bounds checks. Jagged arrays in .NET also have adequate performance (the only weak area in .NET is true multidimensional arrays, which is disappointing - but I don't believe your proposal would help that either).
Update: Multidimensional array performance in .NET Core actually seems worse than in .NET Framework (if I'm reading this thread correctly).
Another way to look at it: GC and overall maintenance. Your proposal is essentially the same as allocating one big array and using (layer * layerSize + row * rowSize + column) to index it. P/Invoke will give you the following drawbacks:
you will likely end up with an unmanaged allocation for the array. This makes the GC unaware of a large amount of allocated memory, and you need to make sure to notify the GC about it (e.g. with GC.AddMemoryPressure).
P/Invoked calls can't be completely inlined by the JIT, unlike pure .NET code
you need to maintain code in two languages
P/Invoke is not as portable - it requires platform/bitness-specific libraries to deal with and adds a lot of fun when sharing your program.
and one possible gain:
removing boundary checks performed by .Net on arrays
A back-of-a-napkin calculation shows that at best the two will balance out in raw performance. I'd go with the .NET-only version, as it is easier to maintain and means less fun with the GC.
Additionally, when you hide chunk auto-generation/partially generated chunks behind an index method of the chunk, it is easier to write the code in a single language... In reality, since fully populated chunks are very memory-consuming, your main issue will likely be memory usage/memory access cost rather than the raw performance of iterating through elements. Try and measure...

Why most of the data structures in generic collections use array despite of Large Object Heap fragmentation?

I can see that CoreCLR and CoreFX implicitly use arrays for most of the generic collections. What is the main driving factor to go with arrays, and how do they handle the side effects of LOH fragmentation?
What other than arrays should collections be?
More importantly, what other than arrays could collections be?
In use, collections boil down to "arrays - and stuff we wrap around arrays for ease of use":
The pure thing (arrays), which do offer some conveniences like bounds checks in C#/.NET
Self-growing arrays (Lists)
Two synchronized arrays that allow mapping any input to any element (Dictionaries' key/value pairs)
Three synchronized arrays: key, value and a hash value to quickly identify non-matching keys (Hashtable).
Under the hood - regardless of how hard .NET makes it to use pointers - it all boils down to some code doing C/C++-style pointer arithmetic to get the next element.
Edit 1: As I learned elsewhere, .NET Dictionaries are actually implemented as hash tables; the non-generic Hashtable class is just the pre-generics version. Object has a GetHashCode function with sensible default behavior which can be used, but also fully overridden.
Fragmentation-wise the "best" would be an array of references. It can be as small as the reference width (a pointer or slightly bigger) and the GC can move the instances around to defragment memory. Of course you then get the slight overhead of dereferencing rather than just advancing a pointer, so as usual it is a memory vs. speed tradeoff. However this might go into speed-rant territory of detail.
Edit 2: As Markus Appel pointed out in the comments, there is something even better for fragmentation avoidance: linked lists. Even that single array of references - if you make it big enough - will take quite some memory in one indivisible chunk, so it might run into object-size limits or array-indexer limits. A linked list will do neither. But as a result, the performance is like that of a disk that was never defragmented.
Generics are just a convenience to have type safety in collections and other places. They avoid having to use the dreaded Object as the element type, which ruins all compile-time type safety. Afaik they add nothing else to this situation. List<string> works the same as a StringList would.
Array access is faster since arrays are linear storage. If arrays can solve a problem well enough, they are a better choice for traversal than structures that must first work out where the next object is stored. For large data structures this performance benefit is also amplified.
Using arrays can cause fragmentation if used carelessly. In the general case though, the performance gains outweigh the cost.
When the buffer runs out, the collection allocates a new one with double the size. If the code inserts a lot of items without specifying a capacity, this results in log2(N) reallocations. If the code does specify a capacity though, even a very rough approximation, there may be no fragmentation issues at all.
Removal is another expensive case as the collection will have to move the items after the deleted item(s) to the left.
In general, though, array storage offers far better performance than other storage structures - for reading, inserting and allocating memory alike. Deletions are rare in most cases.
For example, inserting N items in a linked list requires allocating N objects to hold that value and storing N pointers. That cost will be paid for every insertion, while the GC will have a lot more objects to track and collect. Inserting 100K items in a linked list would allocate 100K node objects that would need tracking.
With an array there won't be any allocations unless the buffer runs out. In the majority of cases insertion means simply writing to a buffer location and updating a count. When the buffer runs out there will be a single reallocation and an (expensive) copy operation. For 100K items, that's 17 allocations. In most cases, that's an acceptable cost.
To reduce or even get rid of allocations, the code can specify a capacity that's used as the initial buffer size. Specifying even a very rough estimate can reduce allocations a lot. Specifying 1024 as the initial capacity for 100K items would reduce reallocations to 7.
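The effect of a capacity hint described above can be observed directly by counting how often List<T>'s backing array is replaced while items are added. CountRegrowths is a hypothetical helper written just for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Counts how many times List<T> replaces its backing array while adding
// itemCount items, optionally starting from an initial capacity hint.
static class GrowthDemo
{
    public static int CountRegrowths(int itemCount, int initialCapacity)
    {
        var list = initialCapacity > 0
            ? new List<int>(initialCapacity)
            : new List<int>();

        int regrowths = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < itemCount; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                regrowths++;              // the list just allocated a bigger array
                lastCapacity = list.Capacity;
            }
        }
        return regrowths;
    }
}
```

Adding 100K items to a list presized with 100K capacity triggers no regrowth at all, while an initial capacity of 1024 doubles seven times (1024 to 131072), matching the figures in the text.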

Where these are stored?

I am learning about GC in .NET. I would like to know where my integers, floats and other value types are stored, where static variables are stored, and where function members and the value types inside functions are stored.
If you have any documents or weblinks on this topic, please post them here.
Thank you,
Harsha
I have an article which talks about this a bit, but you should really read various blog posts by Eric Lippert. "The truth about value types" is probably the most important one, along with "The stack is an implementation detail" (part one; part two).
Fundamentally it's more important to understand garbage collection in terms of reachability etc, rather than the implementation details of what goes where in memory. That can be helpful in terms of performance, but you need to keep reminding yourself that it's an implementation detail.
Note: Jon Skeet's answer is more correct.
Stack memory:
The stack is the section of memory that is allocated for automatic variables within functions.
Data is stored in stack using the Last In First Out (LIFO) method. This means that storage in the memory is allocated and deallocated at only one end of the memory called the top of the stack. Stack is a section of memory and its associated registers that is used for temporary storage of information in which the most recently stored item is the first to be retrieved.
Heap memory
On the other hand, heap is an area of memory used for dynamic memory allocation. Blocks of memory are allocated and freed in this case in an arbitrary order. The pattern of allocation and size of blocks is not known until run time. Heap is usually being used by a program for many different purposes.
The stack is much faster than the heap, but it is also much smaller and more limited.
Example (it's for C, though, not C#):

int x;                       /* static storage */
main() {
    int y;                   /* dynamic stack storage */
    char *str;               /* dynamic stack storage */
    str = malloc(50);        /* allocates 50 bytes of dynamic heap storage */
    int size = calcSize(10); /* dynamic heap storage */
}
Above content Taken from Here

C# List<T>.ToArray performance is bad?

I'm using .NET 3.5 (C#) and I've heard the performance of C# List<T>.ToArray is "bad", since it memory-copies all elements to form a new array. Is that true?
No, that's not true. Performance is good, since all it does is memory-copy all the elements (*) to form a new array.
Of course it depends on what you define as "good" or "bad" performance.
(*) references for reference types, values for value types.
EDIT
In response to your comment, using Reflector is a good way to check the implementation (see below). Or just think for a couple of minutes about how you would implement it, and take it on trust that Microsoft's engineers won't come up with a worse solution.
public T[] ToArray()
{
    T[] destinationArray = new T[this._size];
    Array.Copy(this._items, 0, destinationArray, 0, this._size);
    return destinationArray;
}
Of course, "good" or "bad" performance only has a meaning relative to some alternative. If in your specific case, there is an alternative technique to achieve your goal that is measurably faster, then you can consider performance to be "bad". If there is no such alternative, then performance is "good" (or "good enough").
EDIT 2
In response to the comment: "No re-construction of objects?" :
No reconstruction for reference types. For value types the values are copied, which could loosely be described as reconstruction.
Reasons to call ToArray()
If the returned value is not meant to be modified, returning it as an array makes that fact a bit clearer.
If the caller is expected to perform many non-sequential accesses to the data, there can be a performance benefit to an array over a List<>.
If you know you will need to pass the returned value to a third-party function that expects an array.
Compatibility with calling functions that need to work with .NET version 1 or 1.1. These versions don't have the List<> type (or any generic types, for that matter).
Reasons not to call ToArray()
If the caller ever does need to add or remove elements, a List<> is absolutely required.
The performance benefits are not necessarily guaranteed, especially if the caller is accessing the data in a sequential fashion. There is also the additional step of converting from List<> to array, which takes processing time.
The caller can always convert the list to an array themselves.
taken from here
Yes, it's true that it does a memory copy of all elements. Is it a performance problem? That depends on your performance requirements.
A List<T> contains an array internally to hold all the elements. The array grows when the capacity is no longer sufficient for the list. Any time that happens, the list copies all elements into a new array. That happens all the time, and for most people that is no performance problem.
E.g. a list created with the default constructor starts with an empty backing array; the first Add allocates capacity 4, which then doubles as needed (8, 16, 32, ...). When you .Add() the 17th element, it creates a new array of size 32, copies the 16 old values and adds the 17th.
The size difference is also the reason why ToArray() returns a new array instance instead of passing out the private reference.
This is what Microsoft's official documentation says about List.ToArray's time complexity
The elements are copied using Array.Copy, which is an O(n) operation, where n is Count.
Then, looking at Array.Copy, we see that it is usually not cloning the data but instead using references:
If sourceArray and destinationArray are both reference-type arrays or are both arrays of type Object, a shallow copy is performed. A shallow copy of an Array is a new Array containing references to the same elements as the original Array. The elements themselves or anything referenced by the elements are not copied. In contrast, a deep copy of an Array copies the elements and everything directly or indirectly referenced by the elements.
So in conclusion, this is a pretty efficient way of getting an array from a list.
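The shallow-copy behavior quoted above is easy to verify: for reference-type elements, the array returned by ToArray() points at the same objects as the list. SharesElements below is a hypothetical helper written for this check.

```csharp
using System;
using System.Collections.Generic;

// Small check of List<T>.ToArray's shallow-copy semantics: the returned
// array holds references to the same element objects as the list.
static class ShallowCopyDemo
{
    public static bool SharesElements()
    {
        var list = new List<int[]> { new[] { 1, 2, 3 } };
        int[][] copy = list.ToArray();

        // Mutating through the copy is visible via the list, because
        // both hold a reference to the same inner array.
        copy[0][0] = 99;
        return ReferenceEquals(copy[0], list[0]) && list[0][0] == 99;
    }
}
```

The outer array itself is new (so adding or removing elements in the list won't affect it), but nothing inside it is cloned.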
It creates new references in an array, but that's the only thing that method could and should do...
Performance has to be understood in relative terms. Converting a List to an array involves copying the elements, and the cost of that will depend on the size of the list. But you have to compare that cost to the other things your program is doing. How did you obtain the information to put into the list in the first place? If it was by reading from disk, a network connection, or a database, then an in-memory array copy is very unlikely to make a detectable difference in the time taken.
For any kind of List/ICollection where it knows the length, it can allocate an array of exactly the right size from the start:

T[] destinationArray = new T[this._size];
Array.Copy(this._items, 0, destinationArray, 0, this._size);
return destinationArray;
If your source type is IEnumerable (not a List/Collection), then the source is:

items = new TElement[4];
...
if (/* no more space */) {
    TElement[] newItems = new TElement[checked(count * 2)];
    Array.Copy(items, 0, newItems, 0, count);
    items = newItems;
}

It starts at size 4 and grows exponentially, doubling each time it runs out of space. Each time it doubles, it has to reallocate memory and copy the data over.
If we know the source data size, we can avoid this slight overhead. However, in most cases (e.g. array sizes <= 1024) it will execute so quickly that we don't even need to think about this implementation detail.
References: Enumerable.cs, List.cs (F12ing into them), Joe's answer
