How to reduce small array garbage? - c#

I'm trying to reduce the amount of garbage generated by some algorithms that require small arrays to work efficiently. Storing a reference to a heap-allocated array inside a value type also hurts memory locality; inlining the array's storage directly into the struct improves cache efficiency.
The main goal is to keep the implementation readable while not generating any garbage from array allocations.
I came up with a pattern that emulates an array with a struct, trading garbage production for a little runtime performance:
ulong[] smallArray = new ulong[2];
becomes
struct ulong2
{
    ulong l0;
    ulong l1;

    public ulong this[ulong n]
    {
        get
        {
            return (n == 0) ? l0 : l1;
        }
        set
        {
            if (n == 0)
            {
                l0 = value;
            }
            else
            {
                l1 = value;
            }
        }
    }
}
ulong2 smallArray = new ulong2();
You'll need to refactor algorithms a little to cope with the by-value semantics instead of the original array's by-ref semantics, but otherwise it's a pretty good drop-in-and-forget replacement when you know the fixed size of the array.
How can this be optimized to keep the O(1) read/write characteristics of a linear array? Can it be optimized without using unsafe code? Are there other approaches?
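For reference, the by-value semantics mentioned above look like this in practice (a minimal sketch, with the question's ulong2 struct condensed into expression-bodied form):

```csharp
ulong2 a = new ulong2();
a[0] = 42;

ulong2 b = a;   // copies the whole 16-byte struct; no heap allocation, no aliasing
b[0] = 7;       // mutates only the copy

Console.WriteLine(a[0]);  // 42 - unchanged, unlike a shared ulong[] reference
Console.WriteLine(b[0]);  // 7

// ulong2 as defined in the question, condensed
struct ulong2
{
    private ulong l0, l1;
    public ulong this[ulong n]
    {
        get => n == 0 ? l0 : l1;
        set { if (n == 0) l0 = value; else l1 = value; }
    }
}
```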

Related

Are number comparisons as uint faster than normal int comparisons in C#?

I was looking at how some of the .NET Core libraries were implemented, and one of the many things that caught my eye was that in the Dictionary<TKey, TValue> class some numeric comparisons were done by casting to (uint), even though to my naive eyes this didn't impact the logic.
For example on
do { // some magic } while (collisionCount <= (uint)entries.Length);
collisionCount was initialized to 0 and only ever incremented (collisionCount++), and since entries is an array, its length cannot be negative either (see the source code),
as opposed to
if ((uint)i >= (uint)entries.Length) { // some code }
source code line
where i could become negative on some occasions after doing the following assignment (see the debug image):
i = entry.next;
and thus interpreting it as unsigned would change the program flow (due to two's-complement representation).
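To see concretely why the cast changes the flow (a minimal sketch; length stands in for entries.Length):

```csharp
int i = -1;          // e.g. entry.next when there is no next entry
int length = 4;      // stand-in for entries.Length

Console.WriteLine(i >= length);              // False: signed, -1 < 4
Console.WriteLine((uint)i >= (uint)length);  // True: (uint)(-1) == 4294967295

// A single unsigned comparison therefore catches both "past the end" and
// "negative index" in one test, which is what triggers goto ReturnNotFound.
```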
See an extract of the class code:
// Some code and black magic
uint hashCode = (uint)key.GetHashCode();
int i = GetBucket(hashCode);
Entry[]? entries = _entries;
uint collisionCount = 0;
if (typeof(TKey).IsValueType)
{
    i--;
    do
    {
        if ((uint)i >= (uint)entries.Length) // <--- Workflow impact
        {
            goto ReturnNotFound;
        }
        entry = ref entries[i];
        if (entry.hashCode == hashCode && EqualityComparer<TKey>.Default.Equals(entry.key, key))
        {
            goto ReturnFound;
        }
        i = entry.next;
        collisionCount++;
    } while (collisionCount <= (uint)entries.Length);
}
// More cool stuffs
Is there any performance gain, or what is the reason for this?
The linked Dictionary source contains this comment:
// Should be a while loop https://github.com/dotnet/runtime/issues/9422
// Test in if to drop range check for following array access
if ((uint)i >= (uint)entries.Length)
{
    goto ReturnNotFound;
}
entry = ref entries[i];
The uint comparison here isn't faster in itself, but it helps speed up the array access. The linked GitHub issue discusses a limitation in the runtime's compiler and how this loop structure enables further optimisations. Because this uint comparison has been performed explicitly, the compiler can prove that 0 <= i < entries.Length. That allows it to omit the array bounds check, and the IndexOutOfRangeException throw, that would otherwise be required.
In other words, at the time this code was written and its performance was profiled, the compiler wasn't smart enough to make the simpler, more readable code run as fast as possible. So someone with a deep understanding of the compiler's limitations tweaked the code to make it better.
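The same trick applies in ordinary code (a sketch; whether the JIT actually elides the check depends on the runtime version):

```csharp
int[] data = { 10, 20, 30 };
int i = 2;

// One unsigned comparison proves 0 <= i < data.Length, so the JIT can
// often drop its own bounds check on the array access below.
if ((uint)i < (uint)data.Length)
{
    Console.WriteLine(data[i]);  // 30
}
```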

struct array vs object array c#

I understand that mutable structs are evil. However, I'd still like to compare the performance of an array of structs vs an array of objects. This is what I have so far
public struct HelloStruct
{
    public int[] hello1;
    public int[] hello2;
    public int hello3;
    public int hello4;
    public byte[] hello5;
    public byte[] hello6;
    public string hello7;
    public string hello8;
    public string hello9;
    public SomeOtherStruct[] hello10;
}

public struct SomeOtherStruct
{
    public int yoyo;
    public int yiggityyo;
}

public class HelloClass
{
    public int[] hello1;
    public int[] hello2;
    public int hello3;
    public int hello4;
    public byte[] hello5;
    public byte[] hello6;
    public string hello7;
    public string hello8;
    public string hello9;
    public SomeOtherClass[] hello10;
}

public class SomeOtherClass
{
    public int yoyo;
    public int yiggityyo;
}
static void compareTimesClassVsStruct()
{
    HelloStruct[] a = new HelloStruct[50];
    for (int i = 0; i < a.Length; i++)
    {
        a[i] = default(HelloStruct);
    }
    HelloClass[] b = new HelloClass[50];
    for (int i = 0; i < b.Length; i++)
    {
        b[i] = new HelloClass();
    }
    Console.WriteLine("Starting now");
    var s1 = Stopwatch.StartNew();
    for (int i = 0; i < _max; i++)
    {
        a[i % 50].hello1 = new int[] { 1, 2, 3, 4, i % 50 };
        a[i % 50].hello3 = i;
        a[i % 50].hello7 = (i % 100).ToString();
    }
    s1.Stop();
    var s2 = Stopwatch.StartNew();
    for (int j = 0; j < _max; j++)
    {
        b[j % 50].hello1 = new int[] { 1, 2, 3, 4, j % 50 };
        b[j % 50].hello3 = j;
        b[j % 50].hello7 = (j % 100).ToString();
    }
    s2.Stop();
    Console.WriteLine(((double)(s1.Elapsed.TotalSeconds)));
    Console.WriteLine(((double)(s2.Elapsed.TotalSeconds)));
    Console.Read();
}
There's a couple of things happening here that I'd like to understand.
Firstly, since the array stores structs, when I try to access a struct from the array using the index operation, should I get a copy of the struct or a reference to the original struct? In this case when I inspect the array after running the code, I get the mutated struct values. Why is this so?
Secondly, when I compare the timings inside CompareTimesClassVsStruct() I get approximately the same time. What is the reason behind that? Is there any case under which using an array of structs or an array of objects would outperform the other?
Thanks
When you access the properties of an element of an array of structs, you are NOT operating on a copy of the struct - you are operating on the struct itself. (This is NOT true of a List<SomeStruct> where you will be operating on copies, and the code in your example wouldn't even compile.)
The reason you are seeing similar times is because the times are being distorted by the (j % 100).ToString() and new int[] { 1, 2, 3, 4, j % 50 }; within the loops. The amount of time taken by those two statements is dwarfing the times taken by the array element access.
I changed the test app a little, and I get times for accessing the struct array of 9.3s and the class array of 10s (for 1,000,000,000 loops), so the struct array is measurably faster, but not by much.
One thing which can make struct arrays faster to iterate over is locality of reference. When iterating over a struct array, adjacent elements are adjacent in memory, which reduces the number of processor cache misses.
The elements of class arrays are not adjacent (although the references to the elements in the array are, of course), which can result in many more processor cache misses while you iterate over the array.
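A rough way to observe the locality effect (a sketch, not a rigorous benchmark; PointS and PointC are hypothetical types, and absolute timings vary by machine):

```csharp
using System.Diagnostics;

const int N = 10_000_000;
var structs = new PointS[N];     // one contiguous block of 8 * N bytes
var classes = new PointC[N];     // N references to scattered heap objects
for (int i = 0; i < N; i++) classes[i] = new PointC();

long sum = 0;
var sw = Stopwatch.StartNew();
for (int i = 0; i < N; i++) sum += structs[i].X;   // sequential, cache-friendly
Console.WriteLine($"struct array: {sw.ElapsedMilliseconds} ms");

sw.Restart();
for (int i = 0; i < N; i++) sum += classes[i].X;   // one pointer dereference per element
Console.WriteLine($"class array:  {sw.ElapsedMilliseconds} ms");

struct PointS { public int X, Y; }
class PointC { public int X, Y; }
```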
Another thing to be aware of is that the number of contiguous bytes in a struct array is effectively (number of elements) * (sizeof(element)), whereas the number of contiguous bytes in a class array is (number of elements) * (sizeof(reference)) where the size of a reference is 32 bits or 64 bits, depending on memory model.
This can be a problem with large arrays of large structs where the total size of the array would exceed 2^31 bytes.
Another difference you might see in speed is when passing large structs as parameters - obviously it will be much quicker to pass by value a copy of the reference to a reference type on the stack than to pass by value a copy of a large struct.
Finally, note that your sample struct is not very representative. It contains a lot of reference types, all of which will be stored somewhere on the heap, not in the array itself.
As a rule of thumb, structs should not be more than 32 bytes or so in size (the exact limit is a matter of debate), they should contain only primitive (blittable) types, and they should be immutable. And, usually, you shouldn't worry about making things structs anyway, unless you have a provable performance need for them.
Firstly, since the array stores structs, when I try to access a struct from the array using the index operation, should I get a copy of the struct or a reference to the original struct?
Let me tell you what is actually happening rather than answering your confusingly worded either-or question.
Arrays are a collection of variables.
The index operation when applied to an array produces a variable.
Mutating a field of a mutable struct successfully requires that you have in hand the variable that contains the struct you wish to mutate.
So now to your question: Should you get a reference to the struct?
Yes, in the sense that a variable refers to storage.
No, in the sense that the variable does not contain a reference to an object; the struct is not boxed.
No, in the sense that the variable is not a ref variable.
However, if you had called an instance method on the result of the indexer, then a ref variable would have been produced for you; that ref variable is called "this", and it would have been passed to your instance method.
You see how confusing this gets. Better to not think about references at all. Think about variables and values. Indexing an array produces a variable.
Now deduce what would have happened had you used a list rather than an array, knowing that the getter indexer of a list produces a value, not a variable.
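A concrete illustration of that difference (a sketch; MutableStruct is a hypothetical type):

```csharp
using System.Collections.Generic;

var array = new MutableStruct[1];
var list = new List<MutableStruct> { new MutableStruct() };

array[0].Value = 42;     // fine: indexing an array yields the variable itself

// list[0].Value = 42;   // error CS1612: the list indexer yields a copy (a value)

// The workaround for a list is read-modify-write:
MutableStruct tmp = list[0];
tmp.Value = 42;
list[0] = tmp;

Console.WriteLine(array[0].Value);  // 42
Console.WriteLine(list[0].Value);   // 42

struct MutableStruct { public int Value; }
```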
In this case when I inspect the array after running the code, I get the mutated struct values. Why is this so?
You mutated a variable.
I get approximately the same time. What is the reason behind that?
The difference is so tiny that it is being swamped by all the memory allocations and memory copies you are doing in both cases. That is the real takeaway here. Are operations on mutable value types stored in arrays slightly faster? Possibly. (They save on collection pressure as well, which is often the more relevant performance metric.) But though the relative savings might be significant, the savings as a percentage of total work is often tiny. If you have a performance problem then you want to attack the most expensive thing, not something that is already cheap.

How to truncate an array in place in C#

I mean is it really possible? MSDN says that arrays are fixed-size and the only way to resize is "copy-to-new-place". But maybe it is possible with unsafe/some magic with internal CLR structures, they all are written in C++ where we have a full memory control and can call realloc and so on.
I have no code provided for this question, because I don't even know if it can exist.
I'm not talking about Array.Resize methods and so on, because they obviously do not have the needed behaviour.
Assume that we have a standard x86 process with 2GB ram, and I have 1.9GB filled by single array. Then I want to release half of it. So I want to write something like:
MagicClass.ResizeArray(ref arr, n)
And do not get OutOfMemoryException. Array.Resize will try to allocate another gigabyte of RAM and will fail with 1.9+1 > 2GB OutOfMemory.
You can try Array.Resize():
int[] myArray = new int[] { 1, 2, 3, 4 };
int myNewSize = 1;
Array.Resize(ref myArray, myNewSize);
// Test: 1
Console.Write(myArray.Length);
realloc will attempt to do the in-place resize - but it reserves the right to copy the whole thing elsewhere and return a pointer that's completely different.
Pretty much the same outward behaviour is exposed by .NET's List<T> class - which you should be using anyway if you find yourself changing array sizes often. It hides the actual array reference from you so that the change is propagated throughout all of the references to the same list. As you remove items from the end, only the length of the list changes while the inner array stays the same - avoiding the copying.
It doesn't release the memory (you can always do that explicitly with Capacity = XXX, but that makes a new copy of the array), but then again, unless you're working with large arrays, neither does realloc - and if you're working with large arrays, yada, yada - we've been there :)
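For example (a small sketch of the behaviour described above):

```csharp
using System.Collections.Generic;

var list = new List<int>(new int[1000]);   // Count == 1000, Capacity == 1000

list.RemoveRange(500, 500);   // only Count changes; the backing array is untouched

Console.WriteLine(list.Count);     // 500
Console.WriteLine(list.Capacity);  // still 1000 - no copy happened
// list.Capacity = 500;            // this WOULD allocate a smaller array and copy
```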
realloc doesn't really make sense in the kind of memory model .NET has anyway - the heap is continuously collected and compacted over time. So if you're trying to use it to avoid the copies when just trimming an array, while also keeping memory usage low... don't bother. At the next heap compaction, the whole memory above your array is going to be moved to fill in the blanks. Even if it were possible to do the realloc, the only benefit you have over simply copying the array is that you would keep your array in the old-living heap - and that isn't necessarily what you want anyway.
No array type in the BCL supports what you want. That being said, you can implement your own type that supports what you need. It can be backed by a standard array, but implement its own Length and indexer properties that 'hide' a portion of the array from you.
public class MyTruncatableArray<T>
{
    private T[] _array;
    private int _length;

    public MyTruncatableArray(int size)
    {
        _array = new T[size];
        _length = size;
    }

    public T this[int index]
    {
        get
        {
            CheckIndex(index);
            return _array[index];
        }
        set
        {
            CheckIndex(index);
            _array[index] = value;
        }
    }

    public int Length
    {
        get { return _length; }
        set
        {
            if (value < 0 || value > _array.Length)
            {
                throw new ArgumentException("New array length must be positive and lower or equal to original size");
            }
            _length = value;
        }
    }

    private void CheckIndex(int index)
    {
        // note: >= here, so index must be strictly less than the logical length
        if (index < 0 || index >= _length)
        {
            throw new IndexOutOfRangeException();
        }
    }
}
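Usage would look something like this (assuming the MyTruncatableArray type from this answer is in scope):

```csharp
var arr = new MyTruncatableArray<int>(10);
arr[9] = 42;

arr.Length = 5;   // logical truncation: no reallocation, no copying, no GC pressure

Console.WriteLine(arr.Length);  // 5
// arr[9] now throws; setting arr.Length = 10 would "grow" it back for free,
// since the backing array was never actually shrunk.
```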
It really depends on what exactly you need. (E.g., do you need to truncate just so you can use the array more easily from your code, or is perf/GC/memory consumption a concern? If the latter, did you perform any measurements that prove the standard Array.Resize method unusable for your case?)

Mutable struct vs. class?

I'm unsure about whether to use a mutable struct or a mutable class.
My program stores an array with a lot of objects.
I've noticed that using a class doubles the amount of memory needed. However, I want the objects to be mutable, and I've been told that using mutable structs is evil.
This is what my type looks like:
struct /* or class */ Block
{
    public byte ID;
    public bool HasMetaData; // not sure whether HasMetaData == false or
                             // MetaData == null is faster, might remove this
    public BlockMetaData MetaData; // BlockMetaData is always a class reference
}
Allocating a large amount of objects like this (notice that both codes below are run 81 times):
// struct
Block[,,] blocks = new Block[16, 256, 16];
uses about 35 MiB of memory, whilst doing it like this:
// class
Block[,,] blocks = new Block[16, 256, 16];
for (int z = 0; z < 16; z++)
    for (int y = 0; y < 256; y++)
        for (int x = 0; x < 16; x++)
            blocks[x, y, z] = new Block();
uses about 100 MiB of ram.
So to conclude, my question is as follows:
Should I use a struct or a class for my Block type? Instances should be mutable and store a few values plus one object reference.
First off, if you really want to save memory then don't be using a struct or a class.
byte[,,] blockTypes = new byte[16, 256, 16];
BlockMetaData[,,] blockMetadata = new BlockMetaData[16, 256, 16];
You want to tightly pack similar things together in memory. You never want to put a byte next to a reference in a struct if you can possibly avoid it; such a struct will waste three to seven bytes automatically. References have to be word-aligned in .NET.
Second, I'm assuming that you're building a voxel system here. There might be a better way to represent the voxels than a 3-d array, depending on their distribution. If you are going to be making a truly enormous number of these things then store them in an immutable octree. By using the persistence properties of the immutable octree you can make cubic structures with quadrillions of voxels in them so long as the universe you are representing is "clumpy". That is, there are large regions of similarity throughout the world. You trade somewhat larger O(lg n) time for accessing and changing elements, but you get to have way, way more elements to work with.
Third, "ID" is a really bad way to represent the concept of "type". When I see "ID" I assume that the number uniquely identifies the element, not describes it. Consider changing the name to something less confusing.
Fourth, how many of the elements have metadata? You can probably do far better than an array of references if the number of elements with metadata is small compared to the total number of elements. Consider a sparse array approach; sparse arrays are much more space efficient.
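A sparse approach could be as simple as a dictionary keyed by a flattened block index (a sketch; the stub class stands in for the question's BlockMetaData):

```csharp
using System.Collections.Generic;

// Sparse: an entry exists only for blocks that actually have metadata,
// instead of one (mostly null) reference slot per block.
var metadata = new Dictionary<int, BlockMetaData>();

// Flatten (x, y, z) in a 16 x 256 x 16 world into a single key
int Index(int x, int y, int z) => (x * 256 + y) * 16 + z;

metadata[Index(3, 100, 7)] = new BlockMetaData();

bool has = metadata.TryGetValue(Index(3, 100, 7), out var md);
Console.WriteLine(has);  // True

class BlockMetaData { /* stub standing in for the question's type */ }
```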
Do they really have to be mutable? You could always make it an immutable struct with methods to create a new value with one field different:
struct Block
{
    // I'd definitely get rid of the HasMetaData
    private readonly byte id;
    private readonly BlockMetaData metaData;

    public Block(byte id, BlockMetaData metaData)
    {
        this.id = id;
        this.metaData = metaData;
    }

    public byte Id { get { return id; } }
    public BlockMetaData MetaData { get { return metaData; } }

    public Block WithId(byte newId)
    {
        return new Block(newId, metaData);
    }

    public Block WithMetaData(BlockMetaData newMetaData)
    {
        return new Block(id, newMetaData);
    }
}
I'm still not sure whether I'd make it a struct, to be honest - but I'd try to make it immutable either way, I suspect.
What are your performance requirements in terms of both memory and speed? How close does an immutable class come to those requirements?
An array of structs will offer better storage efficiency than an array of immutable references to distinct class instances having the same fields, because the latter will require all of the memory required by the former, in addition to memory to manage the class instances and memory required to hold the references. All that having been said, your struct as designed has a very inefficient layout. If you're really concerned about space, and every item in fact needs to independently store a byte, a Boolean, and a class reference, your best bet may be to either have two arrays of byte (a byte is actually smaller than a Boolean) and an array of class references, or else have an array of bytes, an array with 1/32 as many elements of something like BitVector32, and an array of class references.
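The parallel-arrays layout from the last paragraph might look like this (a sketch; the stub class stands in for the question's BlockMetaData):

```csharp
using System.Collections.Specialized;

const int N = 16 * 256 * 16;

byte[] ids = new byte[N];                    // 1 byte per element, tightly packed
var flags = new BitVector32[(N + 31) / 32];  // 1 bit per element instead of a bool
var meta = new BlockMetaData[N];             // 1 reference per element

// BitVector32 is indexed by bit mask, hence the shift/mask arithmetic
void SetMetaData(int i, BlockMetaData m)
{
    meta[i] = m;
    flags[i >> 5][1 << (i & 31)] = true;
}

bool HasMetaData(int i) => flags[i >> 5][1 << (i & 31)];

SetMetaData(5, new BlockMetaData());
Console.WriteLine(HasMetaData(5));  // True
Console.WriteLine(HasMetaData(6));  // False

class BlockMetaData { /* stub standing in for the question's type */ }
```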

C#: ToArray performance [duplicate]

This question already has answers here:
Is it better to call ToList() or ToArray() in LINQ queries?
(16 answers)
Closed 9 years ago.
Background:
I admit I did not attempt to benchmark this, but I'm curious...
What are the CPU/memory characteristics of the Enumerable.ToArray<T> (and its cousin Enumerable.ToList<T>)?
Since IEnumerable does not advertise in advance how many elements it has, I (perhaps naively) presume ToArray would have to "guess" an initial array size, then resize/reallocate the array if the first guess turns out to be too small, then resize it yet again if the second guess is too small, etc., which would give worse-than-linear performance.
I can imagine better approaches involving (hybrid) lists, but these would still require more than one allocation (though not reallocation) and quite a bit of copying, though they could be linear overall despite the overhead.
Question:
Is there any "magic" taking place behind the scenes, that avoids the need for this repetitive resizing, and makes ToArray linear in space and time?
More generally, is there an "official" documentation on BCL performance characteristics?
No magic. Resizing happens if required.
Note that it is not always required. If the IEnumerable<T> being .ToArrayed also implements ICollection<T>, then the .Count property is used to pre-allocate the array (making the algorithm linear in space and time.) If not, however, the following (rough) code is executed:
foreach (TElement current in source)
{
    if (array == null)
    {
        array = new TElement[4];
    }
    else
    {
        if (array.Length == num)
        {
            // Doubling happens *here*
            TElement[] array2 = new TElement[checked(num * 2)];
            Array.Copy(array, 0, array2, 0, num);
            array = array2;
        }
    }
    array[num] = current;
    num++;
}
Note the doubling when the array fills.
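The doubling is what keeps the whole operation linear: each element is copied only a constant number of times on average. A sketch of the growth for a source of unknown length:

```csharp
// Capacities visited while buffering n = 100 items: 4, 8, 16, 32, 64, 128.
// Elements copied across all the growth steps: 4 + 8 + 16 + 32 + 64 = 124 (< 2n).
int n = 100, capacity = 4, copied = 0;
while (capacity < n)
{
    copied += capacity;   // the Array.Copy of the old buffer
    capacity *= 2;
}
Console.WriteLine((capacity, copied));  // (128, 124)
```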
Regardless, it's generally a good practice to avoid calling .ToArray() and .ToList() unless you absolutely require it. Interrogating the query directly when needed is often a better choice.
I extracted the code behind .ToArray() method using .NET Reflector:
public static TSource[] ToArray<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    Buffer<TSource> buffer = new Buffer<TSource>(source);
    return buffer.ToArray();
}
and Buffer.ToArray:
internal TElement[] ToArray()
{
    if (this.count == 0)
    {
        return new TElement[0];
    }
    if (this.items.Length == this.count)
    {
        return this.items;
    }
    TElement[] destinationArray = new TElement[this.count];
    Array.Copy(this.items, 0, destinationArray, 0, this.count);
    return destinationArray;
}
And inside the Buffer constructor, it loops through all the elements to compute the real count and the array of elements.
IIRC, it uses a doubling algorithm.
Remember that for most types, all you need to store are references. It's not like you're allocating enough memory to copy the entire object (unless of course you're using a lot of structs... tsk tsk).
It's still a good idea to avoid using .ToArray() or .ToList() until the last possible moment. Most of the time you can just keep using IEnumerable<T> all the way up until you either run a foreach loop or assign it to a data source.
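For instance (a sketch of the deferred style suggested above):

```csharp
using System.Collections.Generic;
using System.Linq;

int[] data = { 1, -2, 3, -4, 5 };

// Nothing is materialized here; the query is just a description of the work.
IEnumerable<int> positives = data.Where(x => x > 0);

foreach (int x in positives)    // elements stream through one at a time
{
    Console.WriteLine(x);       // 1, then 3, then 5
}

int[] snapshot = positives.ToArray();  // only now is a new array allocated
```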
