Reducing memory pressure with many small classes - C#

I am struggling with memory problems in some legacy code.
The code performs various tasks with huge point clouds, all based around the following data structure:
public class Point
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }
}
At any time, a couple million of these points may hang around in memory in various lists. Many of them stay alive long enough to be promoted to generation 2 of the garbage collector. Since the program runs in 32-bit mode, virtual address space is limited.
The programs using this legacy code sometimes crash with OutOfMemoryExceptions. Even when they do not crash, they consume far more memory than they should, and the virtual address space frequently becomes fragmented to the point where no contiguous chunk larger than 50 MB is available (e.g. new MemoryFailPoint(50) fails). A couple of methods contain explicit calls to GC.Collect(), and removing those increases the frequency of the crashes.
Now, I know of two ways to solve this problem, neither of which I can use:
1. Use a struct instead of a class for the points, and store those structs in arrays rather than in a List, to avoid copying the points on each access. Structs have far less per-instance overhead than classes and put much less load on the garbage collector.
Unfortunately this would require huge breaking changes to the code: the existing methods all expect the point class to be mutable, and references to individual points are passed around everywhere. The copy-by-value semantics of structs would cause all sorts of problems.
2. Switch the whole app to 64-bit. This would not reduce memory usage, but it would increase the virtual address space to the point where the app would at least stop crashing.
Unfortunately a couple of legacy 32-bit DLLs prevent this.
Is there any other way to keep working with the existing Point class in 32-bit mode while reducing memory pressure and easing the garbage collector's work?
Can I somehow allocate and free all those points myself in unmanaged memory, while still passing references around in managed code?
Or is there another workaround I have missed?

In your comment, you already achieved a significant improvement, from 36 bytes down to 24 bytes, by going from double to float:
I did some tests with float instead of double: I was able to save 33% (the original point class is 24 bytes + 12 bytes overhead; with floats it is 12 bytes + 12 bytes overhead). With a struct I could save another 12 bytes of class overhead. – HugoRune Sep 20 '16 at 16:19
You can get an additional 8 bytes of improvement by encapsulating the fields in a struct inside your objects. That brings you to 16 bytes per Point (12 bytes for the three floats plus 4 bytes of overhead) and saves an additional 22%, for a total of 55%:
public struct PointFloat
{
    public float X;
    public float Y;
    public float Z;
}

public class Point
{
    private PointFloat dbls;

    public float X
    {
        get { return dbls.X; }
        set { dbls.X = value; }
    }

    public float Y
    {
        get { return dbls.Y; }
        set { dbls.Y = value; }
    }

    public float Z
    {
        get { return dbls.Z; }
        set { dbls.Z = value; }
    }
}
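If you want to sanity-check per-instance figures like these on your own runtime, one rough approach is to allocate a large batch of instances and divide the managed-heap growth by the count. A minimal sketch, assuming the Point class above; note the result includes roughly 4 bytes per array slot on 32-bit, and exact numbers vary by CLR and bitness:

const int count = 1_000_000;
var points = new Point[count];

long before = GC.GetTotalMemory(true);
for (int i = 0; i < count; i++)
{
    points[i] = new Point();
}
long after = GC.GetTotalMemory(true);

// Heap growth divided by instance count, including the array slot itself.
Console.WriteLine("~{0} bytes per Point", (after - before) / count);
GC.KeepAlive(points);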

Related

How to mitigate CPU to GPU bottleneck

I am using the ComputeSharp library to run a compute shader on a very large set of data. The dataset is around 10 GB, split into smaller (about 3 GB) pieces for the GPU to handle. The problem is that each piece takes about 1 s to load, compute and return, even though the computation itself is almost instant.
I am looking for a way to speed this up, as the GPU is currently outperformed by the CPU in certain cases.
More details:
The dataset consists of custom points forming a point cloud. The shader finds the points with the highest values and uses those to render an image. The maximum size of the point cloud will be about 500 million points.
The points are already as small as they can be, with all the metadata packed into a single int. Everything is put into a buffer and passed to the shader, which outputs another buffer with the result. I already tried and failed to use textures, as they do not support custom types.
Edit (Minimal reproduction):
public struct DataPoint
{
    public float3 Position;
    public uint Value;
}

public void ComputeOneChunk(DataPoint[] dataPoints)
{
    var stopWatch = new Stopwatch();
    stopWatch.Start();
    using var currentChunk = _gpu.AllocateReadOnlyBuffer(dataPoints);
    stopWatch.Stop();
    Debug.WriteLine($"Buffering took {stopWatch.ElapsedMilliseconds}ms");

    stopWatch.Restart();
    // resultBuffer is a ReadWriteBuffer<uint> field (declaration omitted here)
    _gpu.For(dataPoints.Length, 1, new FindMax(
        currentChunk,
        resultBuffer));
    stopWatch.Stop();
    Debug.WriteLine($"Execution took {stopWatch.ElapsedMilliseconds}ms");
}

[AutoConstructor]
public readonly partial struct FindMax : IComputeShader
{
    public readonly ReadOnlyBuffer<DataPoint> buffer;
    public readonly ReadWriteBuffer<uint> resultBuffer;

    public void Execute()
    {
        //DoStuff
    }
}
Found a solution. It seems that converting the dataPoints array to a buffer takes a long time and involves some intermediate steps. To speed this up, you can use an UploadBuffer, which eliminates those extra steps:
_currentChunk = _gpu.AllocateReadOnlyBuffer<DataPoint>(uploadBuffer.Length);
_currentChunk.CopyFrom(uploadBuffer);
This changed 1 s to around 300 ms, which is probably as fast as it will ever go over PCIe.
Another problem appeared, however: UploadBuffer exposes its data as a Span, and accessing it element by element to store the data is very slow, which I am still trying to solve.
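For reference, here is a minimal sketch of how the whole UploadBuffer path could fit together, reusing the _gpu and dataPoints names from the reproduction above. AllocateUploadBuffer and the Span property are assumptions based on the snippets here; check the exact ComputeSharp method names against the version in use:

using var uploadBuffer = _gpu.AllocateUploadBuffer<DataPoint>(dataPoints.Length);
// Bulk-copy the whole array into staging memory in one call; this avoids
// the slow element-by-element Span access mentioned above.
dataPoints.AsSpan().CopyTo(uploadBuffer.Span);

using var currentChunk = _gpu.AllocateReadOnlyBuffer<DataPoint>(uploadBuffer.Length);
// Single copy from upload memory into the GPU-visible buffer.
currentChunk.CopyFrom(uploadBuffer);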

Efficient small byte-arrays in C#

I have a huge collection of very small objects. To ensure the data is stored very compactly, I rewrote the class to store all the information in a byte array with variable-byte encoding. Most instances of these millions of objects need only 3 to 7 bytes to store all their data.
After memory profiling, I found out that these byte arrays always take at least 32 bytes.
Is there a way to store the information more compactly than bit-fiddled into a byte[]? Would it be better to point to an unmanaged array?
class MyClass
{
    byte[] compressed;

    public MyClass(IEnumerable<int> data)
    {
        compressed = compress(data);
    }

    private byte[] compress(IEnumerable<int> data)
    {
        // ...
    }

    private IEnumerable<int> decompress(byte[] compressedData)
    {
        // ...
    }

    public IEnumerable<int> Data { get { return decompress(compressed); } }
}
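As background, variable-byte encoding commonly stores 7 payload bits per byte and uses the high bit as a continuation flag, which is why small ints fit in very few bytes. A minimal sketch of that common variant (not necessarily the exact scheme used here):

static IEnumerable<byte> EncodeVarint(uint value)
{
    // Emit 7 bits at a time, low bits first, setting the high bit
    // while more bytes follow.
    while (value >= 0x80)
    {
        yield return (byte)(value | 0x80);
        value >>= 7;
    }
    yield return (byte)value;
}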
There are a couple of problems you're facing that eat up memory. One is per-object overhead, and the other is objects aligning to 32-bit or 64-bit boundaries (depending on your build). Your current approach suffers from both. The following sources describe this in more detail:
Of Memory and Strings
How much memory does a C# string take up
I played around with this when I was fiddling with benchmarking sizes.
A simple solution would be to create a struct that has a single long member. Its methods would handle packing bytes into and unpacking them out of that long, using shift-and-mask bit fiddling; see the sketch below.
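To illustrate, a minimal sketch of the struct idea, assuming a simplified fixed layout of four 16-bit slots rather than the variable-byte encoding from the question (PackedBytes is a hypothetical name):

public struct PackedBytes
{
    private long bits;

    // Store a 16-bit value in one of four slots (index 0..3).
    public void Set(int index, ushort value)
    {
        int shift = index * 16;
        bits = (bits & ~(0xFFFFL << shift)) | ((long)value << shift);
    }

    // Read a 16-bit value back out of the given slot.
    public ushort Get(int index)
    {
        return (ushort)(bits >> (index * 16));
    }
}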
Another idea would be a class that serves up objects by ID and stores the actual bytes in a single backing List<byte>. But that would get complicated and messy; I think the struct idea is much more straightforward.

Store data inside long number or class instance for better performance

I'm writing an AI for my puzzle game and I'm facing the following situation:
Currently, I have a class, Move, which represents a move in my game; the game logic is similar to chess.
In the Move class, I'm storing the following data:
The move player color.
The moving piece.
The origin position on the board.
The destination position on the board.
The piece that has been killed by performing this move (if any).
The move score.
In addition, I have some methods that describe a move, such as IsResigned, Undo, etc.
This move instance is passed around in my AI, which is based on the alpha-beta algorithm. The move instance is therefore passed many times, and I construct many, many Move instances over the course of the search. Thus, I'm afraid this may have a significant impact on performance.
To reduce that impact, I thought about the following solution:
Instead of using instances of the Move class, I'll store the entire move data inside a long number (using bitwise operations) and extract the information as needed.
For instance:
- Player color will be in bit 0 (1 bit).
- Origin position will be in bits 10 - 19 (10 bits).
and so on.
See this example:
public long GenerateMove(PlayerColor color, int origin, int destination) {
    return ((long)color) | ((long)origin << 10) | ((long)destination << 20);
}

public PlayerColor GetColor(long move) {
    return (PlayerColor)(move & 0x1);
}

public int GetOrigin(long move) {
    return (int)((move >> 10) & 0x3ff);
}

public int GetDestination(long move) {
    return (int)((move >> 20) & 0x3ff);
}
Using this method, I can pass just long numbers around the AI instead of class instances.
However, I have some doubts. Leaving aside the added complexity, class instances in C# are passed by reference (i.e. by passing a pointer to the object). So does my alternative method even make sense? It might even be worse: I'm using long numbers here (64 bits), while a pointer address may be represented as an integer (32 bits), so it could perform worse than my current implementation.
What is your opinion about this alternative method?
There are a couple of things to say here:
Are you actually having performance problems (and are you sure memory usage is the reason)? Memory allocation for new instances is very cheap in .NET, and normally you will not notice garbage collection. So you might be barking up the wrong tree here.
When you pass an instance of a reference type, you are just passing a reference; when you store a reference type (e.g. in an array), you will just store the reference. So unless you create a lot of distinct instances or copy the data into new instances, passing the reference does not increase heap size. So passing references might be the most efficient way to go.
If you create a lot of copies and discard them quickly, and you are afraid of the memory impact (again, do you face actual problems?), you can use value types (struct instead of class). But you have to be aware of value-type semantics (you are always working on copies).
You cannot rely on a reference being 32 bits. On a 64-bit system, it will be 64 bits.
I would strongly advise against storing the data in an integer variable. It makes your code less maintainable, and in most cases that is not worth the performance tradeoff. Unless you are in severe trouble, don't do it.
If you don't want to give up on the idea of using a numeric value, at least use a struct composed of two System.Collections.Specialized.BitVector32 instances. BitVector32 is a built-in .NET type that does the mask-and-shift operations for you. In that struct you can also encapsulate access to the values in properties, keeping this rather unusual way of storing your values away from the rest of your code.
UPDATE:
I would recommend you use a profiler to see where the performance problems actually are. It is almost impossible (and definitely not a good use of your time) to optimize performance by guesswork. Once you see the profiler results, you'll probably be surprised by the cause of your problems; I would bet that memory usage or memory allocation is not it.
In case you actually conclude that the memory consumption of your Move instances is the problem and that using small value types would solve it (I'd be surprised), don't use an Int64; use a custom struct (as described above) like the following, which will be the same size as an Int64:
[System.Runtime.InteropServices.StructLayout(System.Runtime.InteropServices.LayoutKind.Sequential, Pack = 4)]
public struct Move {
    private static readonly BitVector32.Section SEC_COLOR = BitVector32.CreateSection(1);
    private static readonly BitVector32.Section SEC_ORIGIN = BitVector32.CreateSection(63, SEC_COLOR);
    private static readonly BitVector32.Section SEC_DESTINATION = BitVector32.CreateSection(63, SEC_ORIGIN);

    private BitVector32 low;
    private BitVector32 high; // second 32 bits, unused so far; pads the struct to the size of an Int64

    public PlayerColor Color {
        get { return (PlayerColor)low[SEC_COLOR]; }
        set { low[SEC_COLOR] = (int)value; }
    }

    public int Origin {
        get { return low[SEC_ORIGIN]; }
        set { low[SEC_ORIGIN] = value; }
    }

    public int Destination {
        get { return low[SEC_DESTINATION]; }
        set { low[SEC_DESTINATION] = value; }
    }
}
But be aware that you are now using a value type, so you have to use it accordingly. Assignments create copies of the original (i.e. changing a value on the copy leaves the source unchanged), so use ref parameters if you want to persist changes made by subroutines, and avoid boxing at all costs to prevent even worse performance (some operations box even though you won't immediately notice, e.g. passing a struct that implements an interface as an argument of the interface type). Using structs (just as with Int64) will only be worth it when you create a lot of temporary values that you quickly throw away. And then you'll still need to confirm with a profiler that your performance has actually improved.
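To make those copy semantics concrete, a short usage sketch with the Move struct above:

Move a = new Move();
a.Origin = 5;
a.Destination = 9;

Move b = a;          // assignment copies all 8 bytes; b is independent of a
b.Destination = 17;  // mutates only the copy

// At this point a.Destination is still 9, while b.Destination is 17.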

.NET, get memory used to hold struct instance

It's possible to determine memory usage (according to Jon Skeet's blog) like this:
public class Program
{
    private static void Main()
    {
        var before = GC.GetTotalMemory(true);
        var point = new Point(1, 0);
        var after = GC.GetTotalMemory(true);
        Console.WriteLine("Memory used: {0} bytes", after - before);
    }

    #region Nested type: Point

    private class Point
    {
        public int X;
        public int Y;

        public Point(int x, int y)
        {
            X = x;
            Y = y;
        }
    }

    #endregion
}
It prints Memory used: 16 bytes (I'm running on an x64 machine).
Suppose we change the Point declaration from class to struct. How can the memory used be determined then? Is it possible at all? I was unable to find anything about getting the stack size in .NET.
P.S. Yes, when changed to a struct, Point instances will often be stored on the stack (though not always) instead of the heap. Sorry for not posting this together with the question the first time.
P.P.S. This situation has no practical use at all (IMHO); I'm just curious whether it is possible to get the size of the stack (short-term storage). I was unable to find any info about it, so I'm asking you, SO experts.
You won't see a change in GetTotalMemory if you create the struct the way you did, since it's going to be part of the thread's stack, and not allocated separately. The GetTotalMemory call will still work, and show you the total allocation size of your process, but the struct will not cause new memory to be allocated.
You can use sizeof(Type) or Marshal.SizeOf to return the size of a struct (in this case, 8 bytes).
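For example (a minimal sketch; note that Marshal.SizeOf reports the unmanaged, marshaled size, which can differ from the CLR's internal layout, and sizeof on a custom struct requires an unsafe context):

struct Point
{
    public int X;
    public int Y;
}

// Runtime query, no unsafe code required:
Console.WriteLine(Marshal.SizeOf(typeof(Point)));  // 8

// Compile-time operator; compiles only with /unsafe:
unsafe
{
    Console.WriteLine(sizeof(Point));  // 8
}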
There is a special CPU register, ESP, that contains a pointer to the top of the stack. You could probably find a way to read this register from .NET (using some unsafe or external code). Then just compare the value of this pointer at a given moment with its value at thread start; the difference between them will be a more or less accurate amount of the memory used for the thread's stack. Not sure if it really works, just an idea :)
In isolation, as you have done here, you might have a "reasonable" amount of success with this methodology, especially if you run it numerous times to make sure no other piece of code or GC action is affecting the outcome. I am not confident the information is useful, though. Using this methodology in a real-world application is less likely to give accurate results, as there are too many variables.
But realize this is only "reasonable", not a certainty.
Why do you need to know the size of objects? Just curious, as knowing the business reason may lead to other alternatives.

C# basics - Memory Management

I am new to programming in C#.
Can anyone please explain memory management in C# to me?
class Student
{
    int Id;
    String Name;
    Double Marks;

    public string getStudentName()
    {
        return this.Name;
    }

    public double getPersantage()
    {
        return this.Marks * 100 / 500;
    }
}
I want to know how much memory is allocated for an instance of this class.
What about the methods? Where are they stored?
And if there are static methods, what about their storage?
Can anyone please briefly explain this to me?
An instance of the class itself will take up 24 bytes on a 32-bit CLR:
8 bytes of object overhead (sync block and type pointer)
4 bytes for the int
4 bytes for the string reference
8 bytes for the double
Note that the memory for the string itself is in addition to that - but many objects could share references to the same string, for example.
Methods don't incur the same sort of storage penalty as fields. Essentially they're associated with the type rather than with an instance of the type, but there's the IL version and the JIT-compiled code to consider. However, in my experience you can usually ignore this. You'd have to have a large amount of code and very few instances for the memory taken up by the code to be significant compared with the data. The important thing is that you don't get a separate copy of each method for each instance.
EDIT: Note that you happened to pick a relatively easy case. In situations where you've got fields of logically smaller sizes (e.g. short or byte fields), the CLR chooses how to lay out the object in memory, such that values which require memory alignment (being on a word boundary) are laid out appropriately, possibly packing other fields together: so four byte-sized fields could end up taking 4 bytes, or they could take 16 if the CLR decides to align each of them separately. I think that's implementation-specific, but it's possible that the CLI spec dictates the exact approach taken.
As I think Jon Skeet is saying, it depends on a lot of factors and is not easily measurable ahead of time. Factors such as whether it's running on a 64-bit or a 32-bit OS must be taken into account, and whether you are running a debug or a release build also comes into play. The amount of memory taken up by code depends on the processor that the JITter compiles for, as different optimizations can be used for different processors.
Not really an answer, just for fun:
struct Student
{
    int Id;
    [MarshalAs(UnmanagedType.LPStr)]
    String Name;
    Double Marks;

    public string getStudentName()
    {
        return this.Name;
    }

    public double getPersantage()
    {
        return this.Marks * 100 / 500;
    }
}
And:
Console.WriteLine(Marshal.SizeOf(typeof(Student)));
On 64-bit this returns:
24
And on 32-bit:
16
sizeof(double) is a good way to find out the number of bytes for the value returned by getPersantage(). Haven't done much C#, but better an answer than no answer :=)
