Large Object Heap Fragmentation, Issues with arrays

Large Object Heap Fragmentation, Issues with arrays - c#

I am writing an analysis application in C# that has to deal with a lot of memory. I use ANTS Memory Profiler 7.4 for optimizing my memory management. While doing so, I realized that all of my double[,] arrays I use (and i need them) are placed on the LOH although the largest of these arrays is about 24.000 bytes. Objects should not be put there before 85.000 bytes as far as I know. The problem is now, that since I have about several thousand instances of these double[,] arrays I have a lot of memory fragmentation (about 25% of my total memory usage is free memory that I can not use). some of these arrays stored on the LOH are even only 1.036 bytes in size. The problem is that sometimes I have to perform larger analysis and then I end up with an out of memory exception because of the massive memory loss due to LOH fragmentation.
Does anyone know why this is happening although it should not be a large object by definition?

The threshold size for putting arrays of doubles on the LOH is much lower than for other types. The reason for this is that items on the LOH are always 64-bit aligned, and doubles benefit greatly from being 64-bit aligned.
Note that this only affects programs running in 32 bits. Programs running in 64 bits have objects that are always aligned on a 64-bit boundary, so that LOH heuristic is not used for 64 bit programs.
The threshold size is 1000 doubles.
Also see https://connect.microsoft.com/VisualStudio/feedback/details/266330/

Related

Large object heap waste

I noticed that my application runs out of memory quicker than it should. It creates many byte arrays of several megabytes each. However when I looked at memory usage with vmmap, it appears .NET allocates much more than needed for each buffer. To be precise, when allocating a buffer of 9 megabytes, .NET creates a heap of 16 megabytes. The remaining 7 megabytes cannot be used to create another buffer of 9 megabytes, so .NET creates another 16 megabytes. So each 9MB buffer wastes 7MB of address space!
Here's a sample program that throws an OutOfMemoryException after allocating 106 buffers in 32-bit .NET 4:
using System.Collections.Generic;
namespace CSharpMemoryAllocationTest
{
class Program
{
static void Main(string[] args)
{
var buffers = new List<byte[]>();
for (int i = 0; i < 130; ++i)
{
buffers.Add(new byte[9 * 1000 * 1024]);
}
}
}
}
Note that you can increase the size of the array to 16 * 1000 * 1024 and still allocate the same amount of buffers before running out of memory.
VMMap shows this:
Also note how there's an almost 100% difference between the total Size of the Managed Heap and the total Commited size. (1737MB vs 946MB).
Is there a reliable way around this problem on .NET, i.e. can I coerce the runtime into allocating no more than what I actually need, or maybe much larger Managed Heaps that can be used for several contiguous buffers?

Internally the CLR allocates memory in segments. From your description it sounds like the 16 MB allocations are segments and your arrays are allocated within these. The remaining space is reserved and not really wasted under normal circumstances, as it will be used for other allocations. If you don't have any allocation that will fit within the remaining chunks these are essentially overhead.
As your arrays are allocated using contiguous memory you can only fit a single of those within a segment and hence the overhead in this case.
The default segment size is 16 MB, but if you allocation is larger than that the CLR will allocate segments that are larger. I'm not aware of the details, but e.g. if I allocate 20 MB Wmmap shows me 24 MB segments.
One way to reduce the overhead is to make allocations that fit with the segment sizes if possible. But keep in mind that these are implementation details and could change with any update of the CLR.

The CLR reserving a 16MB chunk in one go from the OS, but only actively occupying 9MB.
I believe you are expecting the 9MB and 9MB to be in one heap. The difficulty is that the variable is now split over 2 heaps.
Heap 1 = 9MB + 7MB
Heap 2 = 2MB
The problem we have now, is if the original 9MB is deleted, we now have 2 heaps we can't tidy up, as the contents are shared across heaps.
To improve performance, the approach is to put them in single heaps.
If you are worried about memory usage, don't be. Memory usage is not a bad thing with .NET, as it if no-one is using it, what's the problem? The GC will at some point kick in, and memory will be tidied up. GC will only kick in either
When the CLR deems it necessary
When the OS tells the CLR to give back memory
When forced to by the code
But memory usage, especially in this example shouldn't be a concern. The memory usage stops CPU cycles occurring. Otherwise if it tidied up memory constantly, your CPU would be high, and your process (and all others on your machine) would run much slower.

Age old symptom of the buddy system heap management algorithm, where powers of 2 are used to split each block recursively, in a binary tree, so for 9M the next size is 16M. If you dropped your array size down to 8mb, you will see your usage drop by half. Not a new problem, native programmers deal with it too.
The small object pool (less than 85,000 bytes) is managed differently, but at 9MB your arrays are in the large object pool. As of .NET 4.5, the large object heap doesn't participate in compaction, large objects are immediately promoted to generation 2.
You can't coerce the algorithm, but you can certainly coerce your user code by figuring out what sizes to use that will most efficiently fill the binary segments.
If you need to fill your process space with 9 MB arrays, either:
Figure out how to save 1MB to reduce the arrays to 8MB segments
Write or use a segmented array class that abstracts an array of 1 or 2MB array segments, using an indexer property. Same way you build an unlimited bitfield or a growable ArrayList. Actually, I thought one of the containers already did this.
Move to 64-bit
Reclaiming fragmented portion of a buddy system heap is an optimization with logarithmic returns (ie. you are approximately running out of memory anyway). At some point you'll have to move to 64-bit whether its convenient or not, unless your data size is fixed.

are reads and writes to a variable of type double guaranteed to be atomic on a 64 bit intel processor?

I have a 64 bit machine running a 64 bit processor but my application is 32 bit. Is reading or writing to a double guaranteed to be atomic? I am talking about assignment and reading only. How does a 32 bit process affect this process? Can the reader ever read partial value from a double when other thread is writing to it in given specs?

No. A double can easily straddle the L1 cache line boundary, requiring multiple bus cycles to glue the two pieces together. A very real problem in C#, the 32-bit CLR only provides an alignment guarantee of 4.
These misaligned accesses are not only not atomic, they are also expensive due to the shuffling the processor needs to do. Code that uses double can have 3 distinct timings, fast if the double happens to be aligned to 8 by accident, twice as slow when it is misaligned to 4 but still inside an L1 cache line, over three times slower when it is misaligned across a cache line. Such a program can randomly bump into one of these modes when the garbage collector compacts the heap and moves the double. Be very wary of this, a synthetic test program can easily miss this failure mode.
The perf problem is the core reason that a double[] is allocated in the Large Object Heap much sooner normal. The LOH provides an alignment to 8 guarantee. The normal rule is for objects >= 85,000 bytes, for double[] in 32-bit mode it is 8,000 bytes (1000 elements in the array).
You must use Interlocked.Exchange() if you need the atomicity guarantee. And I ought to post the disclaimer that atomicity is a very weak guarantee and no substitute for proper locking when requried.

Passing large data structures to unmanaged code using fixed pointer

I'm developing an application which consists of two parts:
C# front-end
C++ number cruncher
In some cases the amount of data passed from C# to C++ can be really large. I'm talking about Gb and maybe more. There's a large array of doubles in particular and I wanted to pass a pinning/fixed pointer to this array to C++ code. The number crunching can take up to several hours to finish. I'm worrying about any problems that can be triggered by this usage of pinning pointers. As I see it, the garbage collector will not be able to touch this large memory region for a long time. Can this cause any problems? Should I consider a different strategy?
I thought that instead of passing the whole array I could provide an interface for building this array from within C++ code, so that the memory is owned by unmanaged part of the application. But in the end both strategies will create a large chunk of memory which is not relocatable for C# garbage collector for a long time. Am I missing something?

You don't have a problem. Large arrays are allocated in the Large Object Heap. Pinning them can not have any detrimental effect, the LOH is not compacted. "Large" here means an array of doubles with 1000 or more elements for 32-bit code or any array equal or larger than 85,000 bytes.

For your specific use case, it may be worthwhile to use a memory mapped file as the shared memory buffer between your c# and c++ code. This circumvents the garbage collector altogether. And also lets the OS cache pager deal with memory pressure issues, instead of GC managed memory.

Array Allocation in .NET

The question is regarding the allocation of arrays in .net. i have a sample program below in which maximum array i can get is to length. I increase length to +1 it gives outofMemory exception. But If I keep the length and remove the comments I am able to allocation 2 different big arrays. Both arrays are less the .net permissible object size of 2 gb and total memory also less than virtual Memory. Can someone put any thoughts?
class Program
{
static int length = 203423225;
static double[] d = new double[length];
//static int[] i = new int[15000000];
static void Main(string[] args)
{
Console.WriteLine((sizeof(double)*(double)length)/(1024*1024));
Console.WriteLine(d.Length);
//Console.WriteLine(i.Length);
Console.WriteLine(Process.GetCurrentProcess().VirtualMemorySize64.ToString());
}
}

A 32-bit process must allocate virtual memory for the array from the address space it has available. by default 2 gigabytes. Which contains a mix of both code and data. Allocations are made from the holes between the existing allocations.
Such allocations always fail not because there is no more virtual memory left, they fail because the available holes are not big enough. And you asking for a big hole, getting 1.6 jiggabytes is very rare and will only work on very simple programs that don't load any additional DLLs. A poorly based DLL is a good way to cut a large hole in two, drastically reducing the odds of such an allocation succeeding. The more typical works-on-the-first-try allocation is around 650 megabytes. The second allocation didn't fail because there was another hole available. The odds go down considerably after a program has been running for a while and the address space got fragmented. A 90 MB allocation can fail.
You can get insight in how the virtual memory address space is carved up for a program with the SysInternals' VMMap utility.
A simple workaround is to set the EXE project's Platform target setting to AnyCPU and run the program on a 64-bit operating system. It will have gobs of addressable virtual memory space available, you'll only be limited by the maximum allowed size of the paging file and the .NET 2 gigabyte object size limit. A limitation that's addressed in .NET 4.5 with the new <gcAllowVeryLargeObjects> config element. Even a 32-bit program can take advantage of the available 4 gigabyte 32-bit address space on a 64-bit operating system with the /LARGEADDRESSAWARE option of editbin.exe, you'll have to run it in a post-build event.

This will be because when allocating memory for arrays the memory must be contiguous (i.e. the array must be allocated as one large block of memory). Even if there is enough space in total to allocate the array, if the free address space is split up then the memory allocation will still fail unless the largest of those free spaces is large enough for the entire array.

.net collections memory optimization - will this method work?

Just like almost any other big .NET application, my current C# project contains many .net collections .
Sometimes I don't know, from the beginning, what the size of a Collection (List/ObservableCollection/Dictionary/etc.) is going to be.
But there are many times when I do know what it is going to be.
I often get an OutOfMemoryException and I've been told it can happen not only because process size limits, but also because of fragmentation.
So my question is this - will setting collection's size (using the capacity argument in the constructor) every time I know its expected size help me prevent at least some of the fragmentation problems ?
This quote is from the msdn :
If the size of the collection can be
estimated, specifying the initial
capacity eliminates the need to
perform a number of resizing
operations while adding elements to
the List.
But still, I don't want to start changing big parts of my code for something that might not be the real problem.
Has it ever helped any of you to solve out of memory problems ?

Specifying an initial size will rarely if ever get rid of an OutOfMemory issue - unless your collection size is millions of object in which case you should really not keep such a collection.
Resizing a collection involves defining a completely new array with a new additional size and then copying the memory. If you are already close to out of memory, yes, this can cause an out of memory since the new array cannot be allocated.
However, 99 out of 100, you have a memory leak in your app and collection resizing issues is only a symptom of it.

If you are hitting OOM, then you may be being overly aggressive with the data, but to answer the question:
Yes, this may help some - as if it has to keep growing the collections by doubling, it could end up allocating and copying twice as much memory for the underlying array (or more precicely, for the earlier smaller copies that are discarded). Most of these intermediate arrays will be collected promptly, but when they get big you are using the "large object heap", which is harder to compact.
Starting with the correct size prevents all the intermediate copies of the array.
However, it also depends what is in the array matters. Typically, for classes, there is more data in each object (plus overheads for references etc) - meaning the list is not necessarily the biggest culprit for memory use; you might be burning up most of the memory on objects.
Note that x64 will allow more overall space, but arrays are limited to 2GB - and if each reference doubles in size this halves the maximum effective length of the array.
Personally I would look at breaking the huge sets into smaller chains of lists; jagged lists, for example.

.NET has a compating garbage collector, so you probably won't run into fragmentation problems on the normal .NET heap. You can however get memory fragmentation if you're using lots of unmanaged memory (e.g. through GDI+, COM, etc.). Also, the large object heap isn't compacted, so that can get fragmented, too. IIRC an object is put into the LOH if it's bigger than 80kb. So if you have many collections that contain more than 20k objects, you might get fragmentation problems.
But instead of guessing where the problem might be, it might be better to narrow the problem down some more: When do you get the OutOfMemoryExceptions? How much memory is the application using at that time? Using a tool like WinDbg or memory profilers you should be able to find out how much of that memory is on the LOH.
That said, it's always a good idea to set the capacity of List and other data structures in advance if you know it. Otherwise, the List will double it's capacity everytime you add an item and hit the capacity limit which means lots of unnecessary allocation and copy operations.

In order to solve this, you have to understand the basics and pinpoint the problem in your code.
It is always a good idea to set the initial capacity, if you have a sensible estimate. If you only have an approximate guess, allocate more.
Fragmentation can only occur on the LOH (objects over 80 kB). To prevent it , try to allocate blocks of the same size. Paradoxically, the solution might be to sometimes allocate more memory than you actually need.

The answer is that, yes pre-defining a size on collections will increase performance and memory optimization and reduce fragmentation. See my answer here to see why - If I set the initial size of a .NET collection and then add some items OVER this initial size, how does the collection determine the next resize?
However, without analyzing a memory dump or memory profiling on the app, it's impossible to say exactly what the cause of the OOM is. Thus, impossible to conjecture if this optimization will solve the problem.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.