I have another active question here regarding some hopeless memory issues that possibly involve LOH fragmentation, among possibly other unknowns.
My question now is: what is the accepted way of doing things?
If my app needs to be done in Visual C#, and needs to deal with large arrays to the tune of int[4000000], how can I not be doomed by the garbage collector's refusal to deal with the LOH?
It would seem that I am forced to make any large arrays global, and never use the word "new" around any of them. So, I'm left with ungraceful global arrays with "maxindex" variables instead of neatly sized arrays that get passed around by functions.
I've always been told that this was bad practice. What alternative is there?
Is there some kind of function to the tune of System.GC.CollectLOH("Seriously") ?
Is there possibly some way to outsource garbage collection to something other than System.GC?
Anyway, what are the generally accepted rules for dealing with large (>85 KB) variables?
Firstly, the garbage collector does collect the LOH, so do not be immediately scared by its presence. The LOH gets collected when generation 2 gets collected.
The difference is that the LOH does not get compacted, which means that if you have an object in there that has a long lifetime then you will effectively be splitting the LOH into two sections — the area before and the area after this object. If this behaviour continues to happen then you could end up with the situation where the space between long-lived objects is not sufficiently large for subsequent assignments and .NET has to allocate more and more memory in order to place your large objects, i.e. the LOH gets fragmented.
Now, having said that, the LOH can shrink in size if the area at its end is completely free of live objects, so the only problem is if you leave objects in there for a long time (e.g. the duration of the application).
Starting with .NET 4.5.1, the LOH can be compacted; see the GCSettings.LargeObjectHeapCompactionMode property.
Strategies to avoid LOH fragmentation are:
Avoid creating large objects that hang around. Basically this just means large arrays, or objects which wrap large arrays (such as the MemoryStream which wraps a byte array), as nothing else is that big (components of complex objects are stored separately on the heap so are rarely very big). Also watch out for large dictionaries and lists as these use an array internally.
Watch out for double arrays — the threshold for these going into the LOH is much, much smaller — I can't remember the exact figure but it's only a few thousand.
If you need a MemoryStream, consider making a chunked version that is backed by a number of smaller arrays rather than one huge array. You could also make custom versions of IList and IDictionary that use chunking to avoid anything ending up in the LOH in the first place.
Avoid very long Remoting calls, as Remoting makes heavy use of MemoryStreams which can fragment the LOH during the length of the call.
Watch out for string interning — for some reason these are stored as pages on the LOH and can cause serious fragmentation if your application continues to encounter new strings to intern, i.e. avoid using string.Intern unless the set of strings is known to be finite and the full set is encountered early on in the application's life. (See my earlier question.)
Use Son of Strike to see what exactly is using the LOH memory. Again see this question for details on how to do this.
Consider pooling large arrays.
Edit: the LOH threshold for double arrays appears to be 8 kB (roughly 1,000 elements).
It's an old question, but I figure it doesn't hurt to update answers with changes introduced in .NET. It is now possible to defragment the Large Object Heap. Clearly the first choice should be to make sure the best design choices were made, but it is nice to have this option now.
https://msdn.microsoft.com/en-us/library/xe0c2357(v=vs.110).aspx
"Starting with the .NET Framework 4.5.1, you can compact the large object heap (LOH) by setting the GCSettings.LargeObjectHeapCompactionMode property to GCLargeObjectHeapCompactionMode.CompactOnce before calling the Collect method, as the following example illustrates."
GCSettings can be found in the System.Runtime namespace
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
The first thing that comes to mind is to split the array up into smaller ones, so they don't reach the size at which the GC puts them on the LOH. You could split the arrays into smaller ones of, say, 10,000 elements, and build an object which would know which array to look in based on the index you pass.
Now I haven't seen the code, but I would also question why you need an array that large. I would potentially look at refactoring the code so all of that information doesn't need to be stored in memory at once.
You've got it wrong. You do NOT need to have an array of size 4000000 and you definitely do not need to call the garbage collector.
Write your own IList implementation. Like "PagedList"
Store items in arrays of 65536 elements.
Create an array of arrays to hold the pages.
This allows you to access basically all your elements with ONE redirection only. And, as the individual arrays are smaller, fragmentation is not an issue...
...if it is... then REUSE pages. Don't throw them away on dispose; put them on a static "PageList" and pull them from there first. All this can be done transparently within your class.
The really good thing is that this List is pretty dynamic in its memory usage. You may want to resize the holder array (the redirector). Even when you don't, it is only about 512 kB of data per page.
Second-level arrays basically have 64k elements each - that is 8 bytes per element for a class reference (512 kB per page on 64 bit, 256 kB on 32 bit), or 64 kB per byte of struct size.
Technically:
Turn
int[]
into
int[][]
Decide whether 32 or 64 bit suits you better as you like ;) Both have advantages and disadvantages.
Dealing with ONE large array like that is unwieldy in any language - if you have to, then basically allocate it at program start and never recreate it. That's the only solution.
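As a rough sketch of the paged approach described above (the class and member names are mine, and I use a smaller page size than 65536 so that each int page also stays under the 85,000-byte LOH threshold):

using System;

// Illustrative sketch of a paged int list; not a drop-in library type.
public sealed class PagedIntList
{
    // 16,384 ints = 64 kB per page, safely below the 85,000-byte LOH threshold.
    private const int PageSize = 16384;
    private int[][] _pages = new int[16][];   // small redirector array of pages
    public int Count { get; private set; }

    public int this[int index]
    {
        get { return _pages[index / PageSize][index % PageSize]; }
        set { _pages[index / PageSize][index % PageSize] = value; }
    }

    public void Add(int value)
    {
        int page = Count / PageSize;
        if (page == _pages.Length)
            Array.Resize(ref _pages, _pages.Length * 2);   // only the small redirector ever grows
        if (_pages[page] == null)
            _pages[page] = new int[PageSize];               // pages are uniform-sized, so they can be pooled and reused
        this[Count] = value;
        Count++;
    }
}

Every element access goes through exactly one extra indirection, and because all pages are the same size, discarded pages can be kept on a static free list and reused, as the answer suggests.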
This is an old question, but with .NET Standard 1.1 (.NET Core, .NET Framework 4.5.1+) there is another possible solution:
Using ArrayPool<T> in the System.Buffers package, we can pool arrays to avoid this problem.
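A minimal sketch of that approach (the element count and the ProcessData call are placeholders of mine, not part of the ArrayPool API):

using System;
using System.Buffers;

// Rent a large buffer from the shared pool instead of allocating a fresh
// LOH array with "new" on every call, and return it so it can be reused.
int[] buffer = ArrayPool<int>.Shared.Rent(4000000);
try
{
    // Rent may hand back a buffer larger than requested, so track the length
    // you actually use rather than relying on buffer.Length.
    ProcessData(buffer, 4000000);   // hypothetical processing method
}
finally
{
    ArrayPool<int>.Shared.Return(buffer);
}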
I'm adding an elaboration to the answer above, in terms of how the issue can arise. Fragmentation of the LOH is not only dependent on the objects being long-lived. If you have multiple threads and each of them is creating big lists that go onto the LOH, you can end up in the situation where the first thread needs to grow its list, but the next contiguous bit of memory is already taken up by a list from the second thread, so the runtime allocates new memory for the first thread's list - leaving behind a rather big hole. This is what's happening on one project I've inherited: even though the LOH is approx 4.5 MB, the runtime has a total of 117 MB of free memory, but the largest free memory segment is 28 MB.
Another way this can happen without multiple threads is if you have more than one list being added to in some kind of loop; as each expands beyond the memory initially allocated to it, they leapfrog each other as they grow beyond their allocated spaces.
A useful link is: https://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
Still looking for a solution for this, one option may be to use some kind of pooled objects and request from the pool when doing the work. If you're dealing with large arrays then another option is to develop a custom collection e.g. a collection of collections, so that you don't have just one huge list but break it up into smaller lists each of which avoid the LOH.
Related
What is the big O time complexity of allocating an array in .Net?
I'm guessing that if the array is small enough to fit on the ephemeral segment it should be O(1), but that as n gets larger it gets more difficult to find enough memory so it may change.
Also the large object heap may be fragmented, so if n is larger enough for the array to fit on the LOH, it probably won't be O(1).
As most are probably aware, a new array will be allocated in one of two distinct heaps, depending on its size (the size threshold being 85,000 bytes):
Small Object Heap - allocation here happens in a so-called allocation context, which is a pre-zeroed region of memory located inside an ephemeral segment. Two scenarios may happen here:
there is enough space for a new array in the current allocation context - in that case we can treat it as an O(1) operation of just returning an address for the array (and bumping the pointer for the next objects)
there is not enough space there - the GC will try to enlarge the allocation context by an allocation quantum (usually around 8 kB) if that is possible (e.g. when it lies at the end of the ephemeral segment). Here we hit the cost of zeroing those 8 kB, so it is significantly bigger. Even worse, it may not be possible to enlarge the allocation context, because it may lie between already allocated objects. In that case a new allocation context will be created somewhere inside the ephemeral segment with the help of the free list, to make use of the fragmentation. In this case the cost is even bigger - traversing the free list to find the proper place and then zeroing it. Still, the cost does not depend on the array size directly and is "constant", so we can treat it as O(1) as before.
Large Object Heap - because allocations here are by default much less frequent, it uses "ad-hoc" allocation contexts - each time an allocation happens here, the GC searches for an appropriate place with the help of the free list and zeros it. Again, both the cost of free-list traversal and of memory zeroing apply here, but as objects are big, the cost is dominated mainly by zeroing. Here we can talk about O(n) cost.
In case of LOH allocation one should be aware of an additional hidden "cost" - such allocations do not happen during some parts of Background GCs (because both operate on free-list). So if it happens that you have a lot of long Background GCs, LOH allocations will be paused waiting for GCs to end. This obviously will introduce unwanted delays for your threads.
Objects in the ephemeral segment (SOH; small object heap) are allocated after the last known object on that segment. It should really be just a pointer to there.
The "empty" space in between will not be considered, since there is no empty space. Even if the object has no reference any more, it will still be there until it's garbage collected. Then, the SOH will be compacted, so again, there are no free spaces.
If the SOH is not large enough, then a different one has to be chosen or a new segment has to be created. This will take longer, but is still O(1).
The LOH is a bit more complex, since it will usually not be compacted. There are websites stating that the LOH has a "free list", but I'm not sure whether it's really a list-style implementation. I'd guess it has better management and works like a dictionary, so it should not be worse than O(log(n)).
What needs to be done?
perhaps get new memory from the kernel. If so, the memory was already zeroed and memset() is not needed.
if that new memory is not available in RAM, swap something to disk first. This part may become really expensive and is unpredictable.
If memory is already available in .NET, it might need to be initialized to zero. But the implementation of memset() is optimized (e.g. using rep stos)
Initialize the array with values from somewhere (e.g. a file). This will likely be a .NET loop and, apart from swapping, one of the expensive parts.
Usually, I would not consider the allocation of memory something to worry about, unless you have used a profiler (like dotMemory) that told you about memory throughput issues. Trust Donald Knuth: "premature optimization is the root of all evil".
We have a server app that does a lot of memory allocations (both short lived and long lived). We are seeing an awful lot of GC2 collections shortly after startup, but these collections calm down after a period of time (even though the memory allocation pattern is constant).
These collections are hitting performance early on.
I'm guessing that this could be caused by GC budgets (for Gen2?). Is there some way I can set this budget (directly or indirectly) to make my server perform better at the beginning?
One counter-intuitive set of results I've seen: We made a big reduction to the amount of memory (and Large Object Heap) allocations, which saw performance over the long term improve, but early performance gets worse, and the "settling down" period gets longer.
The GC apparently needs a certain period of time to realise our app is a memory hog and adapt accordingly. I already know this fact; how do I convince the GC?
Edit
OS: 64-bit Windows Server 2008 R2
We're using .NET 4.0 with server GC in Batch latency mode. We tried 4.5 and the 3 different latency modes, and while average performance improved slightly, worst-case performance actually deteriorated.
Edit2
A GC spike can double the time taken (we're talking seconds), going from acceptable to unacceptable.
Almost all spikes correlate with gen 2 collections.
My test run results in a final 32 GB heap size. The initial frothiness lasts for the first fifth of the run time, and performance after that is actually better (less frequent spikes), even though the heap is growing. The last spike near the end of the test (with the largest heap size) is the same height as (i.e. as bad as) 2 of the spikes in the initial "training" period (with much smaller heaps).
Allocation of extremely large heap in .NET can be insanely fast, and number of blocking collections will not prevent it from being that fast. Problems that you observe are caused by the fact that you don't just allocate, but also have code that causes dependency reorganizations and actual garbage collection, all at the same time when allocation is going on.
There are a few techniques to consider:
try using LatencyMode (http://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.latencymode(v=vs.110).aspx) and set it to LowLatency while you are actively loading the data - see comments to this answer as well, and the sketch after this list
use multiple threads
do not populate cross-references to newly allocated objects while actively loading; first go through active allocation phase, use only integer indexes to cross-reference items, but not managed references; then force full GC couple times to have everything in Gen2, and only then populate your advanced data structures; you may need to re-think your deserialization logic to make this happen
try forcing your biggest root collections (arrays of objects, strings) to second generation as early as possible; do this by preallocating them and forcing full GC two times, before you start populating data (loading millions of small objects); if you are using some flavor of generic Dictionary, make sure to preallocate its capacity early on, to avoid reorganizations
any big array of references is a big source of GC overhead - until both array and referenced objects are in Gen2; the bigger the array - the bigger the overhead; prefer arrays of indexes to arrays of references, especially for temporary processing needs
avoid having many utility or temporary objects deallocated or promoted while in active loading phase on any thread, carefully look through your code for string concatenation, boxing and 'foreach' iterators that can't be auto-optimized into 'for' loops
if you have an array of references and a hierarchy of function calls that have some long-running tight loops, avoid introducing local variables that cache the reference value from some position in the array; instead, cache the offset value and keep using something like "myArrayOfObjects[offset]" construct across all levels of your function calls; it helped me a lot with processing pre-populated, Gen2 large data structures, my personal theory here is that this helps GC manage temporary dependencies on your local thread's data structures, thus improving concurrency
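For the first point above, a minimal sketch of switching the latency mode around the load phase (the LoadData call is a placeholder of mine; note that LowLatency is intended only for short, critical periods, and applies to workstation GC):

using System;
using System.Runtime;

GCLatencyMode oldMode = GCSettings.LatencyMode;
try
{
    // Discourage blocking gen-2 collections while the allocation-heavy load runs.
    GCSettings.LatencyMode = GCLatencyMode.LowLatency;
    LoadData();   // placeholder for your active loading phase
}
finally
{
    GCSettings.LatencyMode = oldMode;   // always restore the previous mode
}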
Here are the reasons for this behavior, as far as I learned from populating up to ~100 GB of RAM during app startup, with multiple threads:
when GC moves data from one generation to another, it actually copies it and thus modifies all references; therefore, the fewer cross-references you have during active load phase - the better
GC maintains a lot of internal data structures that manage references; if you do massive modifications to references themselves - or if you have a lot of references that have to be modified during GC - it causes significant CPU and memory bandwidth overhead during both blocking and concurrent GC; sometimes I observed GC constantly consuming 30-80% of CPU without any collections going on - simply by doing some processing, which looks weird until you realize that any time you put a reference to some array or some temporary variable in a tight loop, GC has to modify and sometimes reorganize dependency tracking data structures
server GC uses thread-specific Gen0 segments and is capable of pushing entire segment to next Gen (without actually copying data - not sure about this one though), keep this in mind when designing multi-threaded data load process
ConcurrentDictionary, while being a great API, does not scale well in extreme scenarios with multiple cores, when number of objects goes above a few millions (consider using unmanaged hashtable optimized for concurrent insertion, such as one coming with Intel's TBB)
if possible or applicable, consider using native pooled allocator (Intel TBB, again)
BTW, the latest update to .NET 4.5 (4.5.1) has compaction support for the large object heap. One more great reason to upgrade to it.
.NET 4.6 also has an API to ask for no GC whatsoever (GC.TryStartNoGCRegion), if certain conditions are met: https://msdn.microsoft.com/en-us/library/dn906202(v=vs.110).aspx
Also see a related post by Maoni Stephens: https://blogs.msdn.microsoft.com/maoni/2017/04/02/no-gcs-for-your-allocations/
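A minimal sketch of that no-GC region (the 200 MB budget is an arbitrary example, and AllocateCriticalData is a placeholder of mine):

using System;

if (GC.TryStartNoGCRegion(200 * 1024 * 1024))   // budget the runtime must be able to commit
{
    try
    {
        AllocateCriticalData();   // allocations inside the region should not trigger a collection
    }
    finally
    {
        // Leave the region when done; throws if the region was already exited
        // because the allocation budget was exceeded.
        GC.EndNoGCRegion();
    }
}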
Just like almost any other big .NET application, my current C# project contains many .NET collections.
Sometimes I don't know, from the beginning, what the size of a Collection (List/ObservableCollection/Dictionary/etc.) is going to be.
But there are many times when I do know what it is going to be.
I often get an OutOfMemoryException, and I've been told it can happen not only because of process size limits, but also because of fragmentation.
So my question is this - will setting the collection's size (using the capacity argument in the constructor) every time I know its expected size help me prevent at least some of the fragmentation problems?
This quote is from MSDN:
If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List.
But still, I don't want to start changing big parts of my code for something that might not be the real problem.
Has it ever helped any of you to solve out of memory problems ?
Specifying an initial size will rarely if ever get rid of an OutOfMemory issue - unless your collection size is millions of objects, in which case you should really not keep such a collection.
Resizing a collection involves defining a completely new array with a new additional size and then copying the memory. If you are already close to out of memory, yes, this can cause an out of memory since the new array cannot be allocated.
However, 99 times out of 100, you have a memory leak in your app, and collection resizing issues are only a symptom of it.
If you are hitting OOM, then you may be being overly aggressive with the data, but to answer the question:
Yes, this may help some - as if it has to keep growing the collections by doubling, it could end up allocating and copying twice as much memory for the underlying array (or more precisely, for the earlier, smaller copies that are discarded). Most of these intermediate arrays will be collected promptly, but when they get big you are using the "large object heap", which is harder to compact.
Starting with the correct size prevents all the intermediate copies of the array.
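A small illustration of the difference (the element count is just an example):

using System.Collections.Generic;

// Growing from the default capacity: the backing array is re-allocated and
// copied roughly log2(n) times, and the later, larger copies land on the LOH.
var grown = new List<int>();
for (int i = 0; i < 1000000; i++) grown.Add(i);

// Pre-sizing: one backing array is allocated up front, no intermediate copies.
var presized = new List<int>(1000000);
for (int i = 0; i < 1000000; i++) presized.Add(i);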
However, what is in the array also matters. Typically, for classes, there is more data in each object (plus overheads for references etc) - meaning the list is not necessarily the biggest culprit for memory use; you might be burning up most of the memory on the objects themselves.
Note that x64 will allow more overall space, but arrays are limited to 2GB - and if each reference doubles in size this halves the maximum effective length of the array.
Personally I would look at breaking the huge sets into smaller chains of lists; jagged lists, for example.
.NET has a compacting garbage collector, so you probably won't run into fragmentation problems on the normal .NET heap. You can however get memory fragmentation if you're using lots of unmanaged memory (e.g. through GDI+, COM, etc.). Also, the large object heap isn't compacted, so that can get fragmented, too. IIRC an object is put into the LOH if it's bigger than 85 kB. So if you have many collections that contain more than roughly 20k objects, you might get fragmentation problems.
But instead of guessing where the problem might be, it might be better to narrow the problem down some more: When do you get the OutOfMemoryExceptions? How much memory is the application using at that time? Using a tool like WinDbg or memory profilers you should be able to find out how much of that memory is on the LOH.
That said, it's always a good idea to set the capacity of List and other data structures in advance if you know it. Otherwise, the List will double its capacity every time you add an item and hit the capacity limit, which means lots of unnecessary allocation and copy operations.
In order to solve this, you have to understand the basics and pinpoint the problem in your code.
It is always a good idea to set the initial capacity, if you have a sensible estimate. If you only have an approximate guess, allocate more.
Fragmentation can only occur on the LOH (objects over 85 kB). To prevent it, try to allocate blocks of the same size. Paradoxically, the solution might sometimes be to allocate more memory than you actually need.
The answer is that yes, pre-defining a size on collections will increase performance and memory optimization and reduce fragmentation. See my answer here to see why - If I set the initial size of a .NET collection and then add some items OVER this initial size, how does the collection determine the next resize?
However, without analyzing a memory dump or memory profiling on the app, it's impossible to say exactly what the cause of the OOM is. Thus, impossible to conjecture if this optimization will solve the problem.
I have a few SortedList<> and SortedDictionary<> structures in my simulation code, and I add millions of items to them over time. The problem is that the garbage collector does not deallocate memory quickly enough, so there is a huge hit on the application's performance. My last option was to engage the GC.Collect() method so that I can reclaim that memory back. Has anyone got a different idea? I am aware of the Flyweight pattern, which is another option, but I would appreciate other suggestions that would not require huge refactoring of my code.
You are fighting the "There's no free lunch" principle. You cannot assume that stuffing millions of items in a list isn't going to affect perf. Only the SortedList<> should be a problem, it is going to start allocating memory in the Large Object Heap. That allocation isn't going to be freed soon, it takes a gen #2 collection to chuck stuff out of the LOH again. This delay should not otherwise affect the perf of your program.
One thing you can do is avoid the multiple copies of the internal array that SortedList<> will jam into the LOH as it keeps growing. Try to guess a good value for Capacity so it pre-allocates the large array up front.
Next, use Perfmon.exe or TaskMgr.exe and look at the page fault delta of your program. It should be quite busy while you're allocating. If you see low values (100 or less) then you might have a problem with the paging file being fragmented - a common scourge on older machines that run XP. Defragging the disk and using SysInternals' PageDefrag utility can do extraordinary wonders.
I think the SortedList uses an array as its backing field, which means that a large SortedList gets allocated on the Large Object Heap. The Large Object Heap can get fragmented, which can cause an out-of-memory exception while, in principle, there is still enough memory available.
See this link.
This might be your problem, as intermediate calls to GC.Collect prevent the LOH from getting badly fragmented in some scenarios, which explains why calling it helps you reduce the problem.
The problem can be mitigated by splitting large objects into smaller fragments.
I'd start with doing some memory profiling on your application to make sure that the items you remove from those lists (which I assume is happening from the way your post is written) are actually properly released and not hanging around places.
What sort of performance hit are we talking and on what operating system? If I recall, GC will run when it's needed, not immediately or even "soon". So task manager showing high memory allocated to your application is not necessarily a problem. What happens if you put the machine under higher load (e.g. run several copies of your application)? Does memory get reclaimed faster in that scenario or are you starting to run out of memory?
I hope answers to these questions will help point you in the right direction.
Well, if you keep all of the items in those structures, the GC will never collect the resources because they still have references to them.
If you need the items in the structures to be collected, you must remove them from the data structure.
To clear the entire data structure, try using Clear() and setting the data structure reference to null. If the data is still not getting collected fast enough, call GC.Collect().
I'm learning C# asynchronous socket programming, and I've learned that it's a good idea to reuse byte buffers in some sort of pool, and then just check one out as needed when receiving data from a socket.
However, I have seen two different methods of doing a byte array pool: one used a simple queue system, and just added/removed them from the queue as needed. If one was requested and there were no more left in the queue, a new byte array is created.
The other method that I've seen uses one big byte array for the entire program. The idea of a queue still applies, but instead it's a queue of integers which determine the slice (offset) of the byte array to use. If one was requested and there were no more left in the queue, the array must be resized.
Which one of these is a better solution for a highly scalable server? My instinct is it would be cheaper to just use many byte arrays because I'd imagine resizing the array as needed (even if we allocate it in large chunks) would be pretty costly, especially when it gets big. Using multiple arrays seems more intuitive too - is there some advantage to using one massive array that I'm not thinking of?
You are correct in your gut feeling. Every time you need to make the array bigger, you will be recreating the array and copying the existing bytes over. Since we are talking about bytes here, the size of the array may get large very quickly. So you will be asking for a contiguous piece of memory each time, which, depending on how your program uses memory, may or may not be viable. This will also, in effect, become a virtual pool, so to speak. A pool, by definition, has a set of multiple items that are managed and shared by various clients.
The one array solution is also way more complex to implement. The good thing is that a one array solution allows you to give variable-sized chunks out, but this comes at the cost of essentially reimplementing malloc: dealing with fragmentation, etc, etc, which you shouldn't get into.
A multiple array solution allows you to initialize a pool with N amount of buffers and easily manage them in a straightforward fashion. Definitely the approach I'd recommend.
I wouldn't suggest the resizing option. Start simple and work your way up. A queue of byte buffers which gets a new one added to the end when it is exhausted would be a good start. You will probably have to pay attention to threading issues, so my advice would be to use somebody else's thread-safe queue implementation.
Next you can take a look at the more complex "pointers" into a big byte array chunk, except my advice would be to have a queue of 4k/16k (some power of two multiple of the page size) blocks that you index into, and when it is full you add another big chunk to the queue. Actually, I don't recommend this at all due to the complexity and the dubious gain in performance.
Start simple, work your way up. Pool of buffers, make it thread safe, see if you need anything more.
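A minimal sketch of such a pool, using ConcurrentQueue<T> as the thread-safe queue (the class name and buffer size are illustrative choices of mine):

using System.Collections.Concurrent;

// Simple thread-safe pool of fixed-size byte buffers.
public sealed class BufferPool
{
    private readonly ConcurrentQueue<byte[]> _buffers = new ConcurrentQueue<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize)
    {
        _bufferSize = bufferSize;
    }

    // Hand out a pooled buffer, or create a new one if the pool is empty.
    public byte[] Rent()
    {
        byte[] buffer;
        return _buffers.TryDequeue(out buffer) ? buffer : new byte[_bufferSize];
    }

    // Put a buffer back so the next caller can reuse it.
    public void Return(byte[] buffer)
    {
        _buffers.Enqueue(buffer);
    }
}

Rent a buffer before each receive and Return it when the operation completes; because all buffers are the same size, they can be reused indefinitely without churning the heap.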
One more vote for multiple buffers, but with the addition that since you're doing things asynchronously you need to make sure your queue is threadsafe. The default Queue<T> collection is definitely not threadsafe.
SO user and MS employee JaredPar has a good threadsafe queue implementation here:
http://blogs.msdn.com/jaredpar/archive/2009/02/16/a-more-usable-thread-safe-collection.aspx
If you use the single buffer you need a strategy for how fast it should grow when needed. If you grow it by small increments, you may have to do it often and copy all the data often. If you grow it by large increments (like the next size being 1.5 times the previous one), you risk facing a situation where you get "Out of memory" simply by trying to grow the buffer. It's a lose-lose choice for a scalable system. This is why reusing small buffers is preferable.
With a garbage collection heap, you should always favor small, right-sized buffers that have a short life-time. The .NET heap allocator is very fast, generation #0 collections are very cheap.
When you keep a static buffer around, you'll use up system resources for the life of the program. The worst case scenario is when it gets big enough to get moved in the Large Object Heap where it will be a permanent obstacle that cannot be moved.