Large Gaps within Generation #0 heap - C#

I'm currently investigating a memory leak that I can't really explain, and I'm now searching for helpful links or ideas.
Here is a screenshot of the native memory of the application (made with the .NET Memory profiler):
The application takes around 2.2 GB (which is normal). The dump was taken when the application was at around 3.5 GB. These gaps in generation #0 are what I currently can't explain. To me it looks as if the garbage collector doesn't compact the gaps in generation #0.
In order to have one clear question:
How do these kinds of gaps happen? To me it looks as if the GC has collected dead objects but hasn't compacted the heap. I know that I can't trigger or force the GC to compact the heap.
I've searched on this site for similar questions, but most of them are about the LOH (which seems fine in my case). The only question with similarly large gaps is this one: What are GC Holes, but I can't see how 2 KB of generation #0 pinned instances could produce 1 GB holes. Another question is about the threshold that triggers the GC: GC thresholds. But I can't believe that there wasn't a single compaction phase.

Holes/Gaps represent memory that is unused between two allocated instances. “Holes” appear when the heap is not fully compacted (due to pinned instances or optimizations in the garbage collector).
You can't explicitly compact the heap per se. However, the GC is sometimes able to do this by performing either a full or a partial collection.
So you have gaps; what does that mean? It means it becomes more complicated to allocate on the heap. What can you do about it? Not a lot. If it's a performance issue, you could play around with the GC a bit, but if you are using lots of pinned memory there isn't much you can do. This is called fragmentation.
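As a rough illustration of how pinning blocks compaction (this sketch is mine, not from the original answer; the buffer and names are hypothetical): while a GCHandle pins an object, the GC must leave it in place, so free space around it cannot be compacted away.
using System;
using System.Runtime.InteropServices;

class PinningSketch
{
    static void Main()
    {
        // A buffer pinned for native interop: the GC may not move it.
        byte[] buffer = new byte[1024];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);

        // While the handle is alive, objects dying around the pinned buffer
        // leave holes that the compacting phase cannot close.
        Console.WriteLine($"Pinned at 0x{handle.AddrOfPinnedObject().ToInt64():X}");

        // Free the pin as soon as the native side is done with the buffer.
        handle.Free();
    }
}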
Additional resources
Garbage Collector Basics and Performance Hints
What are GC holes?
GC behavior when pinning an object
What causes memory fragmentation in .NET
No More Memory Fragmentation on the .NET Large Object Heap

To this day I'm still not certain what exactly caused this issue, but it was resolved this way. First of all, I had some Windows dumps and .NET Memory Profiler dumps and poked around in them a lot (WinDbg was pretty helpful). There I found out that some of the data was related to sockets.
So the only shot was to rewrite the current socket implementation from the Begin* calls to the *Async calls (see this question).
There we followed the MSDN implementation (create one large buffer and reuse it). Especially the comments there point in that direction:
// Allocates one large byte buffer which all I/O operations use a piece of. This guards
// against memory fragmentation
The gaps are not gone now, but they are much smaller and they don't grow any more...
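In rough outline, that MSDN pattern looks like the sketch below (a simplified, single-threaded illustration; the BufferManager name and sizes are mine, not the actual implementation): one large array is allocated up front and handed out in fixed-size slices, so socket I/O never allocates fresh buffers that could fragment the heap.
using System.Net.Sockets;

class BufferManager
{
    private readonly byte[] buffer;   // one big allocation shared by all operations
    private readonly int sliceSize;
    private int nextOffset;

    public BufferManager(int totalSize, int sliceSize)
    {
        buffer = new byte[totalSize];
        this.sliceSize = sliceSize;
    }

    // Assign the next free fixed-size slice to a SocketAsyncEventArgs instance.
    public bool SetBuffer(SocketAsyncEventArgs args)
    {
        if (nextOffset + sliceSize > buffer.Length) return false; // pool exhausted
        args.SetBuffer(buffer, nextOffset, sliceSize);
        nextOffset += sliceSize;
        return true;
    }
}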

How to make sure memory for objects like the LOH gets released at the end of a request when using dependency injection in ASP.NET Core

First off, how big is considered large? Is there any way to determine how large an object is on the heap?
.NET 4.5.1 comes with LargeObjectHeapCompactionMode:
After the LargeObjectHeapCompactionMode property is set to GCLargeObjectHeapCompactionMode.CompactOnce, the next full blocking garbage collection (and compaction of the LOH) occurs at an indeterminate future time. You can compact the LOH immediately by using code like the following:
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
From what I've heard, compacting the LOH is a bad thing! So, which one is worse: compacting the LOH or having LOH fragmentation?
Allocations of 85,000 bytes or more go onto the LOH. Compacting the LOH is not bad -- it's just that LOH fragmentation isn't something the great majority of apps need to worry about, so for them it's not worth the expense of compacting.
Fragmentation occurs when you allocate several large objects and they all get taken from the same page of address space, then let some of those objects get collected. The remaining free space in that page might be unusable because it is too small, or even simply "forgotten" in the sense that the allocator won't ever reconsider using it again.
Eventually there are fewer and fewer clean pages to use, so the allocator will start to slow down as it forcibly moves objects or even start throwing OutOfMemory exceptions. Compaction moves those objects to new pages, reclaiming that free space.
Does your app have this object usage pattern? Most don't. And on 64-bit platforms, you might not even notice it as there's quite a bit more address space to fragment before it becomes a huge issue.
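A small hedged demonstration of the threshold (the array sizes are mine): LOH objects are logically part of generation 2 from the moment they are allocated, so GC.GetGeneration makes the boundary visible.
using System;

class LohThresholdDemo
{
    static void Main()
    {
        byte[] small = new byte[84000]; // comfortably below the 85,000-byte threshold
        byte[] large = new byte[85000]; // at the threshold: allocated on the LOH

        // Small objects start in generation 0; LOH objects report generation 2
        // immediately, without ever being promoted.
        Console.WriteLine(GC.GetGeneration(small)); // 0
        Console.WriteLine(GC.GetGeneration(large)); // 2
    }
}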

TPL and memory management

Using the Visual Studio Concurrency Visualizer I now see why I don't get any benefit from switching to Parallel.For: only 9% of the time is the machine busy executing the code; the rest is 71% synchronization and 17% memory management (1).
Checking all the orange stripes on the diagram below I discovered that GC is always involved (2).
After reading all these interesting topics...
Why do I have a lock here?
https://blog.marcgravell.com/2011/10/assault-by-gc.html
Prevent .NET Garbage collection for short period of time
https://devblogs.microsoft.com/premier-developer/understanding-different-gc-modes-with-concurrency-visualizer/
.. am I right in assuming that all these threads have to contend for a single memory-management resource, and that by removing the need to allocate objects on the heap my scenario will improve considerably? For example, by using structs instead of classes, arrays instead of dynamic lists, etc.?
I have a lot of work to do to bend my code in this direction. Just wanted to be sure before starting.
From your screenshot it seems like memory allocation is blocked while waiting for GC to complete. There are server and workstation GC modes, and each may be concurrent or not, but all options need to block threads at least a little while. I would check in more detail how often and how much time you are spending in GC, and how often gen 0/1 and gen 2 collections are running.
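One simple way to sample that (a minimal sketch; where and how often you log it is up to you) is GC.CollectionCount, which reports how many collections each generation has performed so far:
using System;

class GcStats
{
    // Call this periodically; the deltas between calls show how often
    // gen 0, gen 1 and gen 2 collections are actually happening.
    public static void Print()
    {
        Console.WriteLine($"Gen 0: {GC.CollectionCount(0)}, " +
                          $"Gen 1: {GC.CollectionCount(1)}, " +
                          $"Gen 2: {GC.CollectionCount(2)}");
    }
}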
I believe that each thread has a separate allocation context within the ephemeral segment that it uses for allocations, so that it does not need to synchronize allocations unless it needs a new segment or the allocation is on the large object heap. But I'm unable to find a reference for this.
In any case, you will likely benefit from reducing the number and size of allocations. If possible, use an object pool or memory pool to reuse memory. You might also benefit from increasing the amount of memory and checking the application for memory leaks. A general recommendation for memory is that there should be two types of allocations:
Small temporary allocations that only live for a short duration, like a temporary object that lives for the duration of a method call.
Long lived allocations of any size that live for the duration of the "application".
If this pattern is followed, almost all garbage should be collected in gen 0/1, and gen 2 collections should be fairly rare.
It also depends a bit on whether you are allocating many small objects or large chunks of memory. If the former, you may consider using structs, since these can be stack allocated when used as locals. If the latter, you also need to consider memory fragmentation, and this should also improve with a memory pool that only allocates fixed-size chunks of memory.
Edit:
At its very simplest, an object pool could be something like this:
using System;
using System.Collections.Concurrent;

public class ObjectPool<T>
{
    // Thread-safe bag of idle instances.
    private readonly ConcurrentBag<T> pool = new ConcurrentBag<T>();
    // Take a pooled instance if one is available; otherwise build a new one.
    public T Get(Func<T> constructor) => pool.TryTake(out var result) ? result : constructor();
    // Hand the instance back for reuse by a later caller.
    public void Return(T obj) => pool.Add(obj);
}
This assumes that the objects represent identical resources, like byte arrays of some fixed size. But there are also existing implementations:
.NET Core MemoryPool
ASP.NET Core ObjectPool
Stack Overflow question regarding object pools
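For illustration, hypothetical usage of the ObjectPool sketch above with fixed-size byte buffers (the 4096-byte size is an assumption, not from the answer):
var pool = new ObjectPool<byte[]>();

// Rent a 4 KB buffer; a new one is constructed only when the pool is empty.
byte[] buffer = pool.Get(() => new byte[4096]);
try
{
    // ... use the buffer for I/O or parsing ...
}
finally
{
    pool.Return(buffer); // hand it back so the next caller can reuse it
}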
Memory Management
The Memory Management report shows the calls where memory management blocks occurred, along with the total blocking times of each call stack. Use this information to identify areas that have excessive paging or garbage collection issues.
Furthermore:
Memory management time
These segments in the timeline are associated with blocking times that are categorized as Memory Management. This scenario implies that a thread is blocked by an event that is associated with a memory management operation such as paging. During this time, a thread has been blocked in an API or kernel state that the Concurrency Visualizer is counting as memory management. These include events such as paging and memory allocation. Examine the associated call stacks and profile reports to better understand the underlying reasons for blocks that are categorized as Memory Management.
Yes, allocating less will likely have a large benefit on your resources and efficiency, but that is almost always the case on hot paths and in thrashed applications.
Heap allocations, and in particular Large Object Heap (LOH) allocations, are costly; they also create extra work for the garbage collector and can fragment your memory, causing even more inefficiency. The less you allocate, the more you reuse memory, or the more you use the stack, the better off you are (in general).
This is also where you would learn to use a good memory profiler and get to know your garbage collector.
That said, a profiler is not the only tool you would use to make your application less allocatey. A good memory profiler will go a long way, combined with learning how to read its results and make changes based on them.
Creating minimal-allocation code is an art form, and one worth learning.
Also, as @mjwills pointed out in the comments, you should run any change through your benchmark software as well; removing allocations at the cost of CPU time won't always make sense. There are a lot of ways to speed up code, and low allocation is just one of many approaches that may help.
Lastly, I would suggest following Marc Gravell and his blogs as a start (Mr De-Allocation), getting to know your garbage collector and how the generations work, and using tools like memory profilers and benchmarkers for performant, silky-smooth production code.

C# Garbage Collector behavior

We have an application in C# that controls one of our devices and reacts to signals this device gives us.
Basically, the application creates threads, handles operations (database access, etc.) and communicates with this device.
Over the life of the application, it creates objects and releases them, and so far we're letting the garbage collector take care of our memory. I've read that it is highly recommended to let the GC do its job without interfering.
Now the problem we're facing is that our application's process grows ad vitam aeternam, step by step. Example:
There seem to be "waves": the application grows and then, all of a sudden, releases some memory, but appears to leak memory at the same time.
We're trying to investigate the application with a memory profiler, but we would like to understand deeply how the garbage collector works.
I've found an excellent article here : The Danger of Large Objects
I've also found the official documentation here : MSDN
Do you guys know of any other really, really deep documentation on the GC?
Edit :
Here is a screenshot that illustrates the behavior of the application :
You can clearly see the "wave" effect we're having on a very regular pattern.
Subsidiary question :
I've seen that my GC gen 2 heap is quite big and follows the same pattern as the total bytes used by my application. I guess that's perfectly normal, because most of our objects will survive at least two garbage collections (for example singleton classes, etc.)... What do you think?
The behavior you describe is typical of problems with objects created on the Large Object Heap (LOH). However, your memory consumption seems to return to some lower value later on, so check twice whether it is really a LOH issue.
You are obviously aware of that, but what is not quite obvious is that there is an exception to the size rule for objects on the LOH.
As described in the documentation, objects above 85,000 bytes in size end up on the LOH. However, for some reason (probably an 'optimization') arrays of doubles longer than 1000 elements also end up there:
double[] smallArray = new double[999];  // ends up in the 'normal', gen-0 heap
double[] bigArray   = new double[1001]; // ends up on the LOH
These arrays can result in a fragmented LOH, which requires more and more memory, until you get an OutOfMemoryException.
I was bitten by this: we had an app which received sensor readings as arrays of doubles, and this fragmented the LOH because every array differed slightly in length (these were readings of real-time data at various frequencies, sampled by a non-real-time process). We solved the issue by implementing our own buffer pool.
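A minimal sketch of that buffer-pool idea (not the app's actual implementation; the capacity and names are mine): every reading is copied into a pooled array of one fixed capacity, so the LOH only ever sees arrays of a single size.
using System.Collections.Concurrent;

class DoubleBufferPool
{
    private readonly ConcurrentBag<double[]> pool = new ConcurrentBag<double[]>();
    private readonly int capacity;

    // Capacity must be at least the longest reading you expect.
    public DoubleBufferPool(int capacity) { this.capacity = capacity; }

    // Reuse a pooled array when available; all arrays share one length,
    // so freed LOH blocks are always reusable for the next reading.
    public double[] Rent() =>
        pool.TryTake(out var buffer) ? buffer : new double[capacity];

    public void Return(double[] buffer) => pool.Add(buffer);
}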
I did some research for a class I was teaching a couple of years back. I don't think the references contain any information regarding the LOH, but I thought it was worthwhile to share them anyway (see below). Further, I suggest performing a second search for unreleased object references before blaming the garbage collector. Simply implement a counter in the class finalizer to check that these large objects are being dropped as you believe.
A different solution to this problem is simply to never deallocate your large objects, but instead to reuse them with a pooling strategy. In my hubris I have many times ended up blaming the GC prematurely for the memory requirements of my application growing over time; more often than not this is a symptom of a faulty implementation.
GC References:
http://blogs.msdn.com/b/clyon/archive/2007/03/12/new-in-orcas-part-3-gc-latency-modes.aspx
http://msdn.microsoft.com/en-us/library/ee851764.aspx
http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx
http://blogs.msdn.com/b/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx
Eric Lippert's blog is especially interesting, when it comes to understanding anything C# in detail!
Here is an update with some of my investigations :
In our application, we're using a lot of threads to perform different tasks. Some of these threads have higher priority.
We were using the concurrent GC, and we tried switching back to the non-concurrent one.
We saw a dramatic improvement:
The garbage collector is called much more often, and when it is called more often it seems to release our memory much better.
I'll post a screenshot as soon as I have a good one to illustrate this.
We've found a really good article on the MSDN. We also found an interesting question on SO.
With the upcoming .NET Framework 4.5, four possibilities will be available for GC configuration:
Workstation - non-concurrent
Workstation - concurrent
Server - non-concurrent
Server - concurrent
We'll try switching to "server non-concurrent" and "server concurrent" to check whether they give us better performance.
I'll keep this thread updated with our findings.
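For reference, these modes are chosen in the application's configuration file rather than in code; the sketch below (the config layout is shown in comments and is an assumption about a typical app.config) verifies at runtime which mode actually took effect.
// App.config for .NET Framework, e.g. to get "server non-concurrent":
// <configuration>
//   <runtime>
//     <gcServer enabled="true"/>
//     <gcConcurrent enabled="false"/>
//   </runtime>
// </configuration>
using System;
using System.Runtime;

class GcModeCheck
{
    static void Main()
    {
        // True when the server GC flavor was enabled via configuration.
        Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
        // Interactive corresponds to concurrent GC, Batch to non-concurrent.
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
    }
}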

OutOfMemoryException - out of ideas

I know there is no simple answer to my question, but I would appreciate ideas, guides, or some sort of things-to-look-at list.
I have a .NET Windows service that is constantly throwing OutOfMemoryException.
The service has two builds, for x86 and x64 Windows; however, on x64 it consumes a lot more memory. I have tried profiling it with various memory profilers, but I cannot get a clue what the problem is. The diagnosis: the service consumes a lot of VM size and crashes after 3 to 12 hours. The behaviour is rather stochastic; there is no observable pattern to the crash scenario.
Also, I tried looking at performance counters (perfmon.exe). What I can see is that the heap size is growing and % time in GC is 19% on average. Plus, memory allocation is correlated with % CPU time.
My application has threads and locking objects, DB connections and WCF interface.
The general question that I am trying to solve:
Is it simply GC not been fast enough
to GC objects or some non-managed
(windows) objects are consuming
memory?
See the first app in the list:
http://s45.radikal.ru/i109/1003/af/92a389d189e8.jpg
The link to the picture with the performance counters view:
http://s006.radikal.ru/i215/1003/0b/ddb3d6c80809.jpg
Is your issue that you don't know what is consuming so much memory? You can open Task Manager when the process is using a lot of memory, right-click your process, and create a dump file, which you can examine in WinDbg to find out exactly what's allocating memory.
Tess Ferrandez has a lot of excellent demos. She goes through the most useful stuff here...
Your problem is likely to be either a classic leak (objects that are still rooted when they shouldn't be) or Large Object Heap (LOH) fragmentation.
The best tool I have found for diagnosing this class of problem is the Son of Strike (SOS) extension to the Windows debugger. Download Microsoft's Debugging Tools for Windows to get the debuggers: CDB is the console debugger (which I prefer as it seems more responsive), and WinDbg is the same thing wrapped as an MDI app. These tools are quite low-level and have a bit of a learning curve, but they provide everything you need to know to find your problem.
In particular, run !DumpHeap -stat to see what types of objects are eating your memory. This command will also report at the bottom of the list if it notices any significant fragmentation. !EEHeap will list the heap segments; for example:
0:000> .loadby sos mscorwks
0:000> !EEHeap -gc
Number of GC Heaps: 1
generation 0 starts at 0x00f7a9b0
generation 1 starts at 0x00e79c3c
generation 2 starts at 0x00b21000
ephemeral segment allocation context: none
segment begin allocated size
00b20000 00b21000 010029bc 0x004e19bc(5118396)
Large object heap starts at 0x01b21000
segment begin allocated size
01b20000 01b21000 01b8ade0 0x00069de0(433632)
If there are many LOH segments then I would begin to suspect LOH fragmentation.
Before doing this, however, I would be interested to know:
Does the application use string.Intern()?
Does the application have transient objects that subscribe to events with long-lived objects?
(The reason I ask this is that 1. the .NET string intern tables are implemented in such a way that they can cause LOH fragmenation and 2. an event subscription provides an additional root for the subscribing object which is easy to forget.)
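To make the second point concrete, a small sketch of the event-subscription leak pattern (the class names are mine): the long-lived publisher's delegate list roots the transient subscriber until it unsubscribes.
using System;

class Publisher // long-lived, e.g. a singleton
{
    public event EventHandler Tick;
    public void RaiseTick() => Tick?.Invoke(this, EventArgs.Empty);
}

class TransientSubscriber
{
    private readonly Publisher publisher;

    public TransientSubscriber(Publisher publisher)
    {
        this.publisher = publisher;
        // The delegate stored in the event references 'this', so the
        // publisher now keeps this subscriber alive indefinitely.
        publisher.Tick += OnTick;
    }

    private void OnTick(object sender, EventArgs e) { /* ... */ }

    // Without this matching '-=' the subscriber is never collected
    // while the publisher lives.
    public void Unsubscribe() => publisher.Tick -= OnTick;
}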
I have used .NET Memory Profiler; it is much better than Microsoft's CLR Profiler. You have to learn about it a little bit. It can tell you which objects are not being disposed or still have references, and you can also sort objects based on their type and memory. I used the trial version, which lasts 30 days, during which I was able to solve the problem in my application.
If your percentage of time spent in GC is high, then I would look at the LOH allocations perfmon counter. If there are frequent allocations on the LOH, this makes the GC work hard to collect, which is the reason for a high percentage of time spent in GC.
I blogged about identifying high CPU in GC caused by the LOH, where I show how to get the exact call stack that is allocating on the LOH.
Hope this helps.

.NET: What is typical garbage collector overhead?

5% of execution time spent on GC? 10%? 25%?
Thanks.
This blog post has an interesting investigation into this area.
The poster's conclusion? That the overhead was negligible for his example.
So the GC heap is so fast that in a real program, even in tight loops, you can use closures and delegates without even giving it a second’s thought (or even a few nanosecond’s thought). As always, work on a clean, safe design, then profile to find out where the overhead is.
It depends entirely on the application. The garbage collection is done as required, so the more often you allocate large amounts of memory which later becomes garbage, the more often it must run.
It could even go as low as 0% if you allocate everything up front and then never allocate any new objects.
In typical applications I would think the answer is very close to 0% of the time is spent in the garbage collector.
The overhead varies widely. It's not really practical to reduce the problem domain to "typical scenarios", because the overhead of GC (and related functions, like finalization) depends on several factors:
The GC flavor your application uses (impacts how your threads may be blocked during a GC).
Your allocation profile, including how often you allocate (GC triggers automatically when an allocation request needs more memory) and the lifetime profile of objects (gen 0 collections are fastest, gen 2 collections are slower, if you induce a lot of gen 2 collections your overhead will increase).
The lifetime profile of finalizable objects, because they must have their finalizers complete before they will be eligible for collection.
The impact of various points on each of those axes of relevancy can be analyzed (and there are probably more relevant areas I'm not recalling off the top of my head) -- so the problem is really "how can you reduce those axes of relevancy to a 'common scenario?'"
Basically, as others said, it depends. Or, "low enough that you shouldn't worry about it until it shows up on a profiler report."
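If you do want a number for your own application rather than a rule of thumb, the "% Time in GC" performance counter is the classic source; on .NET 5 and later the runtime also exposes it directly, as in this small sketch (the allocation loop is just there to generate garbage to measure):
using System;

class GcOverheadCheck
{
    static void Main()
    {
        // Generate some short-lived garbage so there is something to measure.
        for (int i = 0; i < 1000000; i++)
        {
            var temp = new byte[100];
            GC.KeepAlive(temp);
        }

        // PauseTimePercentage is the percentage of time the process has
        // spent paused for GC since it started (.NET 5+ only).
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        Console.WriteLine($"GC pause time: {info.PauseTimePercentage}%");
    }
}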
In native C/C++ there is sometimes a large cost to allocating memory, due to finding a block of free memory of the right size; there is also a non-zero cost to freeing memory, due to having to link the freed memory into the correct list of blocks and combine small blocks into large blocks.
In .NET it is very quick to allocate a new object, but you pay the cost when the garbage collector runs. However, the cost of garbage-collecting short-lived objects is as close to free as you can get.
I have always found that if the cost of garbage collection is a problem for you, then you are likely to have bigger problems with the design of your software. Paging can be a big issue with any GC if you don't have enough physical RAM, so you may not be able to just put all your data in RAM and depend on the OS to provide virtual memory as needed.
It really can vary. Look at this short-but-complete demonstration program that I wrote:
http://nomorehacks.wordpress.com/2008/11/27/forcing-the-garbage-collector/
that shows the effect of large gen2 garbage collections.
Yes, the garbage collector will spend some X% of time collecting when averaged over all applications everywhere. But that doesn't necessarily mean that time is overhead. For overhead, you can really only count the time that would be left over after releasing an equivalent amount of memory on an unmanaged platform.
With that in mind, the actual overhead can even be negative: the garbage collector saves time by releasing several chunks of memory in batches, which means fewer context switches and an overall improvement in efficiency.
Additionally, starting with .NET 4 the garbage collector does a lot of its work on a different thread, which doesn't interrupt your currently running code as much. As we work more and more with multi-core machines, where a core might even be sitting idle now and then, this is a big deal.
