How to prevent object's being promoted to gen2

How to prevent object's being promoted to gen2 - c#

Runtime marks gen2 memory regions (aka card tables) via write barriers to detect if a younger generation object reference is written to gen2 object.if a gen 0 or 1 collection occurs.they are checked afterwards. While marking part affects assignment performance, obviously GC spends more time to collect Gen0 or 1.
I have pooled objects which are written references of short lived objects very frequently.
What i want to achieve is have those pooled objects always stay in gen0 but i cant think of any tehnique. Also i want to discuss if it is benefital. Maybe GC team should include it as a feature?
In short, i have long-lived objects. they hold references to both persistent and volatile objects... volitile object references are written to it very frequently which make them scanned every gen0 + write barrier - card table management overhead. what do you think the best way to squeze best performance?
edit: it is about zero allocation asnctaskmethodbuilder. uploded working sample to github:
https://github.com/crowo/LightTask/tree/master/LightTask

The issue you are describing indeed exists. Card tables scanning and promotions that results from it indeed may add some overhead. I've explained this mechanism in my video here. Unfortunatelly, it is just a feature and/or caveat of most generational GCs. Moreover, it may lead to nepotism.
BUT, the main question is whether it is a real problem or you just don't feel comfortable with the knowledge that such overhead happens? In other words, you should follow "measure first" approach.
Unfortunatelly there are no super easy metrics to measure the overhead from this particular problem. But typically we are doing following things:
since .NET 5 we have "generational aware analysis" available in the runtime itself, which we can see in PerfView's Generational Aware view. It will allow you to see which gen1/gen2 objects are holding references to younger generations. This may be not super useful in your case, as it sounds you already know where it happens
record .NET trace and look for MarkWithType event. If there are many Type="3" events (they represent GC_ROOT_OLDER reason of promotion) with big amount of Promoted bytes, it may suggest indeed you may have a problem:
Event Name Time MSec Process Name Rest
MarkWithType 186.370 Process(28820) HeapNum="5" Type="3" Promoted="62,104,694"
MarkWithType 186.421 Process(28820) HeapNum="0" Type="3" Promoted="52,687,589"
MarkWithType 186.633 Process(28820) HeapNum="3" Type="3" Promoted="49,932,696"
But only if you also correlate such GCs with long pause times or % TIme in GC.
So, only if you measure it somehow is a problem, and this typically will be when having volume of dozens of GBs, try to solve it. As it is an inherent consequence of how .NET GC is built, there is no easy workaround though. Here are some typical ideas:
try to follow #david-l advice to split object structures. Maybe old objects may reference those young, temporary objects only by some index, not a reference
or just try to redesign it all as #holger suggests to avoid those references at all
or make young, temporary objects also pooled using ObjectPool so they eventually all be living in gen2
or may those temporary objects structs that are inlined in long-living pooled objects

Related

How to minimize the length of the GC collections?

I need an application that will run smoothly. I have many serial chunks of computations I need to consecutively perform in short periods of time each, so I don't mind the GC doing it's job and I even can take more frequent collections but what I need to minimize the length of each GC collection.
I would like (if possible) to have 1 milli max pause of thread activity due to the GC each time.
what is the best way to acheive this in .NET (I know that .NET it not the technology for such demands but if it will meet my demands when optimized the save of development hours and flexibility for future specs is good incentive to try it out)?

Right from the MSDN page:
https://msdn.microsoft.com/en-us/library/ms973837.aspx
The .NET garbage collector provides a high-speed allocation service
with good use of memory and no long-term fragmentation problems,
however it is possible to do things that will give you much less than
optimal performance. To get the best out of the allocator you should
consider practices such as the following:
Allocate all of the memory (or as much as possible) to be used with a given data structure at the same time. Remove temporary allocations
that can be avoided with little penalty in complexity.
Minimize the number of times object pointers get written, especially those writes made to older objects.
Reduce the density of pointers in your data structures.
Make limited use of finalizers, and then only on "leaf" objects, as much as possible. Break objects if necessary to help with this.
A regular practice of reviewing your key data structures and conducting memory usage profiles with tools like Allocation Profiler
will go a long way to keeping your memory usage effective and having
the garbage collector working its best for you.
As Ron mentioned in his comment. You have to be extra smart with .NET if you want a lot of control over the GC.

Can I "prime" the CLR GC to expect profligate memory use?

We have a server app that does a lot of memory allocations (both short lived and long lived). We are seeing an awful lot of GC2 collections shortly after startup, but these collections calm down after a period of time (even though the memory allocation pattern is constant).
These collections are hitting performance early on.
I'm guessing that this could be caused by GC budgets (for Gen2?). Is there some way I can set this budget (directly or indirectly) to make my server perform better at the beginning?
One counter-intuitive set of results I've seen: We made a big reduction to the amount of memory (and Large Object Heap) allocations, which saw performance over the long term improve, but early performance gets worse, and the "settling down" period gets longer.
The GC apparently needs a certain period of time to realise our app is a memory hog and adapt accordingly. I already know this fact, how do I convince the GC?
Edit
OS:64 bit Windows Server 2008 R2
We're using .Net 4.0 ServerGC Batch Latency. Tried 4.5 and the 3 different latency modes, and while average performance was improved slightly, worst case performance actually deteriorated
Edit2
A GC spike can double time taken (we're talking seconds) going from acceptable to unacceptable
Almost all spikes correlate with gen 2 collections
My test run causes a final 32GB heap size. The initial frothiness lasts for the 1st 1/5th of the run time, and performance after that is actually better (less frequent spikes), even though the heap is growing. The last spike near the end of the test (with largest heap size) is the same height as (i.e. as bad as) 2 of the spikes in the initial "training" period (with much smaller heaps)

Allocation of extremely large heap in .NET can be insanely fast, and number of blocking collections will not prevent it from being that fast. Problems that you observe are caused by the fact that you don't just allocate, but also have code that causes dependency reorganizations and actual garbage collection, all at the same time when allocation is going on.
There are a few techniques to consider:
try using LatencyMode (http://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.latencymode(v=vs.110).aspx), set it to LowLatency while you are actively loading the data - see comments to this answer as well
use multiple threads
do not populate cross-references to newly allocated objects while actively loading; first go through active allocation phase, use only integer indexes to cross-reference items, but not managed references; then force full GC couple times to have everything in Gen2, and only then populate your advanced data structures; you may need to re-think your deserialization logic to make this happen
try forcing your biggest root collections (arrays of objects, strings) to second generation as early as possible; do this by preallocating them and forcing full GC two times, before you start populating data (loading millions of small objects); if you are using some flavor of generic Dictionary, make sure to preallocate its capacity early on, to avoid reorganizations
any big array of references is a big source of GC overhead - until both array and referenced objects are in Gen2; the bigger the array - the bigger the overhead; prefer arrays of indexes to arrays of references, especially for temporary processing needs
avoid having many utility or temporary objects deallocated or promoted while in active loading phase on any thread, carefully look through your code for string concatenation, boxing and 'foreach' iterators that can't be auto-optimized into 'for' loops
if you have an array of references and a hierarchy of function calls that have some long-running tight loops, avoid introducing local variables that cache the reference value from some position in the array; instead, cache the offset value and keep using something like "myArrayOfObjects[offset]" construct across all levels of your function calls; it helped me a lot with processing pre-populated, Gen2 large data structures, my personal theory here is that this helps GC manage temporary dependencies on your local thread's data structures, thus improving concurrency
Here are the reasons for this behavior, as far as I learned from populating up to ~100 Gb RAM during app startup, with multiple threads:
when GC moves data from one generation to another, it actually copies it and thus modifies all references; therefore, the fewer cross-references you have during active load phase - the better
GC maintains a lot of internal data structures that manage references; if you do massive modifications to references themselves - or if you have a lot of references that have to be modified during GC - it causes significant CPU and memory bandwidth overhead during both blocking and concurrent GC; sometimes I observed GC constantly consuming 30-80% of CPU without any collections going on - simply by doing some processing, which looks weird until you realize that any time you put a reference to some array or some temporary variable in a tight loop, GC has to modify and sometimes reorganize dependency tracking data structures
server GC uses thread-specific Gen0 segments and is capable of pushing entire segment to next Gen (without actually copying data - not sure about this one though), keep this in mind when designing multi-threaded data load process
ConcurrentDictionary, while being a great API, does not scale well in extreme scenarios with multiple cores, when number of objects goes above a few millions (consider using unmanaged hashtable optimized for concurrent insertion, such as one coming with Intel's TBB)
if possible or applicable, consider using native pooled allocator (Intel TBB, again)
BTW, latest update to .NET 4.5 has defragmentation support for large object heap. One more great reason to upgrade to it.
.NET 4.6 also has an API to ask for no GC whatsoever (GC.TryStartNoGCRegion), if certain conditions are met: https://msdn.microsoft.com/en-us/library/dn906202(v=vs.110).aspx
Also see a related post by Maoni Stephens: https://blogs.msdn.microsoft.com/maoni/2017/04/02/no-gcs-for-your-allocations/

Simple algorithm to determine when to free some memory .Net

Our system keeps hold of lots of large objects for performance. However, when running low on memory, we want to drop some of the objects. The objects are prioritized, so I know which ones to drop. Is there a simple way of determining when to free memory? Also, dropping 1 object may not be enough, so I guess I need a loop to drop, check, drop again if necessary, etc. But in c#, I won't necessarily see the effect immediately of dropping an object, so how do I avoid kicking too much stuff out?
I guess it's just a simple function of used vs total physical & virtual memory. But what function?
Edit: Some clarifications
"Large objects" was misleading. I meant logical "package" of objects (the objects should be small enough individually to avoid the LOB - that's the intention certainly) that together are large (~ 100MB?)
A request can come in which requires the use of one such package. If it is in memory, the response is rapid. If not, it needs to be reconstructed, which is very slow. So I want to keep stuff in memory as long as possible, but can ditch the least requested ones when necessary.
We have no sensible way to serialize these packages. We should probably do that, but it's a lot of work and there's a lot of resistance to doing so.
Our original simple approach is to periodically compare the following to a configurable threshold.
var c = new ComputerInfo();
return c.AvailablePhysicalMemory / c.TotalPhysicalMemory;

There're a lot of different topics on this questions and I think is best to clarify them before actually answering.
First of, you say your app does get a hold of a lot of "large objects". Define large object. Anything larger than about 85K goes into the LOH which only gets collected as part of a generation 2 collection (the most expensive of them all), anything smaller than that, even if you think is a "big" object, is not and it's treated as any other kind of object.
Secondly there're two problems in terms of "managing memory"
One is managing the amount of space you're using inside your virtual memory space. That is, in 32 bit systems making sure you can address all the memory you're asking for, which in Windows 32 bit uses to be around 1,5 GB.
Secondly is managing disposing of that memory when it's needed, which is a part of the garbage collector work so that it triggers when there's a shortage on memory (although that doesn't mean you can't get an OutOfMemoryException if you don't give the GC time enough to do its job).
With that said, I think you should forget about taking the place of the GC... just let it do its job and, if you're worried then find the critical paths that may fail (on memory request) and protect yourself against OutOfMemoryExceptions.
There're a lot of different patterns for handling the case you're posting and most of them really depend on your business scenario. One example is having a state machine that can actually go to an "OutOfMemory" state, in which case the system switches to freeing memory before doing anything else (that includes disposing old objects and invoking the GC to clean everything up, all while you patiently wait for it to happen).
Other techniques involve saving the data to the disk and then manually swapping in and out objects based on some algorithm when you reach certain levels. That means stopping all your threads (or some, depending on business) and moving the data back and forth.
If your large objects are all controlled in terms of location you can also declare a facade over their creation, so that the facade can check whether it needs to free objects or not based on the amount of memory (virtual memory) your process is using. BTW, use the PerformanceInfo API call as quoted in the other answer as this will include the amount of memory used by unmanaged code, which is, nonetheless, located inside the virtual memory space of your process.
Don't worry too much about "real" memory, as the operating system will make sure the most appropriate pages are located in memory.
Then there're hundreds of other optimizations that completely depend on your business scenario. For example databases "know" to bring data to memory depending on the query and predicting the data you're going to use in advance so the data is ready and they do remove objects that are not used... but that's another topic.
Edit: Based on your edits to the question.
Checking memory in the facade will not add a significant overhead in terms of performance.
If you start getting low on memory you should take a decision of how many objects / how much space are you going to free. Don't do it one at a time, take a bunch of them and free enough memory so that you don't have to collect again.
If you go with the previous approach you can service the request after you've freed enough space and continue cleaning in background.
One of the fastest ways of handling memory / disk swapping is by using memory mapped files.

Use GC.GetTotalMemory and if this exceeds your expectation then you can nullify the objects that you want to release and call GC.Collect.

Have a look at the accepted answer to this question. It uses the GetPerformanceInfo Windows API to determine memory consumption of all sorts. Task Manager is using the same information. This should help you writing a class that observes memory consumption periodically.
Once memory runs low you can fill a FIFO queue with soon-to-be deleted tasks.
The observer will delete the first object in the queue and maybe call GCCollect manually, I'm not too sure about this.
Give the collection some time before you recheck the mem consumption for your application. If there is still not enough free mem, delete the next object from the queue and so on...

C# Garbage Collector behavior

We have an application in C# that controls one of our device and reacts to signal this device gives us.
Basically, the application creates threads, treat operations (access to a database, etc) and communicates with this device.
In the life of the application, it creates objects and release them and so far, we're letting the Garbage Collector taking care of our memory. i've read that it is highly recommanded to let the GC do its stuff without interfering.
Now the problem we're facing is that the process of our application grows ad vitam eternam, growing by step. Example:
It seems to have "waves" when the application is growing and all of a sudden, the application release some memory but seems to leave memory leaks at the same time.
We're trying to investigate the application with some memory profiler but we would like to understand deeply how the garbage Collector works.
I've found an excellent article here : The Danger of Large Objects
I've also found the official documentation here : MSDN
Do you guys know another really really deep documentation of GC?
Edit :
Here is a screenshot that illustrates the behavior of the application :
You can clearly see the "wave" effect we're having on a very regular pattern.
Subsidiary question :
I've seen that my GC Collection 2 Heap is quite big and following the same pattern as the total bytes used by my application. I guess it's perfectly normal because most of our objects will survive at least 2 garbage collections (for example Singleton classes, etc)... What do you think ?

The behavior you describe is typical of problems with objects created on Large Object Heap (LOH). However, your memory consumption seems to return to some lower value later on, so check twice whether it is really a LOH issue.
You are obviously aware of that, but what is not quite obvious is that there is an exception to the size of the objects on LOH.
As described in documentation, objects above 85000 bytes in size end up on LOH. However, for some reason (an 'optimization' probably) arrays of doubles which are longer than 1000 elements also end up there:
double[999] smallArray = ...; // ends up in 'normal', Gen-0 heap
double[1001] bigArray = ...; // ends up in LOH
These arrays can result in fragmented LOH, which requires more memory, until you get an Out of memory exception.
I was bitten by this as we had an app which received some sensor readings as arrays of doubles which resulted in LOH defragmentation since every array slightly differed in length (these were readings of realtime data at various frequencies, sampled by non-realtime process). We solved the issue by implementing our own buffer pool.

I did some research on a class I was teaching a couple of years back. I don't think the references contain any information regarding the LoH but I thought it was worthwhile to share them either way (see below). Further, I suggest performing a second search for unreleased object references before blaming the garbage collector. Simply implementing a counter in the class finalizer to check that these large objects are being dropped as you believe.
A different solution to this problem, is simply to never deallocate your large objects, but instead reuse them with a pooling strategy. In my hubris I have many times before ended up blaming the GC prematurely for the memory requirements of my application growing over time, however this is more often than not a symptom of faulty implementation.
GC References:
http://blogs.msdn.com/b/clyon/archive/2007/03/12/new-in-orcas-part-3-gc-latency-modes.aspx
http://msdn.microsoft.com/en-us/library/ee851764.aspx
http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx
http://blogs.msdn.com/b/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx
Eric Lippert's blog is especially interesting, when it comes to understanding anything C# in detail!

Here is an update with some of my investigations :
In our application, we're using a lot of thread to make different tasks. Some of these threads have higher priority.
1) We're using a GC that is concurrent and we tried to switch it back to non-concurrent.
We've seen a dramatic improvment :
The Garbage collector is being called much often and it seems that, when called more often, it's releasing much better our memory.
I'll post a screenshot as soon as I have a good one to illustrate this.
We've found a really good article on the MSDN. We also found an interesting question on SO.
With the next Framework 4.5, 4 possibilities will be available for GC configuration.
Workstation - non-concurrent
Workstation - concurrent
Server - non-concurrent
Server - concurrent
We'll try and switch to the "server - non-concurrent" and "serveur - concurrent" to check if it's giving us better performance.
I'll keep this thread updated with our findings.

When to go for object pooling?

When to go for object pooling using C#? Any good ex...
What are the pro's and con's of maintaining a pool of frequently used objects and grab one from the pool instead of creating a new one?

There are only two types of resources I can think of that are commonly pooled: Threads and Connections (i.e. to a database).
Both of these have one overarching concern: Scarcity.
If you create too many threads, context-switching will waste away all of your CPU time.
If you create too many network connections, the overhead of maintaining those connections becomes more work than whatever the connections are supposed to do.
Also, for a database, connection count may be limited for licensing reasons.
So the main reason you'd want to create a resource pool is if you can only afford to have a limited number of them at any one time.

I would add memory fragmentation to the list. That can occur when using objects that encapsulate native resources which after they are allocated can not be moved by the Garbage Collector and could potentially fragment the heap.
One real-life example is when you create and destroy lots of sockets. The buffers they use to read/write data have to be pinned in order to be transferred to the native WinSock API which means that when garbage collection occurs, even though some of the memory is reclaimed for the sockets that where destroyed - it could leave the memory in a fragmented state since the GC can't compact the heap after collection. Thus the read/write buffers are prime candidates for pooling. Also, if you're using SocketEventArgs objects, those would also be good candidates.
Here's a good article that talks about the garbage collection process, memory compacting and why object pooling helps.

When object creation is expensive
When you potentially can experience memory pressure - way too many objects (for instance - flyweight pattern)

For games GC may introduce an unwanted delay in some situations. If this is the case, reusing objects may be a good idea. There's are some useful considerations on the topic in this thread.

There is an excellent MSDN magazine article called Rediscover the Lost Art of Memory Optimization in Your Managed Code by Erik Brown http://msdn.microsoft.com/en-us/magazine/cc163856.aspx. It includes a general purpose object pool with a test program. This object pool does support minimum and maximum sizes. I can't find any references to people using this in production. Has anyone done so? Also, having dealt with memory fragmentation in an ASP.NET application I can attest to the value in Miky Dinescu's answer. Also, elaborating slightly on Vitaly's answer, consider the case of large objects (i.e. > 85K) which are expensive to create. Large objects only participate in Gen 2 garbage collection. This means that they won't be collected as quickly as objects that fully participate in garbage collection in Gen 0 and Gen 1. This article Large Object Heap Uncovered by Maoni Stephens
at http://msdn.microsoft.com/en-us/magazine/cc534993.aspx explains the large object heap in detail.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.