When should you go for object pooling in C#? Any good ex...
What are the pros and cons of maintaining a pool of frequently used objects and grabbing one from the pool instead of creating a new one?
There are only two types of resources I can think of that are commonly pooled: threads and connections (e.g. to a database).
Both of these have one overarching concern: Scarcity.
If you create too many threads, context-switching will waste away all of your CPU time.
If you create too many network connections, the overhead of maintaining those connections becomes more work than whatever the connections are supposed to do.
Also, for a database, connection count may be limited for licensing reasons.
So the main reason you'd want to create a resource pool is if you can only afford to have a limited number of them at any one time.
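For illustration, a minimal bounded pool might look like the sketch below. It is not from the answer above; the SemaphoreSlim-based design and all the names are just one reasonable way to cap how many instances of a scarce resource exist at once.

using System;
using System.Collections.Concurrent;
using System.Threading;

// A minimal sketch of a bounded pool: at most maxSize instances exist at any time.
// Callers block in Rent() until an instance is free or may still be created.
public sealed class BoundedPool<T> : IDisposable where T : class
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly SemaphoreSlim _slots;   // counts how many instances may still be handed out
    private readonly Func<T> _factory;

    public BoundedPool(int maxSize, Func<T> factory)
    {
        _slots = new SemaphoreSlim(maxSize, maxSize);
        _factory = factory;
    }

    public T Rent()
    {
        _slots.Wait();                       // enforce the scarcity limit
        return _items.TryTake(out var item) ? item : _factory();
    }

    public void Return(T item)
    {
        _items.Add(item);
        _slots.Release();                    // free the slot for the next caller
    }

    public void Dispose() => _slots.Dispose();
}

For example, new BoundedPool<SqlConnection>(10, CreateOpenConnection) would cap the application at ten open connections (CreateOpenConnection is a hypothetical factory method).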
I would add memory fragmentation to the list. That can occur when using objects that encapsulate native resources, which, once allocated, cannot be moved by the Garbage Collector and can therefore fragment the heap.
One real-life example is when you create and destroy lots of sockets. The buffers they use to read/write data have to be pinned in order to be passed to the native WinSock API. This means that when a garbage collection occurs, even though some of the memory is reclaimed for the sockets that were destroyed, the heap can be left in a fragmented state, since the GC can't compact it around the pinned buffers. The read/write buffers are therefore prime candidates for pooling. If you're using SocketAsyncEventArgs objects, those are also good candidates.
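To make that concrete, one way to pool those read/write buffers today (not something the answer itself mentions, and newer than it) is System.Buffers.ArrayPool<byte>; a rough sketch:

using System;
using System.Buffers;

// Renting read/write buffers from a shared pool instead of allocating a fresh
// byte[] per socket operation: the same arrays get reused, so repeated pinning
// does not keep scattering newly pinned blocks across the heap.
class SocketBufferSketch
{
    static void Main()
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(8192);   // returns at least 8 KB, possibly more
        try
        {
            // ... pass buffer to Socket.Receive / Socket.Send, which pin it while the operation is in flight ...
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);            // hand it back for reuse
        }
    }
}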
Here's a good article that talks about the garbage collection process, memory compacting and why object pooling helps.
When object creation is expensive
When you can potentially experience memory pressure from way too many objects (for instance, the flyweight pattern)
For games, the GC may introduce an unwanted delay in some situations. If this is the case, reusing objects may be a good idea. There are some useful considerations on the topic in this thread.
There is an excellent MSDN Magazine article called Rediscover the Lost Art of Memory Optimization in Your Managed Code by Erik Brown (http://msdn.microsoft.com/en-us/magazine/cc163856.aspx). It includes a general-purpose object pool with a test program. This object pool supports minimum and maximum sizes. I can't find any references to people using this in production. Has anyone done so?
Also, having dealt with memory fragmentation in an ASP.NET application, I can attest to the value in Miky Dinescu's answer.
Elaborating slightly on Vitaly's answer, consider the case of large objects (i.e. > 85K) which are expensive to create. Large objects only participate in Gen 2 garbage collection. This means that they won't be collected as quickly as objects that fully participate in garbage collection in Gen 0 and Gen 1. The article Large Object Heap Uncovered by Maoni Stephens (http://msdn.microsoft.com/en-us/magazine/cc534993.aspx) explains the large object heap in detail.
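As a quick way to see that Gen 2 behaviour for yourself (my own snippet, not from either article), GC.GetGeneration reports large arrays as generation 2 right after allocation:

using System;

class LohCheck
{
    static void Main()
    {
        var small = new byte[1_000];     // small object heap, starts in Gen 0
        var large = new byte[100_000];   // over 85,000 bytes, allocated on the LOH

        Console.WriteLine(GC.GetGeneration(small));  // typically prints 0
        Console.WriteLine(GC.GetGeneration(large));  // typically prints 2
    }
}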
The runtime marks gen2 memory regions (tracked in so-called card tables) via write barriers, to detect when a reference to a younger-generation object is written into a gen2 object. When a gen 0 or gen 1 collection occurs, those marked regions are checked afterwards. The marking affects assignment performance, and the GC obviously spends more time collecting Gen0 or Gen1.
I have pooled objects into which references to short-lived objects are written very frequently.
What I want to achieve is to have those pooled objects always stay in gen0, but I can't think of any technique to do that. I also want to discuss whether it would be beneficial. Maybe the GC team should include it as a feature?
In short, I have long-lived objects. They hold references to both persistent and volatile objects... the volatile object references are rewritten very frequently, which makes the long-lived objects get scanned on every gen0 collection, on top of the write barrier / card table management overhead. What do you think is the best way to squeeze out the best performance?
Edit: it is about a zero-allocation AsyncTaskMethodBuilder. I uploaded a working sample to GitHub:
https://github.com/crowo/LightTask/tree/master/LightTask
The issue you are describing indeed exists. Card table scanning and the promotions that result from it may indeed add some overhead. I've explained this mechanism in my video here. Unfortunately, it is just a feature and/or caveat of most generational GCs. Moreover, it may lead to nepotism.
BUT the main question is whether it is a real problem, or whether you just don't feel comfortable knowing that such overhead happens. In other words, you should follow a "measure first" approach.
Unfortunately, there are no super easy metrics to measure the overhead from this particular problem, but typically we do the following things:
since .NET 5 we have "generational aware analysis" available in the runtime itself, which we can see in PerfView's Generational Aware view. It will allow you to see which gen1/gen2 objects are holding references to younger generations. This may not be super useful in your case, as it sounds like you already know where it happens
record a .NET trace and look for MarkWithType events. If there are many Type="3" events (they represent the GC_ROOT_OLDER promotion reason) with a large amount of Promoted bytes, it may indeed suggest that you have a problem:
Event Name Time MSec Process Name Rest
MarkWithType 186.370 Process(28820) HeapNum="5" Type="3" Promoted="62,104,694"
MarkWithType 186.421 Process(28820) HeapNum="0" Type="3" Promoted="52,687,589"
MarkWithType 186.633 Process(28820) HeapNum="3" Type="3" Promoted="49,932,696"
But that matters only if you also correlate such GCs with long pause times or % Time in GC.
So, only if you measure that it actually is a problem (which typically happens at volumes of dozens of GBs) should you try to solve it. As it is an inherent consequence of how the .NET GC is built, there is no easy workaround. Here are some typical ideas:
try to follow #david-l's advice to split object structures. Maybe old objects can refer to those young, temporary objects only by some index, not by a reference
or just try to redesign it all, as #holger suggests, to avoid those references entirely
or pool the young, temporary objects as well, using an ObjectPool, so that they eventually all live in gen2
or make those temporary objects structs that are inlined in the long-lived pooled objects (a sketch of this follows)
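To illustrate that last idea (my own contrived sketch, with invented names): embedding the short-lived state as a struct field means no object reference is ever written into the old, pooled object, so the write barrier and card tables stay out of the picture.

using System;

struct RequestState                    // value type: embedded in place, no reference stored
{
    public int Sequence;
    public long Timestamp;
}

class PooledWorker                     // long-lived, ends up in Gen 2 once pooled
{
    // Writing to this field mutates the struct in place; since no object
    // reference is assigned, the write barrier / card table machinery is not involved.
    public RequestState Current;

    public void Begin(int sequence)
    {
        Current.Sequence = sequence;
        Current.Timestamp = DateTime.UtcNow.Ticks;
    }
}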
Using the Visual Studio Concurrency Visualizer I now see why I don't get any benefit from switching to Parallel.For: only 9% of the time is the machine busy executing code; the rest is 71% synchronization and 17% memory management (1).
Checking all the orange stripes on the diagram below I discovered that GC is always involved (2).
After reading all these interesting topics...
Why do I have a lock here?
https://blog.marcgravell.com/2011/10/assault-by-gc.html
Prevent .NET Garbage collection for short period of time
https://devblogs.microsoft.com/premier-developer/understanding-different-gc-modes-with-concurrency-visualizer/
.. am I right in assuming that all these threads have to contend for a single memory management mechanism, and that by removing the need to allocate objects on the heap my scenario will improve considerably? For example, by using structs instead of classes, arrays instead of dynamic lists, etc.?
I have a lot of work to do to bend my code in this direction. Just wanted to be sure before starting.
From your screenshot it seems like memory allocation is blocked while waiting for GC to complete. There are server and workstation GC modes, and it may be concurrent or not, but all options need to block threads at least a little while. I would check in more detail how often, and how much time you are spending in GC, and how often gen 0/1 and 2 is running.
I believe that each thread has a separate ephemeral segment it uses for allocations, so that it would not need to synchronize allocations, unless it needs a new segment, or the allocation is on the large object heap. But I'm unable to find a reference for this.
In any case, you will likely benefit from reducing the amount and size of allocations. If possible, use an object pool or memory pool to reuse memory. You might also benefit from increasing the amount of memory and checking the application for memory leaks. A general recommendation for memory is that there should be two types of allocations:
Small temporary allocations that only live for a short duration, like a temporary object that lives for the duration of a method call.
Long lived allocations of any size that live for the duration of the "application".
If this pattern is followed almost all garbage should be collected in Gen 0/1, and gen 2 collections should be fairly rare.
It also depends a bit on whether you are allocating many small objects or large chunks of memory. If the former, you may consider using structs, since these can be allocated on the stack. If the latter, you also need to consider memory fragmentation, and this should also improve by using a memory pool that only allocates fixed-sized chunks of memory.
Edit:
At its very simplest, an object pool could be something like this:
using System;
using System.Collections.Concurrent;

public class ObjectPool<T>
{
    // Thread-safe bag of idle instances waiting to be reused.
    private readonly ConcurrentBag<T> pool = new ConcurrentBag<T>();
    // Hand out a pooled instance if available, otherwise create one.
    public T Get(Func<T> constructor) => pool.TryTake(out var result) ? result : constructor();
    // Put an instance back so a later Get can reuse it.
    public void Return(T obj) => pool.Add(obj);
}
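For completeness, usage might look like this (the byte[] size is just an example):

// Reusing fixed-size byte buffers through the pool above.
var pool = new ObjectPool<byte[]>();

byte[] buffer = pool.Get(() => new byte[4096]);   // created only when the pool is empty
// ... use the buffer ...
pool.Return(buffer);                              // make it available to the next caller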
This assumes that the objects represent identical resources, like byte arrays of some fixed size. But there are also existing implementations:
.NET Core MemoryPool<T>
ASP.NET Core ObjectPool
a Stack Overflow question regarding object pools
Memory Management
The Memory Management report shows the calls where memory management blocks occurred, along with the total blocking times of each call stack. Use this information to identify areas that have excessive paging or garbage collection issues.
Furthermore:
Memory management time
These segments in the timeline are associated with blocking times that are categorized as Memory Management. This scenario implies that a thread is blocked by an event that is associated with a memory management operation such as paging. During this time, a thread has been blocked in an API or kernel state that the Concurrency Visualizer is counting as memory management. These include events such as paging and memory allocation. Examine the associated call stacks and profile reports to better understand the underlying reasons for blocks that are categorized as Memory Management.
Yes, allocating less will likely have a large benefit on your resources and efficiency, but that is almost always the case on hot paths and in thrashed applications.
Heap allocations, and in particular Large Object Heap (LOH) allocations, are costly. They also create extra work for the Garbage Collector and can fragment your memory, causing even more inefficiency. The less you allocate (and the more you reuse memory or use the stack), the better off you are in general.
This is also where you would learn to use a good memory profiler and get to know your garbage collector.
That said, this would not be the only tool you use to make your application less allocatey. A good memory profiler will go a long way, combined with learning how to read its results and make changes based on them.
Creating minimal-allocation code is an art form, and one worth learning.
Also, as #mjwills pointed out in the comments, you should run any change through your benchmark software as well; removing allocations at the cost of CPU time won't make sense. There are a lot of ways to speed up code, and low allocation is just one of many approaches that may help.
Lastly, I would suggest following Marc Gravell and his blogs as a start (Mr DeAllocation), getting to know your Garbage Collector and how the generations work, and learning tools like memory profilers and benchmarkers for performant, silky-smooth production code.
We have an application in C# that controls one of our devices and reacts to the signals this device gives us.
Basically, the application creates threads, processes operations (access to a database, etc.) and communicates with this device.
Over the life of the application, it creates objects and releases them, and so far we're letting the Garbage Collector take care of our memory. I've read that it is highly recommended to let the GC do its job without interfering.
Now the problem we're facing is that the process of our application grows ad vitam aeternam, growing in steps. Example:
It seems to come in "waves": the application grows and then, all of a sudden, it releases some memory, but seems to leave memory leaks behind at the same time.
We're investigating the application with a memory profiler, but we would also like to understand in depth how the Garbage Collector works.
I've found an excellent article here : The Danger of Large Objects
I've also found the official documentation here : MSDN
Do you guys know of any other really deep documentation on the GC?
Edit:
Here is a screenshot that illustrates the behavior of the application :
You can clearly see the "wave" effect we're having on a very regular pattern.
Subsidiary question:
I've seen that my GC Collection 2 heap is quite big and follows the same pattern as the total bytes used by my application. I guess that's perfectly normal, because most of our objects will survive at least two garbage collections (for example Singleton classes, etc.)... What do you think?
The behavior you describe is typical of problems with objects created on Large Object Heap (LOH). However, your memory consumption seems to return to some lower value later on, so check twice whether it is really a LOH issue.
You are obviously aware of that, but what is not quite obvious is that there is an exception to the size rule for objects on the LOH.
As described in the documentation, objects above 85,000 bytes in size end up on the LOH. However, for some reason (an 'optimization', probably) arrays of doubles with 1000 or more elements also end up there:
double[] smallArray = new double[999];   // ends up on the 'normal' small-object heap, in Gen 0
double[] bigArray = new double[1001];    // ends up on the LOH
These arrays can result in a fragmented LOH, which requires more and more memory, until you get an OutOfMemoryException.
I was bitten by this: we had an app which received some sensor readings as arrays of doubles, which resulted in LOH fragmentation because every array differed slightly in length (these were readings of real-time data at various frequencies, sampled by a non-real-time process). We solved the issue by implementing our own buffer pool.
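Our actual pool isn't shown here, but a simplified sketch of the idea (using System.Buffers.ArrayPool, which did not exist at the time) would be to rent a possibly larger array and track the used length yourself, so same-sized arrays get reused instead of fragmenting the LOH:

using System;
using System.Buffers;

class SensorBufferSketch
{
    static void ProcessReading(ReadOnlySpan<double> samples)
    {
        // Rent an array at least as long as the reading instead of allocating an exact-length double[].
        double[] buffer = ArrayPool<double>.Shared.Rent(samples.Length);
        try
        {
            samples.CopyTo(buffer);
            // ... work with buffer[0 .. samples.Length) ...
        }
        finally
        {
            ArrayPool<double>.Shared.Return(buffer);   // the same arrays get reused across readings
        }
    }
}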
I did some research for a class I was teaching a couple of years back. I don't think the references contain any information regarding the LOH, but I thought it was worthwhile to share them anyway (see below). Further, I suggest searching a second time for unreleased object references before blaming the garbage collector. Simply implement a counter in the class finalizer to check that these large objects are being dropped as you believe.
A different solution to this problem is simply to never deallocate your large objects, but instead reuse them with a pooling strategy. In my hubris I have many times ended up blaming the GC prematurely for the memory requirements of my application growing over time; however, this is more often than not a symptom of a faulty implementation.
GC References:
http://blogs.msdn.com/b/clyon/archive/2007/03/12/new-in-orcas-part-3-gc-latency-modes.aspx
http://msdn.microsoft.com/en-us/library/ee851764.aspx
http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx
http://blogs.msdn.com/b/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx
Eric Lippert's blog is especially interesting when it comes to understanding anything C# in detail!
Here is an update with some of my investigations:
In our application, we're using a lot of threads to perform different tasks. Some of these threads have higher priority.
1) We were using the concurrent GC and we tried switching back to the non-concurrent GC.
We've seen a dramatic improvement:
The garbage collector is called much more often, and it seems that, when called more often, it releases our memory much more effectively.
I'll post a screenshot as soon as I have a good one to illustrate this.
We've found a really good article on the MSDN. We also found an interesting question on SO.
With the upcoming Framework 4.5, four possibilities will be available for GC configuration:
Workstation - non-concurrent
Workstation - concurrent
Server - non-concurrent
Server - concurrent
We'll try switching to the "server - non-concurrent" and "server - concurrent" modes to check whether they give us better performance.
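For reference, as far as I know these combinations are selected in app.config with the gcServer and gcConcurrent elements (the values here are just placeholders):

<configuration>
  <runtime>
    <!-- server vs. workstation GC -->
    <gcServer enabled="false"/>
    <!-- concurrent (background) vs. non-concurrent GC -->
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>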
I'll keep this thread updated with our findings.
Continuing the discussion from Understanding VS2010 C# parallel profiling results but more to the point:
I have many threads that work in parallel (using Parallel.For/Each), which use many memory allocations for small classes.
This creates a contention on the global memory allocator thread.
Is there a way to instruct .NET to preallocate a memory pool for each thread and do all allocations from this pool?
Currently my solution is my own implementation of memory pools (globally allocated arrays of objects of type T, which are recycled among the threads), which helps a lot but is not efficient because:
I can't instruct .NET to allocate from a specific memory slice.
I still need to call new many times to allocate the memory for the pools.
Thanks,
Haggai
I searched for two days trying to find an answer to the same issue you had. The answer is that you need to set the garbage collection mode to Server mode. By default, the garbage collection mode is set to Workstation mode.
Setting garbage collection to Server mode causes the managed heap to split into separately managed sections, one per CPU.
To do this, you need to add a config setting to your app.config file.
<runtime>
<gcServer enabled="true"/>
</runtime>
The speed difference on my 12-core Opteron 6172 was dramatic!
The garbage collector does not allocate memory.
It sounds more like you're allocating lots of small temporary objects and a few long-lived objects, and the garbage collector is spending a lot of time garbage-collecting the temporary objects so your app doesn't have to request more memory from the OS. From .NET Framework 4 Advanced Development - Garbage Collection:
As long as address space is available in the managed heap, the runtime continues to allocate space for new objects. However, memory is not infinite. Eventually the garbage collector must perform a collection in order to free some memory.
The solution: Don't allocate lots of small temporary objects. The page on Garbage Collection and Performance might also be helpful.
You could pre-allocate a bunch of objects, and keep them in groups intended for separate threads. However, it's likely that you won't get any better performance from this.
The garbage collector is specially designed to handle small, short-lived objects efficiently. If you keep the objects in a pool, they are long-lived and will survive garbage collections, which in turn means they will be copied to the second-generation heap. This copying will be more expensive than just allocating new objects.
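As a rough sketch of what "groups intended for separate threads" could look like (my own illustration, not code from the answer), a ThreadLocal<T> stack gives each thread its own pre-allocated instances so Rent/Return never contend with other threads, though the promotion caveat above still applies:

using System;
using System.Collections.Generic;
using System.Threading;

public sealed class PerThreadPool<T> where T : class, new()
{
    // Each thread lazily gets its own stack of reusable instances.
    private readonly ThreadLocal<Stack<T>> _local =
        new ThreadLocal<Stack<T>>(() => new Stack<T>());

    public T Rent()
    {
        var stack = _local.Value;
        return stack.Count > 0 ? stack.Pop() : new T();   // no locking: the stack is per-thread
    }

    public void Return(T item) => _local.Value.Push(item);
}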
5% of execution time spent on GC? 10%? 25%?
Thanks.
This blog post has an interesting investigation into this area.
The poster's conclusion? That the overhead was negligible for his example.
So the GC heap is so fast that in a real program, even in tight loops, you can use closures and delegates without even giving it a second’s thought (or even a few nanosecond’s thought). As always, work on a clean, safe design, then profile to find out where the overhead is.
It depends entirely on the application. The garbage collection is done as required, so the more often you allocate large amounts of memory which later becomes garbage, the more often it must run.
It could even go as low as 0% if you allocate everything up front and then never allocate any new objects.
In typical applications I would think the answer is very close to 0% of the time is spent in the garbage collector.
The overhead varies widely. It's not really practical to reduce the problem domain into "typical scenarios" because the overhead of GC (and related functions, like finalization) depend on several factors:
The GC flavor your application uses (impacts how your threads may be blocked during a GC).
Your allocation profile, including how often you allocate (GC triggers automatically when an allocation request needs more memory) and the lifetime profile of objects (gen 0 collections are fastest, gen 2 collections are slower, if you induce a lot of gen 2 collections your overhead will increase).
The lifetime profile of finalizable objects, because they must have their finalizers complete before they will be eligible for collection.
The impact of various points along each of those axes of relevancy can be analyzed (and there are probably more relevant areas I'm not recalling off the top of my head), so the problem is really: how do you reduce those axes of relevancy to a "common scenario"?
Basically, as others said, it depends. Or, "low enough that you shouldn't worry about it until it shows up on a profiler report."
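If you want a cheap first look before reaching for a profiler, collection counts per generation are easy to sample. This is a rough sketch of mine; it shows collection frequency, not pause time, and RunWorkload is a placeholder for whatever you want to observe:

using System;

class GcCountSample
{
    static void Main()
    {
        int g0 = GC.CollectionCount(0), g1 = GC.CollectionCount(1), g2 = GC.CollectionCount(2);

        RunWorkload();   // whatever code you want to observe

        Console.WriteLine($"Gen0: {GC.CollectionCount(0) - g0}, " +
                          $"Gen1: {GC.CollectionCount(1) - g1}, " +
                          $"Gen2: {GC.CollectionCount(2) - g2}");
    }

    static void RunWorkload()
    {
        for (int i = 0; i < 1_000_000; i++)
        {
            var temp = new byte[256];   // short-lived garbage that drives Gen 0 collections
            GC.KeepAlive(temp);
        }
    }
}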
In native C/C++ there is sometimes a large cost to allocating memory, due to finding a free block of the right size; there is also a non-zero cost to freeing memory, due to having to link the freed memory into the correct list of blocks and combine small blocks into larger blocks.
In .NET it is very quick to allocate a new object, but you pay the cost when the garbage collector runs. However, the cost of garbage-collecting short-lived objects is about as close to free as you can get.
I have always found that if the cost of garbage collection is a problem for you, then you likely have bigger problems with the design of your software. Paging can be a big issue with any GC if you don't have enough physical RAM, so you may not be able to just put all your data in RAM and depend on the OS to provide virtual memory as needed.
It really can vary. Look at this short-but-complete demonstration program that I wrote:
http://nomorehacks.wordpress.com/2008/11/27/forcing-the-garbage-collector/
that shows the effect of large gen2 garbage collections.
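In the same spirit (my own minimal version, not the code from that post), you can watch a forced full collection get more expensive as the number of surviving objects grows:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Gen2CostDemo
{
    // Objects stored here stay reachable, so they survive into Gen 2.
    static readonly List<byte[]> Survivors = new List<byte[]>();

    static void Main()
    {
        for (int round = 1; round <= 5; round++)
        {
            for (int i = 0; i < 1_000_000; i++)
                Survivors.Add(new byte[32]);    // keep another million small objects alive

            var sw = Stopwatch.StartNew();
            GC.Collect(2, GCCollectionMode.Forced, blocking: true);  // force a full, blocking GC
            sw.Stop();

            Console.WriteLine($"Round {round}: full GC took {sw.ElapsedMilliseconds} ms");
        }
    }
}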
Yes, the Garbage Collector will spend some X% of time collecting when averaged over all applications everywhere. But that doesn't necessarily mean that time is overhead. For overhead, you can really only count the time that would be left over after releasing an equivalent amount of memory on an unmanaged platform.
With that in mind, the actual overhead can even be negative: the Garbage Collector saves time by releasing several chunks of memory in batches. That means fewer context switches and an overall improvement in efficiency.
Additionally, starting with .NET 4 the garbage collector does a lot of its work on a different thread that doesn't interrupt your currently running code as much. As we work more and more with multi-core machines, where a core might even be sitting idle now and then, this is a big deal.