TPL and memory management - c#

Using the Visual Studio Concurrency Visualizer I now see why I don't get any benefit switching to Parallel.For: only the 9% of the time the machine is busy executing the code, the rest is 71% synchronization and 17% memory management (1).
Checking all the orange stripes on the diagram below I discovered that GC is always involved (2).
After reading all these interesting topics...
Why do I have a lock here?
https://blog.marcgravell.com/2011/10/assault-by-gc.html
Prevent .NET Garbage collection for short period of time
https://devblogs.microsoft.com/premier-developer/understanding-different-gc-modes-with-concurrency-visualizer/
.. am I right assuming that all these threads need to play with a single memory management object and therefore removing the need to allocate objects on the heap my scenario will improve considerably? Like using structs instead of classes, array instead of dynamic lists, etc.?
I have a lot of work to do to bend my code in this direction. Just wanted to be sure before starting.

From your screenshot it seems like memory allocation is blocked while waiting for GC to complete. There are server and workstation GC modes, and it may be concurrent or not, but all options need to block threads at least a little while. I would check in more detail how often, and how much time you are spending in GC, and how often gen 0/1 and 2 is running.
I believe that each thread has a separate ephemeral segment it uses for allocations, so that it would not need to synchronize allocations, unless it needs a new segment, or the allocation is on the large object heap. But I'm unable to find a reference for this.
In any case, you will likely benefit from reducing the amount and size of allocations. If possible, use a object pool or memory pool to reuse memory. You might also benefit from increasing the amount of memory and checking the application for memory leaks. A general recommendation for memory is that there should be two types of allocations:
Small temporary allocations that only live for a short duration, like a temporary object that live for the duration of a method call.
Long lived allocations of any size that live for the duration of the "application".
If this pattern is followed almost all garbage should be collected in Gen 0/1, and gen 2 collections should be fairly rare.
It also depends a bit if you are allocating many small objects, or large chunks of memory. If the former you may consider using structs since these are stack allocated. If the later you also need to consider memory fragmentation, and this should also improve by using a memory pool that only allocates fixed sized chunks of memory.
Edit:
At the very simplest a object pool could be something like this:
public class ObjectPool<T>
{
private ConcurrentBag<T> pool = new ConcurrentBag<T>();
public T Get(Func<T> constructor) => pool.TryTake(out var result) ? result : constructor();
public void Return(T obj) => pool.Add(obj);
}
This assumes that the objects represent identical resources, like byte arrays of some fixed size. But there are also existing implementations:
.Net core MemoryPool
asp.Net core object pool
stack overflow question regarding object pools

Memory Management The Memory Management report shows the calls where memory management blocks occurred, along with the total blocking
times of each call stack. Use this information to identify areas that
have excessive paging or garbage collection issues.
Further more
Memory management time
These segments in the timeline are associated with blocking times that
are categorized as Memory Management. This scenario implies that a
thread is blocked by an event that is associated with a memory
management operation such as Paging. During this time, a thread has
been blocked in an API or kernel state that the Concurrency Visualizer
is counting as memory management. These include events such as paging
and memory allocation. Examine the associated call stacks and profile
reports to better understand the underlying reasons for blocks that
are categorized as Memory Management.
Yes, allocating less will likely have a large benefit on your resources and efficiency, but that is almost always the case on hot paths and thrashed applications
Heap allocations and particular Large Object Heap (LOB) allocations are costly, it also creates extra work for your The Garbage Collector and can fragment your memory causing even more inefficiency. The less you allocate, or reuse memory, or use the stack the better you are (in general).
This is also where you would learn to use a good memory profiler and get to know your garbage collector.
On saying that this would not be the only tool you would use to make your application less allocatey. A good memory profiler will go a long way, combined with learning how to read the results and affect changes based on the results.
Creating minimal allocation code is an artform, and one worth your learning
Also as #mjwills pointed out in the comments, you would run any change through your benchmark software as well, removing allocations at the cost of CPU time won't make sense. There are a lot of ways to speed up code, and low allocation is just one of a lot of approaches that may help.
Lastly, I would suggest following Marc Gravell and his blogs as a start (Mr DeAllocation), get to know your Garbage Collector and how the generations wortk, and tools like memory profilers and benchmarkers for performant silky smooth production code

Related

Inconsistent(?) behavior of garbage collector and (almost) out of memory issues

Some background:
We are running some pipelines on a buildserver and it consumes way to much memory. The pipeline does some DB imports and it builds up memory over time x times greater than the total size of an exported DB. For the import Entity Framework (core) is used (in order to be able to reuse entity definitions used in other parts of the application).
Situtation:
We are looking into where memory consumption can be reduced. Hence I was using the memory profiler.
I've noticed that sometimes the garbage collector does seem to free up memory after process X was done, and before process Y was started.
This is as expected. The 4GB memory build up is OK(ish), as long as it is released. The code that caused this consumption is running in its own Scope (speaking about dependency injection) and the DbContexts (and other things) used are registered as Scoped. Hence we have these ScopeWorkers.
await _scopeWorker.DoWork<MyProcessX>(_ => _.Import(cancellationToken));
// In some test, memory got freed up in between, but in some other test, memory never seemed to have dropped
await _scopeWorker.DoWork<MyProcessY>(_ => _.Import(cancellationToken));
But in some other test, this drop in memory was never seen.
The red arrow indicates approximately the same moment in time, after MyProcessX.Import, and a significant drop (of 4GBs) was never seen.
Of course I do not know whether the GC spread out the cleaning of this memory over a couple dozen collection moments, instead of 3, as seen in the first screenshot.
Questions
Is it possible to wait for the garbage collector to have collected basically all memory used by MyProcessX.Import, before continueing with MyProcessY.Import?
Should the garbage collector behave consistently? In other words, should I see the same memory consumption graph over time when the processes is repeated and is doing the exact same operations (so same data, as the data comes from a static source)
If the garbage collector is inconsistent in its behavior, how to make good use of the memory profiling feature in Visual Studio to spot opportunities of lowering memory?
EDIT
Yes the memory pressure on the system changes everything, as Evk pointed out. After reserving almost all physical memory on the system (31GB/32GB) and continuing the process which I was attempting to optimize memory usage I could see a definite drop in memory used. I could repeat this, as shown in the image there are actually 2 drops in memory.
Garbage collector uses the following conditions to decide whether it should start collection:
The system has low physical memory. The memory size is detected by
either the low memory notification from the operating system or low
memory as indicated by the host.
The memory that's used by allocated objects on the managed heap
surpasses an acceptable threshold. This threshold is continuously
adjusted as the process runs.
The GC.Collect method is called. In almost all cases, you don't have
to call this method because the garbage collector runs continuously.
This method is primarily used for unique situations and testing.
The first point means it depends on all processes running on current machine, not only on your process. For the same reason you don't know when GC will start, so you can't wait for that to happen.
For that same reason it cannot behave consistently in way you describe, in relation to your process. Your process may do the same thing, but OS as a whole is unlikely to ever do the same things during your process run. In one test run there were enough free memory over whole system, and in another it was not.
What you can do is force GC to run via GC.Collect (and overloads). However that's rarely a good idea.
Main thing you should ask yourself is - does high memory consumption bring any problems? Because by itself it's not a problem (assuming no memory leaks) - you have RAM to be used, not to just stay "free". If there is enough memory currently - GC might rightfully decide to not waste time on garbage collection and do that later when necessary.

Can I "prime" the CLR GC to expect profligate memory use?

We have a server app that does a lot of memory allocations (both short lived and long lived). We are seeing an awful lot of GC2 collections shortly after startup, but these collections calm down after a period of time (even though the memory allocation pattern is constant).
These collections are hitting performance early on.
I'm guessing that this could be caused by GC budgets (for Gen2?). Is there some way I can set this budget (directly or indirectly) to make my server perform better at the beginning?
One counter-intuitive set of results I've seen: We made a big reduction to the amount of memory (and Large Object Heap) allocations, which saw performance over the long term improve, but early performance gets worse, and the "settling down" period gets longer.
The GC apparently needs a certain period of time to realise our app is a memory hog and adapt accordingly. I already know this fact, how do I convince the GC?
Edit
OS:64 bit Windows Server 2008 R2
We're using .Net 4.0 ServerGC Batch Latency. Tried 4.5 and the 3 different latency modes, and while average performance was improved slightly, worst case performance actually deteriorated
Edit2
A GC spike can double time taken (we're talking seconds) going from acceptable to unacceptable
Almost all spikes correlate with gen 2 collections
My test run causes a final 32GB heap size. The initial frothiness lasts for the 1st 1/5th of the run time, and performance after that is actually better (less frequent spikes), even though the heap is growing. The last spike near the end of the test (with largest heap size) is the same height as (i.e. as bad as) 2 of the spikes in the initial "training" period (with much smaller heaps)
Allocation of extremely large heap in .NET can be insanely fast, and number of blocking collections will not prevent it from being that fast. Problems that you observe are caused by the fact that you don't just allocate, but also have code that causes dependency reorganizations and actual garbage collection, all at the same time when allocation is going on.
There are a few techniques to consider:
try using LatencyMode (http://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.latencymode(v=vs.110).aspx), set it to LowLatency while you are actively loading the data - see comments to this answer as well
use multiple threads
do not populate cross-references to newly allocated objects while actively loading; first go through active allocation phase, use only integer indexes to cross-reference items, but not managed references; then force full GC couple times to have everything in Gen2, and only then populate your advanced data structures; you may need to re-think your deserialization logic to make this happen
try forcing your biggest root collections (arrays of objects, strings) to second generation as early as possible; do this by preallocating them and forcing full GC two times, before you start populating data (loading millions of small objects); if you are using some flavor of generic Dictionary, make sure to preallocate its capacity early on, to avoid reorganizations
any big array of references is a big source of GC overhead - until both array and referenced objects are in Gen2; the bigger the array - the bigger the overhead; prefer arrays of indexes to arrays of references, especially for temporary processing needs
avoid having many utility or temporary objects deallocated or promoted while in active loading phase on any thread, carefully look through your code for string concatenation, boxing and 'foreach' iterators that can't be auto-optimized into 'for' loops
if you have an array of references and a hierarchy of function calls that have some long-running tight loops, avoid introducing local variables that cache the reference value from some position in the array; instead, cache the offset value and keep using something like "myArrayOfObjects[offset]" construct across all levels of your function calls; it helped me a lot with processing pre-populated, Gen2 large data structures, my personal theory here is that this helps GC manage temporary dependencies on your local thread's data structures, thus improving concurrency
Here are the reasons for this behavior, as far as I learned from populating up to ~100 Gb RAM during app startup, with multiple threads:
when GC moves data from one generation to another, it actually copies it and thus modifies all references; therefore, the fewer cross-references you have during active load phase - the better
GC maintains a lot of internal data structures that manage references; if you do massive modifications to references themselves - or if you have a lot of references that have to be modified during GC - it causes significant CPU and memory bandwidth overhead during both blocking and concurrent GC; sometimes I observed GC constantly consuming 30-80% of CPU without any collections going on - simply by doing some processing, which looks weird until you realize that any time you put a reference to some array or some temporary variable in a tight loop, GC has to modify and sometimes reorganize dependency tracking data structures
server GC uses thread-specific Gen0 segments and is capable of pushing entire segment to next Gen (without actually copying data - not sure about this one though), keep this in mind when designing multi-threaded data load process
ConcurrentDictionary, while being a great API, does not scale well in extreme scenarios with multiple cores, when number of objects goes above a few millions (consider using unmanaged hashtable optimized for concurrent insertion, such as one coming with Intel's TBB)
if possible or applicable, consider using native pooled allocator (Intel TBB, again)
BTW, latest update to .NET 4.5 has defragmentation support for large object heap. One more great reason to upgrade to it.
.NET 4.6 also has an API to ask for no GC whatsoever (GC.TryStartNoGCRegion), if certain conditions are met: https://msdn.microsoft.com/en-us/library/dn906202(v=vs.110).aspx
Also see a related post by Maoni Stephens: https://blogs.msdn.microsoft.com/maoni/2017/04/02/no-gcs-for-your-allocations/

When to go for object pooling?

When to go for object pooling using C#? Any good ex...
What are the pro's and con's of maintaining a pool of frequently used objects and grab one from the pool instead of creating a new one?
There are only two types of resources I can think of that are commonly pooled: Threads and Connections (i.e. to a database).
Both of these have one overarching concern: Scarcity.
If you create too many threads, context-switching will waste away all of your CPU time.
If you create too many network connections, the overhead of maintaining those connections becomes more work than whatever the connections are supposed to do.
Also, for a database, connection count may be limited for licensing reasons.
So the main reason you'd want to create a resource pool is if you can only afford to have a limited number of them at any one time.
I would add memory fragmentation to the list. That can occur when using objects that encapsulate native resources which after they are allocated can not be moved by the Garbage Collector and could potentially fragment the heap.
One real-life example is when you create and destroy lots of sockets. The buffers they use to read/write data have to be pinned in order to be transferred to the native WinSock API which means that when garbage collection occurs, even though some of the memory is reclaimed for the sockets that where destroyed - it could leave the memory in a fragmented state since the GC can't compact the heap after collection. Thus the read/write buffers are prime candidates for pooling. Also, if you're using SocketEventArgs objects, those would also be good candidates.
Here's a good article that talks about the garbage collection process, memory compacting and why object pooling helps.
When object creation is expensive
When you potentially can experience memory pressure - way too many objects (for instance - flyweight pattern)
For games GC may introduce an unwanted delay in some situations. If this is the case, reusing objects may be a good idea. There's are some useful considerations on the topic in this thread.
There is an excellent MSDN magazine article called Rediscover the Lost Art of Memory Optimization in Your Managed Code by Erik Brown http://msdn.microsoft.com/en-us/magazine/cc163856.aspx. It includes a general purpose object pool with a test program. This object pool does support minimum and maximum sizes. I can't find any references to people using this in production. Has anyone done so? Also, having dealt with memory fragmentation in an ASP.NET application I can attest to the value in Miky Dinescu's answer. Also, elaborating slightly on Vitaly's answer, consider the case of large objects (i.e. > 85K) which are expensive to create. Large objects only participate in Gen 2 garbage collection. This means that they won't be collected as quickly as objects that fully participate in garbage collection in Gen 0 and Gen 1. This article Large Object Heap Uncovered by Maoni Stephens
at http://msdn.microsoft.com/en-us/magazine/cc534993.aspx explains the large object heap in detail.

.NET: What is typical garbage collector overhead?

5% of execution time spent on GC? 10%? 25%?
Thanks.
This blog post has an interesting investigation into this area.
The posters conclusion? That the overhead was negligible for his example.
So the GC heap is so fast that in a real program, even in tight loops, you can use closures and delegates without even giving it a second’s thought (or even a few nanosecond’s thought). As always, work on a clean, safe design, then profile to find out where the overhead is.
It depends entirely on the application. The garbage collection is done as required, so the more often you allocate large amounts of memory which later becomes garbage, the more often it must run.
It could even go as low as 0% if you allocate everything up front and the never allocate any new objects.
In typical applications I would think the answer is very close to 0% of the time is spent in the garbage collector.
The overhead varies widely. It's not really practical to reduce the problem domain into "typical scenarios" because the overhead of GC (and related functions, like finalization) depend on several factors:
The GC flavor your application uses (impacts how your threads may be blocked during a GC).
Your allocation profile, including how often you allocate (GC triggers automatically when an allocation request needs more memory) and the lifetime profile of objects (gen 0 collections are fastest, gen 2 collections are slower, if you induce a lot of gen 2 collections your overhead will increase).
The lifetime profile of finalizable objects, because they must have their finalizers complete before they will be eligible for collection.
The impact of various points on each of those axes of relevancy can be analyzed (and there are probably more relevant areas I'm not recalling off the top of my head) -- so the problem is really "how can you reduce those axes of relevancy to a 'common scenario?'"
Basically, as others said, it depends. Or, "low enough that you shouldn't worry about it until it shows up on a profiler report."
In native C/C++ there is sometimes a large cost of allocating memory due to finding a block of free memory that is of the right size, there is also a none 0 cost of freeing memory due to having to linked the freed memory into the correct list of blocks, and combine small blocks into large blocks.
In .NET it is very quick to allocate a new object, but you pay the cost when the garbage collector runs. However to cost of garbage collection short lived object is as close to free as you can get.
I have always found that if the cost of garbage collection is a problem to you, then you are likely to have over bigger problems with the design of your software. Paging can be a big issue with any GC if you don’t have enough physical RAM, so you may not be able to just put all your data in RAM and depend on the OS to provide virtual memory as needed.
It really can vary. Look at this demonstration short-but-complete program that I wrote:
http://nomorehacks.wordpress.com/2008/11/27/forcing-the-garbage-collector/
that shows the effect of large gen2 garbage collections.
Yes, the Garbage Collector will spend some X% of time collecting when averaged over all applications everywhere. But that doesn't necessarily means that time is overhead. For overhead, you can really only count the time that would be left after releasing an equivalent amount of memory on an unmanaged platform.
With that in mind, the actual overhead is negative, but the Garbage collector will save time by release several chunks of memory in batches. That means fewer context switches and an overall improvement in efficiency.
Additionally, starting with .Net 4 the garbage collector does a lot of it's work on a different thread that doesn't interrupt your currently running code as much. As we work more and more with mutli-core machines where a core might even be sitting idle now and then, this is a big deal.

C# .NET Linq Memory Cleanup or Leak?

I have a large 2GB file with 1.5 million listings to process. I am running a console app that performs some string manipulation then uploads each listing to the database.
I created a LINQ object and clear the object by assigning it to a new LinqObject() for each listing (loop).
When the object is complete, I add it to a list.
When the list reaches 100 objects, I submitAll on the entire list, clear the list, then repeat.
My memory usage continues to grow as the program runs. Is there anything I should be doing to keep memory usage down? I tried GC.collect. I think I want to use dispose..
Thanks in advance for looking.
It's normal for the memory usage of a program to increase when it's working. You should not try to force the garbage collector to reduce the memory usage to try to save resources, this will most likely waste resources instead.
Contrary to one's first reaction, high memory usage is not a performance problem as long as there are any free memory left at all. Having a lot of unused memory doesn't increase the performance a bit. If you try to reduce the memory usage only to keep it down, you are just wasting CPU time doing cleanup that is not needed.
If you are running out of free memory or if some other application needs it, the garbage collector will do the appropriate cleanup. In almost every situation the garbage collector will know much more about the current memory situatiuon than you can possibly anticipate when writing the code.
If you are using objects that implement the IDisposable interface, you should call the Dispose method to free unmanaged resources, but all other objects are handled by the garbage collector. Managed objects normally don't leak memory at all.
Do you need your memory usage to stay low? Absent an actual functional problem, high memory usage in and of itself is not an issue.
How large is the memory usage growing? It may be that .NET is just "settling" effectively.
It's not really clear exactly how you're doing this, but the general principle sounds okay. I suggest you take the database work out of the equation - just comment out whichever line would actually submit to the database. See how much memory that uses. Other than the StreamReader (or whatever) you shouldn't have anything else that needs disposing if you're not touching the database - just building batches of transformed objects and throwing them away.

Categories

Resources