Some facts:
We have developed a WCF service that acts as a layer between the clients and the database.
It's self-hosted and runs as a Windows service.
The service keeps several caches, the largest of which are about 1-2 GB in memory. Total memory usage is usually about 5-8 GB.
Connections are duplex, use the TCP protocol, and serialization is done with protobuf-net. Our connected client count usually ranges from 1000 to 1500.
The server is a fairly new 8-core Xeon with 64 GB of memory and runs nothing other than the service.
The problem: After some amount of time - it has been anywhere from a day to a week - the service gets extremely slow. Requests that take 0.5 seconds can take over a minute. This behaviour goes on for 15-40 minutes or until the service is restarted.
What we have done:
We have checked the network and the network connection to the server and there is no problem. CPU utilization goes up somewhat during this time, e.g. from a 30% average to a 40-50% average.
We have taken memory dumps and there are no logical locks in the code blocking the users, and not much activity at all.
Our latest lead is the garbage collector. In perfmon we can see that "% Time in GC" is constantly over 90% (90-97%) and the collection counts rise, for both gen 0 and gen 1. We suspect a blocking gen 2 collection is also running, but since this is in production we had to restart the service, so that counter didn't rise during the 5-minute window we ran perfmon. Memory usage was 7.6 GB.
Note: Calls outstanding rises, so the calls do arrive, but the service does not handle them.
My questions are: Can the garbage collector get into a state where it runs and blocks constantly for over 15 minutes? Or is the problem more likely related to some other issue?
Our service ran the GC in workstation mode with latency mode Interactive.
We have now changed this to server mode and SustainedLowLatency and hope this will help somewhat. Is there anything else we can do if it is the garbage collector?
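For reference, a minimal sketch of that change, assuming a plain .NET 4.5 self-hosted service (the startup class shown here is illustrative, not our actual code):

// Server GC cannot be switched on from code; it is enabled in the host's app.config:
//   <configuration>
//     <runtime>
//       <gcServer enabled="true"/>
//     </runtime>
//   </configuration>
using System.Runtime;

internal static class GcStartup
{
    // Call once during service startup, before the caches are loaded.
    public static void Configure()
    {
        // SustainedLowLatency (added in .NET 4.5) asks the GC to avoid full
        // blocking collections where it can, at the cost of a larger heap.
        GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
    }
}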
Edit: The large memory usage is by design; the data in the caches really is that large, and there is plenty of additional memory available.
Excessive garbage collection is often caused by code issues. You either create too many objects in a short time, or you keep allocating memory without releasing it.
There is actually an extensive checklist available on MSDN that should help you diagnose the problem.
A very large gen 2 heap means that the objects in it survived multiple garbage collections, which means they are kept in memory for a longer period of time. That could be the root cause of your issue. Maybe there is a caching mechanism that could use some tuning or a retention policy (remove data that hasn't been used for a long time).
I have a similar situation. Large database data cache in a service using protobuf with WCF for client communication. The cache is not purely just for clients, the business layer uses the cache to perform operations. The memory footprint of the service can be anywhere between 2 and 10 GB. I release a segment of the cache after 8 hours of inactivity. The machine has 8 virtual cores and 32 GB of memory. I am using .Net 4.5.1.
The GC would consume 98% of the CPU for an hour as soon as I loaded the cache from the database. The interesting point is that in both our cases there is no memory pressure whatsoever.
I think the GC runs regardless because something was changed so that the GC tries to keep memory available for all threads. Since one thread allocated a large amount of memory when loading the cache, the GC kicked in. I had to do several things to fix it.
1) Removed Tuples from the cache. I was using them as dictionary keys and their implementation of structural equality is horrible. It compares all properties as objects, so there is a lot of boxing going on for properties that are value types, and those boxed objects will have to be garbage collected at some point.
2) When replacing the Tuples used as keys I could not simply swap in structs without implementing Equals, because the default value comparison uses reflection and is too expensive, so I ended up creating a generic Pair structure. I chose structs to reduce the number of objects when they are stored in arrays.
3) To remove the Tuples I had to create my own Pair structure that compares the properties using the default equality of the property types; essentially the same thing that PowerCollections provides.
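A minimal sketch of such a key structure, assuming both halves implement IEquatable (the names are illustrative, not the actual code):

using System;

// Value-type dictionary key that avoids the boxing done by Tuple's
// structural equality: comparisons stay on the typed values.
public struct Pair<T1, T2> : IEquatable<Pair<T1, T2>>
    where T1 : IEquatable<T1>
    where T2 : IEquatable<T2>
{
    public readonly T1 First;
    public readonly T2 Second;

    public Pair(T1 first, T2 second)
    {
        First = first;
        Second = second;
    }

    // Typed comparison; no boxing, no reflection.
    public bool Equals(Pair<T1, T2> other)
    {
        return First.Equals(other.First) && Second.Equals(other.Second);
    }

    public override bool Equals(object obj)
    {
        return obj is Pair<T1, T2> && Equals((Pair<T1, T2>)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (First.GetHashCode() * 397) ^ Second.GetHashCode();
        }
    }
}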
Some background:
We are running some pipelines on a build server and they consume way too much memory. The pipeline does some DB imports and builds up memory over time, several times greater than the total size of an exported DB. Entity Framework (Core) is used for the import (in order to be able to reuse the entity definitions used in other parts of the application).
Situation:
We are looking into where memory consumption can be reduced. Hence I was using the memory profiler.
I've noticed that sometimes the garbage collector does seem to free up memory after process X was done, and before process Y was started.
This is as expected. The 4 GB memory build-up is OK(ish), as long as it is released. The code that caused this consumption runs in its own scope (speaking about dependency injection) and the DbContexts (and other things) used are registered as scoped. Hence we have these ScopeWorkers.
await _scopeWorker.DoWork<MyProcessX>(_ => _.Import(cancellationToken));
// In some test, memory got freed up in between, but in some other test, memory never seemed to have dropped
await _scopeWorker.DoWork<MyProcessY>(_ => _.Import(cancellationToken));
But in some other test, this drop in memory was never seen.
The red arrow indicates approximately the same moment in time, after MyProcessX.Import, and a significant drop (of 4 GB) was never seen.
Of course I do not know whether the GC spread out the cleaning of this memory over a couple dozen collection moments, instead of 3, as seen in the first screenshot.
Questions
Is it possible to wait for the garbage collector to have collected basically all memory used by MyProcessX.Import before continuing with MyProcessY.Import?
Should the garbage collector behave consistently? In other words, should I see the same memory consumption graph over time when the process is repeated and is doing the exact same operations (same data, as the data comes from a static source)?
If the garbage collector is inconsistent in its behavior, how can I make good use of the memory profiling feature in Visual Studio to spot opportunities to lower memory usage?
EDIT
Yes, the memory pressure on the system changes everything, as Evk pointed out. After reserving almost all physical memory on the system (31 GB/32 GB) and continuing the process whose memory usage I was attempting to optimize, I could see a definite drop in the memory used. I could repeat this; as shown in the image, there are actually two drops in memory.
The garbage collector uses the following conditions to decide whether it should start a collection:
The system has low physical memory. The memory size is detected by either the low memory notification from the operating system or low memory as indicated by the host.
The memory that's used by allocated objects on the managed heap surpasses an acceptable threshold. This threshold is continuously adjusted as the process runs.
The GC.Collect method is called. In almost all cases, you don't have to call this method because the garbage collector runs continuously. This method is primarily used for unique situations and testing.
The first point means it depends on all the processes running on the current machine, not only on your process. For the same reason you don't know when the GC will start, so you can't wait for that to happen.
For that same reason it cannot behave consistently, in the way you describe, in relation to your process. Your process may do the same thing, but the OS as a whole is unlikely to ever do the same things during your process run. In one test run there was enough free memory across the whole system, and in another there was not.
What you can do is force the GC to run via GC.Collect (and its overloads). However, that's rarely a good idea.
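If you do decide to force it between the two imports, a minimal sketch (reusing the two DoWork calls shown in the question) could look like the following; the Collect / WaitForPendingFinalizers / Collect sequence is the usual pattern for a full, blocking cleanup:

await _scopeWorker.DoWork<MyProcessX>(_ => _.Import(cancellationToken));

// Force a full blocking collection, let finalizers run, then collect whatever
// those finalizers released. This pauses the process and is rarely worth it
// outside of measurements or tests.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

await _scopeWorker.DoWork<MyProcessY>(_ => _.Import(cancellationToken));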
The main thing you should ask yourself is: does the high memory consumption cause any problems? Because by itself it's not a problem (assuming no memory leaks) - you have RAM to be used, not to just sit "free". If there is enough memory at the moment, the GC might rightfully decide not to waste time on garbage collection and do it later when necessary.
I have a question regarding high memory usage of a Web Role running an MVC application, with Simple Injector as the DI container and Entity Framework 6 for the DAL. The application is running on an Azure Cloud Service as a Web Role with 2 x Standard A2 instances (2 cores, 3.5 GB RAM) and is also running the CachingService (co-located role) configured with 20% memory usage.
The problem is that when an instance is started or rebooted, the memory usage of the w3wp.exe process is only around 500-600 MB (overall memory usage with all other apps is around 50%), but even if there are no requests coming in it keeps growing until around 1.7 GB and then stops (overall memory usage with all other apps is around 90%). What I also noticed is that the memory sometimes drops randomly, and of course after a reboot or republishing.
After monitoring the memory heaps I noticed that it is the Gen2 heap that grows and stays large, and after debugging locally with ANTS Memory Profiler I saw that the largest amount of Gen2 is taken up by Entity Framework objects with the class names "TypeUsage" and "MetadataProperty" (from the "System.Data.Entity.Core.Metadata.Edm" namespace).
Now my questions are:
Is this a memory leak in our code, and how can I solve it if that is the case (I checked and already tried disposing the DbContext that is created for every request)?
Is this a memory leak in EF? If that is the case, what can I do about it - maybe use another DAL framework?
Is this normal behavior that I should leave as it is?
There is a very low chance that this is a memory leak in EF, and this is not OK - you shouldn't leave it like this. Your code leaks memory.
The best way to find the leak is to use a memory profiler (ANTS is a good option, I used dotMemory). The profiler will show you the leaked objects and it should also show you two other important things:
The stack trace of the location in code where the object was created
The object tree which keeps reference to your leaked object and doesn't allow it to be collected.
These should help you understand how the objects were created and why they weren't GC'ed.
You mentioned that most of the memory is in Gen2. That means that your leaked objects are referenced by something "long lived". This could be a static variable, ASP.Net Application, or something similar.
The random drop in memory may occur when IIS recycles your application. By default that happens every 29 hours, but IIS may be configured differently or may decide to recycle your application for some other reason.
"But what I noticed is that memory drops sometimes randomly..."
Probably it's not a memory leak, but an issue of uncontrolled memory growth before garbage collection. I faced something similar some years ago.
The problem is that by default the garbage collector lets the process memory grow until its size exceeds some bound relative to the total memory available in the OS. When your process runs in a cloud environment, which is a kind of shared hosting, it's possible that it still doesn't reach that bound from the OS point of view - so the memory is not collected - yet it does exceed the memory limit for a shared process.
I'd recommend forcing the garbage collector to collect memory explicitly by calling GC.Collect(0) periodically, after a certain number of operations. Maybe that can solve the problem.
I had a similar problem (a web app with EF and lots of TypeUsage objects taking up memory in the dump) and found that setting "Enable 32-Bit Applications" on the application pool reduced the memory use considerably.
We have a server app that does a lot of memory allocations (both short-lived and long-lived). We are seeing an awful lot of gen 2 collections shortly after startup, but these collections calm down after a period of time (even though the memory allocation pattern is constant).
These collections are hitting performance early on.
I'm guessing that this could be caused by GC budgets (for gen 2?). Is there some way I can set this budget (directly or indirectly) to make my server perform better at the beginning?
One counter-intuitive set of results I've seen: we made a big reduction in the amount of memory (and Large Object Heap) allocations, which improved performance over the long term, but early performance got worse and the "settling down" period got longer.
The GC apparently needs a certain period of time to realise our app is a memory hog and to adapt accordingly. I already know this fact; how do I convince the GC?
Edit
OS: 64-bit Windows Server 2008 R2
We're using .NET 4.0 with server GC in Batch latency mode. We tried 4.5 and the three different latency modes, and while average performance improved slightly, worst-case performance actually deteriorated.
Edit2
A GC spike can double the time taken (we're talking seconds), going from acceptable to unacceptable.
Almost all spikes correlate with gen 2 collections.
My test run results in a final 32 GB heap size. The initial frothiness lasts for the first fifth of the run time, and performance after that is actually better (less frequent spikes), even though the heap is growing. The last spike near the end of the test (with the largest heap size) is the same height as (i.e. as bad as) two of the spikes in the initial "training" period (with much smaller heaps).
Allocation of an extremely large heap in .NET can be insanely fast, and the number of blocking collections will not prevent it from being that fast. The problems that you observe are caused by the fact that you don't just allocate, but also have code that causes dependency reorganizations and actual garbage collection, all at the same time as the allocation is going on.
There are a few techniques to consider:
try using LatencyMode (http://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.latencymode(v=vs.110).aspx): set it to LowLatency while you are actively loading the data - see the comments to this answer as well, and the sketch after this list
use multiple threads
do not populate cross-references to newly allocated objects while actively loading; first go through the active allocation phase, using only integer indexes to cross-reference items rather than managed references; then force a full GC a couple of times to get everything into gen 2, and only then populate your advanced data structures; you may need to re-think your deserialization logic to make this happen
try forcing your biggest root collections (arrays of objects, strings) into the second generation as early as possible; do this by preallocating them and forcing a full GC two times before you start populating the data (loading millions of small objects); if you are using some flavor of generic Dictionary, make sure to preallocate its capacity early on to avoid reorganizations
any big array of references is a big source of GC overhead - until both array and referenced objects are in Gen2; the bigger the array - the bigger the overhead; prefer arrays of indexes to arrays of references, especially for temporary processing needs
avoid having many utility or temporary objects deallocated or promoted while in active loading phase on any thread, carefully look through your code for string concatenation, boxing and 'foreach' iterators that can't be auto-optimized into 'for' loops
if you have an array of references and a hierarchy of function calls with some long-running tight loops, avoid introducing local variables that cache the reference value from some position in the array; instead, cache the offset value and keep using something like the "myArrayOfObjects[offset]" construct across all levels of your function calls; it helped me a lot with processing pre-populated, gen 2 large data structures; my personal theory is that this helps the GC manage temporary dependencies on your local thread's data structures, thus improving concurrency
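A rough sketch combining the LatencyMode suggestion with the "force a full GC before populating" steps above, assuming a single-threaded load phase (the class, method and field names are illustrative):

using System;
using System.Runtime;

internal static class CacheLoader
{
    // The long-lived cache the rest of the app reads from (illustrative).
    public static object[] Cache;

    public static void LoadAll(Func<object[]> loadFromDatabase)
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            // Discourage gen 2 collections during the bulk allocation phase.
            GCSettings.LatencyMode = GCLatencyMode.LowLatency;
            Cache = loadFromDatabase();
        }
        finally
        {
            GCSettings.LatencyMode = previous;
        }

        // Two full collections push the surviving cache objects into gen 2,
        // so later gen 0/1 collections stop revisiting them.
        GC.Collect();
        GC.Collect();
    }
}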
Here are the reasons for this behavior, as far as I learned from populating up to ~100 GB of RAM during app startup, with multiple threads:
when GC moves data from one generation to another, it actually copies it and thus modifies all references; therefore, the fewer cross-references you have during active load phase - the better
GC maintains a lot of internal data structures that manage references; if you do massive modifications to references themselves - or if you have a lot of references that have to be modified during GC - it causes significant CPU and memory bandwidth overhead during both blocking and concurrent GC; sometimes I observed GC constantly consuming 30-80% of CPU without any collections going on - simply by doing some processing, which looks weird until you realize that any time you put a reference to some array or some temporary variable in a tight loop, GC has to modify and sometimes reorganize dependency tracking data structures
server GC uses thread-specific Gen0 segments and is capable of pushing entire segment to next Gen (without actually copying data - not sure about this one though), keep this in mind when designing multi-threaded data load process
ConcurrentDictionary, while being a great API, does not scale well in extreme scenarios with multiple cores, when number of objects goes above a few millions (consider using unmanaged hashtable optimized for concurrent insertion, such as one coming with Intel's TBB)
if possible or applicable, consider using native pooled allocator (Intel TBB, again)
BTW, the latest update to .NET 4.5 (4.5.1) added compaction support for the large object heap. One more great reason to upgrade to it.
.NET 4.6 also has an API to ask for no GC whatsoever (GC.TryStartNoGCRegion), if certain conditions are met: https://msdn.microsoft.com/en-us/library/dn906202(v=vs.110).aspx
Also see a related post by Maoni Stephens: https://blogs.msdn.microsoft.com/maoni/2017/04/02/no-gcs-for-your-allocations/
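For completeness, a hedged sketch of how that API can be wrapped around a latency-critical section (requires .NET 4.6; the 256 MB budget and the delegate are just examples):

using System;
using System.Runtime;

internal static class NoGcExample
{
    public static void RunCritical(Action latencyCriticalWork)
    {
        // Ask the runtime to hold off on all GCs while the critical section
        // runs, provided allocations stay within the budget (here 256 MB).
        if (GC.TryStartNoGCRegion(256L * 1024 * 1024))
        {
            try
            {
                latencyCriticalWork();
            }
            finally
            {
                // EndNoGCRegion is only valid while the region is still active.
                if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
                    GC.EndNoGCRegion();
            }
        }
    }
}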
I am running a large ASP.NET 4.0 website. It uses a popular .NET content management system, has thousands of content items, hundreds of concurrent users - it is basically a heavy website.
Over the course of 1 day the memory usage of the IIS7 worker process can rise to 8-10GB. The server has 16GB installed and is currently set to recycle the app pool once per day.
I am getting pressured to reduce memory usage. Much of the memory usage is due to caching of large strings of data - but the cache interval is only set to 5-10 minutes - so these strings should eventually expire from memory.
However, after running RedGate Memory Profiler I can see what I think are memory leaks. I have filtered my Instance List results by objects that are "kept in memory exclusively by Disposed Objects" (I read on the RedGate forum that this is how you find memory leaks). This gave me a long list of strings that are being held in memory.
For each string I use Instance Retention Graph to see what holds it in memory. The System.string objects seem to have been cached at some point by System.Web.Caching.CacheDependency. If I follow the graph all the way up it goes through various other classes including System.Collections.Specialized.ListDictionary until it reaches System.Web.FileMonitor. This makes some sense as the strings are paths to a file (images / PDFs / etc).
It seems that the CMS is caching paths to files, but these cached objects are then "leaked". Over time this builds up and eats up RAM.
Sorry this is long-winded... Is there a way for me to stop these memory leaks, or to clear them down without resorting to recycling the app pool? Can I find which class / code is doing the caching, to see if I can fix the leak?
It sounds like the very common problem of stuff being left in memory as part of session state. If that's the case your only options are: 1. don't put so much stuff in each user's session; 2. set the session lifetime to something shorter (the default is 20 minutes, I think); and 3. periodically recycle the app pool.
As part of 1, I found that there are "good ways" and "bad ways" of presenting data in a data grid control. You may want to check that you are copying only the data you need and not accidentally maintaining references to the entire data grid.
I know there's tons of threads about this. And I read a few of them.
I'm wondering if in my case it is correct to call GC.Collect().
I have a server for an MMORPG; in production it is online day and night, and the server is restarted every other day to implement changes to the production codebase. Every twenty minutes the server pauses all other threads and serializes the current game state. This usually takes 0.5 to 4 seconds.
Would it be a good idea to call GC.Collect() after serialization?
The server is, obviously, constantly creating and destroying game items.
Would I see a noticeable gain in performance or memory optimization / usage?
Should I not manually collect?
I've read about how collecting can be bad if used in the wrong moments or too frequently, but I'm thinking these saves are both a good moment to collect, and not that frequent.
The server is on .NET 4.0.
Update in answer to a comment:
We are randomly experiencing server freezes: sometimes, unexpectedly, the server's memory usage will rise steadily until it reaches a point where the server takes way too long to handle any network operation. Thus, I'm considering a lot of different approaches to solve the issue, and this is one of them.
The garbage collector knows best when to run, and you shouldn't force it.
It will not improve performance or memory usage. The CLR can tell the GC to collect objects which are no longer used if there is a need to do so.
Answer to an updated part:
Forcing the collection is not a good solution to the problem. You should rather look a bit deeper into your code to find out what is wrong. If memory usage grows unexpectedly, you might have an issue with unmanaged resources which are not properly handled, or even "leaky code" within the managed code.
One more thing: I would be surprised if calling GC.Collect fixed the problem.
Every twenty minutes the server pauses all other threads and serializes the current game state. This usually takes 0.5 to 4 seconds.
If all your threads are suspended already anyway, you might as well call the garbage collector, since it should be fairly fast at this point. I suspect doing this will only mask your real problem though, not actually solve it.
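A minimal sketch of what that could look like in the save cycle; the pause, serialize and resume steps are passed in as delegates because they are game-specific (all names here are hypothetical), and GCCollectionMode.Optimized lets the runtime skip the collection if it judges it unproductive:

using System;

internal static class SaveCycle
{
    // Hypothetical wrapper around the 20-minute save described in the question.
    public static void Run(Action pauseAllThreads, Action serializeWorld, Action resumeAllThreads)
    {
        pauseAllThreads();
        try
        {
            serializeWorld();
            // While everything is already paused, offer the GC a full collection;
            // Optimized lets the runtime skip it if it wouldn't reclaim much.
            GC.Collect(2, GCCollectionMode.Optimized);
        }
        finally
        {
            resumeAllThreads();
        }
    }
}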
We are randomly experiencing server freezes: sometimes, unexpectedly, the server's memory usage will rise steadily until it reaches a point where the server takes way too long to handle any network operation. Thus, I'm considering a lot of different approaches to solve the issue, and this is one of them.
This sounds more like you actually are still referencing all these objects that use the memory - if you weren't, the GC would run due to the memory pressure and try to release those objects. You might be looking at an actual bug in your production code (i.e. objects that are still subscribed to events or otherwise referenced when they shouldn't be) rather than something you can fix by manually taking out the garbage.
If possible in this scenario, you should run a performance analysis to see where your bottlenecks are and which parts of your code cause the brunt of the memory allocations.
Could the memory increase be an "attack" by a player with a fake/modified game-client? Is a lot of memory allocated by the server when it accepts a new client connection? Does the server handle bogus incoming data well?