I work with a flow control system written in. NET, the system inter-acts with external systems through TCP connections and routes transactions between different endpoints.
My problem:
At startup / initialization the private working set memory level is about 25000KB. After initialization when the system is in idle state, the private working set is stepping up with about 50-100KB per second until it reaches a limit of about 57000KB.
Information:
The system is generating page faults during the incrementation.
When the limit is reached, the private working set stays very stable and oscillates up and down with a few MB when I connect +300 clients and exchange high-frequency transactions for a couple of hours, the logic for garbage collection works very well.
I have profiled this system with a tool from Redgate called "Memory Profiler" which tells me the memory stepping up after initialization is allocated by unmanaged code, unfortunately this profiler does not support insight to memory allocated by unmanaged code so I have difficulties to find out what this allocated memory contains, why it is allocated and which code that allocates the memory.
The whole codebase is developed in C#, there are no references to COM+ assemblies and there is no communication with native windows API's (during the incrementation of this memory).
My question:
I need to be pointed in the right direction to find out why the memory is continuously incrementing in small chunks to a specific level after initialization.
If a page is in not working set this does not mean the page is stored only on disk or on disk at all. Pages on Windows can go to the standby list. If they do, they leave the WS and require a soft fault to bring them back. (I never understood why this mechanism is there, but it is). A soft fault is cheap.
Using Process Explorer's system information window you can see the number of hard and soft faults per seconds. Probably also available using perfmon. I suggest you check if you have hard faults (which I believe you don't so you don't have a problem and you can close the investigation).
Also, WS has nothing to do with memory usage, but I think you already knew that.
Related
What is the reason for pinned GC handles when working with unmanaged .net components? This happens from time to time without any code changed or something else. When investigating the issue, I see a lot of pinned GC-Handles
These handles seem to stick in the memory for the entire application lifetime. In this case, the library is GdPicture (14). Is there any way to investigate why those instances are not cleaned up? I'm using Dispose()/using everywhere and can't find any GC roots in the managed code.
Thanks a lot!
EDIT
Another behaviour that is strange is, that the task manager shows that the application uses about 6GB ram, when the memory profiler shows the usage of 400MB (red line is live bytes)
What is the reason for pinned GC handles when working with unmanaged .net components?
Pinning is needed when working with unmanaged code. It prevents objects from being moved during garbage collection so that the unmanaged code can have a pointer to it. The garbage collector will update all .NET references, but it will not update unmanaged pointer values.
Is there any way to investigate why those instances are not cleaned up?
No. The reason always is: there's a bug in the code. Either your code (assume that first) or in a 3rd party library (libraries are used often, chances are that leaks in the library have been found by someone else before).
I'm using Dispose()/using everywhere
Seems like you missed one or it's not using the disposable pattern.
Another behaviour that is strange is, that the task manager shows that the application uses about 6GB ram, when the memory profiler shows the usage of 400MB (red line is live bytes)
A .NET memory profiler may only show the .NET part of memory (400 MB) and omit the rest (5600 MB).
Task manager is not interested in .NET. It cares about physical RAM mostly, which is why Task Manager is not a good analytics tool in general. You don't want to analyze physical RAM, you want to analyze virtual memory.
To look for memory leaks, use Process Explorer and show the "Private Bytes" and "Virtual size" column. Process Explorer can also show you a graph over time per process.
How to proceed?
Forget about the unmanaged leak for a moment. Use a .NET profiler that has the capability of taking memory snapshots and allows you to see each individual object inside as well as a statistics.
Try to figure out the steps that it takes to create more leaks in a consistent way. Then
Take a snapshot
Repeat the leak procedure 10 times
Take a snapshot
Repeat the leak procedure another 10 times
Take a snaphot
Compare snapshot of step 1 and 3. Check for managed types that differ in multiples of 10. Compare snapshot of step 3 and 5. Check the same type again. It must be a multiple of 10. You can't leak 7 objects when you run a method 10 times.
Do a code review on the places where the affected types are used based on internal knowledge on the leak procedure (which methods are called) and the managed type. Make sure it's disposed or released properly.
I heard many times that once C# managed program request more memory from OS, it doesn't free it back, unless system is out of memory. Eg. when object is collected, it gets deleted, and memory that was occupied by the object is free to reuse by another managed object, but memory itself is not returned to operating system (for example, mono on unix wouldn't call brk / sbrk to decrease the amount of virtual memory available to the process back to what it was before its allocation).
I don't know if this really happens or not, but I can see that my c# applications, running on linux, use small amount of memory on beginning, then when I do something memory expensive, it allocates more of it, but later on when all objects get deleted (I can verify that by putting debug message to destructors), the memory is not free'd. On other hand no more memory is allocated when I run that memory expensive operation again. The program just keep on eating the same amount of memory until it is terminated.
Maybe it is just my misunderstanding of how GC in .net works, but if it really does work like this, why is that? What is a benefit of keeping the allocated memory for later, instead of returning it back to the system? How can it even know if system need it back or not? What about other application that would crash or couldn't start because of OOM caused by this effect?
I know that people will probably answer something like "GC manages memory better than you ever could, just don't care about it" or "GC knows what it does best" or "it doesn't matter at all, it's just virtual memory" but it does matter, on my 2gb laptop I am running OOM (and kernel OOM killer gets started because of that) very often when I am running any C# applications after some time precisely because of this irresponsible memory management.
Note: I was testing this all on mono in linux because I really have hard times understanding how windows manage memory, so debugging on linux is much easier for me, also linux memory management is open source code, memory management of windows kernel / .Net is rather mystery for me
The memory manager works this way because there is no benefit of having a lot of unused system memory when you don't need it.
If the memory manager would always try to have as little memory allocated as possible, that would mean that it would do a lot of work for no reason. It would only slow the application down, and the only benefit would be more free memory that no application is using.
Whenever the system needs more memory, it will tell the running applications to return as much as possible. The same signal is also sent to an application when you minimise it.
If this doesn't work the same with Mono in Linux, then that is a problem with that specific implementation.
Generally, if an app needs memory once, it will need it again. Releasing memory back to the OS only to request it back again is overhead, and if nothing else wants the memory: why bother?. It is trying to optimize for the very likely scenario of wanting it again. Additionally, releasing it back requires entire / contiguous blocks that can be handed back, which has very specific impact on things like compaction: it isn't quite as simple as "hey, I'm not using most of this : have it back" - it needs to figure out what blocks can be released, presumably after a full collect and compact (relocate objects etc) cycle.
Have simple C# console app which imports text data into SQL.
It takes around 300K in memory and 80% in CPU. There are 2Gb RAM available at any time and yet the Page Fault shows 500K.
The app is 32 bit and OS is either W2000 or XP 32 bit and .NET 3.5
Anyone can explain what could be the problem and how can I investigate this further?
EDIT: I am now certain that the page faults are related to the disk I/O (read). I commented out SQL part and the pure disk read generates that high number alone.
EDIT2: There are 200 hard faults/sec and 4000 soft faults/sec on average.
I wonder if the same would appear on W2008
First, how do you measure the memory the app is using? If you're looking at "working set" that's only the part that resides in physical memory. You should also take a look at the "VM Size" (or "Commit Size") where the actual virtual memory your process takes up.
If Windows kernel Balance Set Manager thinks that your app is inactive, or should be left behind to give other processes more power, it can decide to reduce the working set size. If working set size is smaller than what your application actually needs to work on, you could easily see a lot of page faults because it simply becomes a race between The Balance Set Manager and the application. Usually balance set manager monitors memory usage and can also decide to increase working set size accordingly. However, this might be prevented in certain circumstances like low physical free memory, high I/O (cache stress on physical memory), low process priorty, background/foreground status of the application etc.
It can simply be the behavior of .NET garbage collector due to vast amount of small memory blocks getting allocated and disposed in a very short time, causing a stress on both memory allocation and releasing. The "VM Size" could stay around the same size but behind the scenes it could be continously allocating/freeing memory, causing continous page faults.
Also know that the DLLs the process is using are also accounted for the process statistics. Not your app but one of the COM or .NET DLL you are using might be causing this behavior as well. You can deduce actual culprit by changing your application's behavior (e.g. removing DB access code and only leave object allocation code behind) to see which component is actually causing thrashing.
EDIT: About your question on GC impact on memory thrashing: The CLR actually grows the heap dynamically and gives the memory back to the OS as needed. That does not occur synchronously. GC runs behind the scenes and frees memory in large chunks to prevent hindering application performance. Say you are allocating many small objects and freeing them almost immediately. That causes many references to stay for a moment in memory before freeing. It is easy to imagine that it becomes like a head-to-head race between the garbage collector and the memory allocating code. While GC eventually catches up, the required new memory must be satisified from a "new memory", not the old one because old one is not freed up yet. Since actual memory we are working on stays around the same, balance set manager may not think of giving our process more memory because we're on the edge, always around the same physical memory size but constantly need "newly allocated memory" not "more memory", therefore page faults.
Page faults are normal. Memory gets swapped out and when you next access it that's a page fault and the system brings it back. This is by design.
I've got an app running on my machine right now with 500 million page faults. There's nothing to worry about!
Page faults means Memory issues
Consider increasing memory, if you have excessive page faults.
Have a large working set size.
The working set is the set of memory pages currently loaded in RAM. This is measured by Process\Working Set. A high value might indicate that you have loaded a number of assemblies.
Process\Working Set has no specific threshold value to watch, although a high or fluctuating value can indicate a memory shortage. A high or fluctuating value accompanied by a high rate of page faults clearly indicates that your server does not have enough memory.
Further reading:
Check memory under System Resources in following MSDN article:
http://msdn.microsoft.com/en-us/library/ff647791.aspx#scalenetchapt15_topic9
Please provide some code to investigate.
A possible answer to this I am currently testing it on my application. Break up your working set into smaller chunks and work with the chunks.
For instance I have a large list of objects (9000-30000). If I break up that list into a chunk of 500 or so at a time it should maintain the 500 objects in memory while I work on them.
You will want to increase or decrease the size of your chunk until you can work with it fast enough that the OS will maintain it in memory. This is theory I haven't fully tested it yet. But it should work.
We have a 64bit C#/.Net3.0 application that runs on a 64bit Windows server. From time to time the app can use large amount of memory which is available. In some instances the application stops allocating additional memory and slows down significantly (500+ times slower).When I check the memory from the task manager the amount of the memory used barely changes. The application keeps on running very slowly and never gives an out of memory exception.
Any ideas? Let me know if more data is needed.
You might try enabling server mode for the Garbage Collector. By default, all .NET apps run in Workstation Mode, where the GC tries to do its sweeps while keeping the application running. If you turn on server mode, it temporarily stops the application so that it can free up memory (much) faster, and it also uses different heaps for each processor/core.
Most server apps will see a performance improvement using the GC server mode, especially if they allocate a lot of memory. The downside is that your app will basically stall when it starts to run out of memory (until the GC is finished).
* To enable this mode, insert the following into your app.config or web.config:
<configuration>
<runtime>
<gcServer enabled="true"/>
</runtime>
</configuration>
The moment you are hitting the physical memory limit, the OS will start paging (that is, write memory to disk). This will indeed cause the kind of slowdown you are seeing.
Solutions?
Add more memory - this will only help until you hit the new memory limit
Rewrite your app to use less memory
Figure out if you have a memory leak and fix it
If memory is not the issue, perhaps your application is hitting CPU very hard? Do you see the CPU hitting close to 100%? If so, check for large collections that are being iterated over and over.
As with 32-bit Windows operating systems, there is a 2GB limit on the size of an object you can create while running a 64-bit managed application on a 64-bit Windows operating system.
Investigating Memory Issues (MSDN article)
There is an awful lot of good stuff mentioned in the other answers. However, I'm going to chip in my two pence (or cents - depending on where you're from!) anyway.
Assuming that this is indeed a 64-bit process as you have stated, here's a few avenues of investigation...
Which memory usage are you checking? Mem Usage or VMem Size? VMem size is the one that actually matters, since that applies to both paged and non-paged memory. If the two numbers are far out of whack, then the memory usage is indeed the cause of the slow-down.
What's the actual memory usage across the whole server when things start to slow down? Does the slow down also apply to other apps? If so, then you may have a kernel memory issue - which can be due to huge amounts of disk accessing and low-level resource usage (for example, create 20000 mutexes, or load a few thousand bitmaps via code that uses Win32 HBitmaps). You can get some indication of this on the Task Manager (although Windows 2003's version is more informative directly on this than 2008's).
When you say that the app gets significantly slower, how do you know? Are you using vast dictionaries or lists? Could it not just be that the internal data structures are getting so big so as to complicate the work any internal algorithms are performing? When you get to huge numbers some algorithms can start to become slower by orders of magnitude.
What's the CPU load of the application when it's running at full-pelt? Is actually the same as when the slow-down occurs? If the CPU usage decreases as the memory usage goes up, then that means that whatever it's doing is taking the OS longer to fulfill, meaning that it's probably putting too much load on the OS. If there's no difference in CPU load, then my guess is it's internal data structures getting so big as to slow down your algos.
I would certainly be looking at running a Perfmon on the application - starting off with some .Net and native memory counters, Cache hits and misses, and Disk Queue length. Run it over the course of the application from startup to when it starts to run like an asthmatic tortoise, and you might just get a clue from that as well.
Having skimmed through the other answers, I'd say there's a lot of good ideas. Here's one I didn't see:
Get a memory profiler, such as SciTech's MemProfiler. It will tell you what's being allocated, by what, and it will show you the whole slice n dice.
It also has video tutorials in case you don't know how to use it. In my case, I discovered I had IDisposable instances that I wasn't Using(...)
We have a web service that uses up more and more private bytes until that application stops responding. The managed heap (mostly Gen2) will show some 200-250 MB, while private bytes shows over 1GB. What are possible causes of a memory leak outside of the managed heap?
I've already checked for the following:
Prolific dynamic assemblies (Xml serialization, regex, etc.)
Session state (turned off)
System.Policy.Evidence memory leak (SP1 installed)
Threading deadlock (no use of Join, only lock)
Use of SQLOLEDB (using SqlClient)
What other sources can I check for?
Make sure your app is complied in release mode. If you compile under debug mode, and deploy that, simply instantiating a class that has an event defined (event doesn't even need to be raised), will cause a small piece of memory to leak. Instantiating enough of these objects over a long enough period of time will cause all the memory to be used. I've seen web apps that would use up all the memory within a matter of hours, simply because a debug build was used. Compiling as a release build immediately and permanently fixed the problem.
I would recommend you view snapshots of the stack at various times, and see what's using up the memory. If your application is using Java, then jmap works extremely well - you just give it the PID of the java process.
If using something else, try Lambda Probe (http://www.lambdaprobe.org/d/index.htm). It doesn't show as much detail, but will at least show you memory use.
I had a bad memory leak in my JDBC code that ended up being traced to a change in the JDBC specification a few years ago that I missed (with respect to closing statements and such). It took a combination of Lamdba Probe and then jmap to localize the problem enough to fix it.
Cheers,
-R
Also look for:
COM Assemblies being loaded
DB Connections not being closed
Cache & State (Session, Application)
Try forcing the Garbage Collector (GC) to run (write a page that does it when it loads) or try the instrumentation, but that's a bit hit and miss in my experience. Another thing would be to keep it running and see if it runs out of memory.
What could be happening is that there is plenty of memory and Windows does not signal your app to clean up. This causes the app to look like its using more and more memory because it can, when in fact the system can reclaim the memory when it needs. SQL Server and Exchange do this a lot. The idea is why cause a unnecessary cleanup when there are plenty of resources.
Rob
Garbage collection does not run until a request for memory is denied due to lack of available memory. This can often make things look like a memory leak when one is not around.
Do you have any events and event handlers within the service? Services often have static variables, and if you are creating event handlers from the static instances, connected to a non-static instance object, the static will hold a reference to the instance forever, which will stop it from releasing.
Double check that trace is not enabled. I've seen instances of trace slowly consuming memory until the app reaches it's app pool limit.
For what it is worth, my issue was not with the service, but the with HttpClient that was calling it.
The client was not properly disposed, so it kept the connection open and the memory locked.
After disposing the client the service released the memory as expected.