Chronic inappropriate System.OutOfMemoryException occurrences - C#

Here's a problem with what should be a continuously-running, unattended console app: I'm seeing too-frequent app exits from System.OutOfMemoryException being thrown from a wide variety of methods deep in the call stack -- often System.String.ToCharArray(), or System.String.CtorCharArrayStartLength(), or System.Xml.XmlTextReaderImpl.InitTextReaderInput(), but sometimes down in a System.String.Concat() call in a MongoCollection.Save() call's stack, and other unlikely places.
For what it's worth, we're using parallel tasks, but this is essentially the only app running on the server, and the app's total thread count never gets over 60. In some of these cases I know of a reason for some other exception to be thrown, but OutOfMemoryException makes no sense in these contexts, and it creates problems:
According to Task Manager and Perfmon logs, the system has had at least 65% of its 8 GB of memory free when this has happened, and
While exception handlers sometimes fire & log the exception, they do not prevent an app crash, and
There's no continuing from this exception without user interaction (unless you suppress windows error reporting, which isn't what we want system-wide, or run the app as a service, which is possible but sub-optimal for our use-case)
So I'm aware of the workarounds mentioned above, but what I'd really like is some explanation -- and ideally a code-based handler -- for the unexpected OOM exceptions, so that we can engage appropriate continuation logic. Any ideas?

Getting that exception when using under 3GB of memory suggests that you are running a 32-bit app. Build it as a 64-bit app and it will be able to use as much memory as is available (close to 8GB).
As to why it's failing in the first place... how large is the data you are working with? If it's not very large, have you looked for references to data being kept around longer than necessary (i.e. a memory leak), thus preventing proper GC?
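As a quick sanity check, here is a minimal sketch (assuming .NET 4 or later, where Environment.Is64BitProcess is available) to confirm what the process actually runs as:

    using System;

    static class BitnessCheck
    {
        static void Main()
        {
            // A 32-bit process tops out at 2 GB (up to 4 GB if large-address-aware)
            // of virtual address space, no matter how much RAM the machine has.
            Console.WriteLine(Environment.Is64BitProcess
                ? "Running as a 64-bit process"
                : "Running as a 32-bit process; check Platform target / Prefer 32-bit");
            Console.WriteLine("IntPtr.Size = " + IntPtr.Size + " bytes");
        }
    }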

You need to profile the application, but the most common cause of these exceptions is excessive string creation. Excessive serialization and excessive XSLT transformations can also cause them.

Do you have a lot of objects of 85,000 bytes or larger? Every such object goes to the Large Object Heap, which is not compacted. That is, unlike the Small Object Heap, the GC will not move objects around to fill the memory holes, which can lead to fragmentation, a potential problem for long-lived applications.
As of .NET 4 this is still the case, but .NET 4.5.1 adds an opt-in way to compact the LOH (see the sketch below).
A quick-and-dirty workaround is to make sure the application can use all the available memory by building it as "x64" or "Any CPU", but the real solution would be to minimize repeated allocation/deallocation cycles of large objects (i.e. use object pooling or avoid large objects altogether, if possible).
You might also want to look at this.
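If upgrading is an option: .NET 4.5.1 exposes an opt-in, one-shot LOH compaction. A minimal sketch, assuming .NET 4.5.1 or later:

    using System;
    using System.Runtime;

    static class LohCompaction
    {
        public static void CompactOnce()
        {
            // One-shot request: the next full blocking collection also compacts
            // the LOH, after which the setting reverts to the default.
            GCSettings.LargeObjectHeapCompactionMode =
                GCLargeObjectHeapCompactionMode.CompactOnce;
            GC.Collect();
        }
    }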

Related

How to predict Out of memory?

In my C# program I do various memory-consuming operations. Depending on the amount of memory currently available and various non-constant circumstances, the program fails with OutOfMemoryException at different stages.
I would like to stop processing at some point when it is more or less obvious that the program will fail with an OOM in the near future.
However, there is no fixed threshold for this; other users may have more (or less) memory, another OS with its own memory specifics, and so on.
I do not want to just check that the software consumes more than e.g. 500 MB, as this may be too high or too low a restriction.
Is there any reliable way to predict an upcoming OOM in .NET?
MemoryFailPoint should do what you want.
http://msdn.microsoft.com/en-us/library/system.runtime.memoryfailpoint.aspx
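For illustration, a minimal sketch of gating a memory-hungry operation with MemoryFailPoint; the 256 MB estimate and DoMemoryHungryWork are placeholders, not part of any real API:

    using System;
    using System.Runtime;

    static class OomGate
    {
        public static void Run()
        {
            try
            {
                // Checks that ~256 MB is likely to be allocatable before starting.
                using (new MemoryFailPoint(256))
                {
                    DoMemoryHungryWork();
                }
            }
            catch (InsufficientMemoryException ex)
            {
                // Raised before any real allocation, so program state is intact.
                Console.Error.WriteLine("Not enough memory to proceed: " + ex.Message);
            }
        }

        static void DoMemoryHungryWork() { /* placeholder for the real operation */ }
    }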
The short answer is that it's probably not the best solution: instead of anticipating when you are going to run out of memory and trying to do something about it, you should probably either try to reduce the memory consumption of your application or just wait until the OutOfMemoryException is thrown and then clean up.
An out of memory exception is thrown whenever the .NET runtime fails to allocate the memory that was just requested. This could be because the machine has run out of physical memory and the swap file is disabled (or the machine has run out of disk space), or because the virtual memory space of the process is too fragmented to allocate the desired block of memory (which can happen when working with large objects). Predicting when this is going to happen is fairly difficult.
You can use the MemoryFailPoint class to check that a certain amount of memory is going to be available before starting an operation that uses a large amount of memory; however, this class does not guarantee that the memory will remain available for the duration of the operation, so your application could still fail with an OOM exception anyway. Although this class can be useful in some scenarios to reduce OOM exceptions (and in turn to avoid corrupting application state), it's probably not going to be a "magic bullet" solution to your problem.
Many of the OOM exceptions I have observed so far were caused by improperly written applications with memory leaks, or by third-party components. Consider using a data structure optimized for your operation.
Predicting OOM is quite difficult because the exceptions are thrown unpredictably. Maybe you can use MemoryFailPoint and performance counters (available memory) to ensure there is enough memory available.

C# Production Server, Do I collect the garbage?

I know there's tons of threads about this. And I read a few of them.
I'm wondering if in my case it is correct to GC.Collect();
I have a server for an MMORPG; in production it is online day and night, and it is restarted every other day to implement changes to the production codebase. Every twenty minutes the server pauses all other threads and serializes the current game state. This usually takes 0.5 to 4 seconds.
Would it be a good idea to GC.Collect(); after serialization?
The server is, obviously, constantly creating and destroying game items.
Would I see a noticeable gain in performance or memory usage?
Should I not manually collect?
I've read about how collecting can be bad if used in the wrong moments or too frequently, but I'm thinking these saves are both a good moment to collect, and not that frequent.
The server runs on .NET Framework 4.0.
Update in answer to a comment:
We are randomly experiencing server freezes. Sometimes, unexpectedly, the server's memory usage will rise steadily until it reaches a point where the server takes far too long to handle any network operation. Thus, I'm considering a lot of different approaches to solve the issue; this is one of them.
The garbage collector knows best when to run, and you shouldn't force it.
It will not improve performance or memory usage. The CLR tells the GC to collect objects that are no longer used whenever there is a need to do so.
Answer to an updated part:
Forcing a collection is not a good solution to the problem. You should look a bit deeper into your code to find out what is wrong. If memory usage grows unexpectedly, you might have an issue with unmanaged resources that are not properly handled, or even "leaky" code within the managed code.
One more thing: I would be surprised if calling GC.Collect fixed the problem.
Every twenty minutes the server pauses all other threads and serializes the current game state. This usually takes 0.5 to 4 seconds.
If all your threads are suspended already anyway you might as well call the garbage collection, since it should be fairly fast at this point. I suspect doing this will only mask your real problem though, not actually solve it.
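If you do collect at the save point, here is a sketch of the usual full, blocking sequence (SerializeGameState stands in for the existing save routine):

    using System;

    static class Checkpoint
    {
        public static void Save()
        {
            SerializeGameState();          // existing pause-the-world save
            GC.Collect();                  // full, blocking collection
            GC.WaitForPendingFinalizers(); // let finalizers release resources
            GC.Collect();                  // reclaim objects freed by finalizers
        }

        static void SerializeGameState() { /* placeholder for the real routine */ }
    }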
We are randomly experiencing server freezes. Sometimes, unexpectedly, the server's memory usage will rise steadily until it reaches a point where the server takes far too long to handle any network operation. Thus, I'm considering a lot of different approaches to solve the issue; this is one of them.
This sounds more like you actually are still referencing all these objects that use the memory; if you weren't, the GC would run due to the memory pressure and try to release those objects. You might be looking at an actual bug in your production code (i.e. objects that are still subscribed to events or otherwise referenced when they shouldn't be) rather than something you can fix by manually taking out the garbage.
If possible in this scenario you should run a performance analysis to see where your bottlenecks are and what part of your code is causing the brunt of the memory allocations.
Could the memory increase be an "attack" by a player with a fake/modified game-client? Is a lot of memory allocated by the server when it accepts a new client connection? Does the server handle bogus incoming data well?

How to prevent or minimize the negative effects of .NET GC in a real time app?

Are there any tips, tricks and techniques to prevent or minimize slowdowns or temporary freeze of an app because of the .NET GC?
Maybe something along the lines of:
Try to use structs if you can, unless the data is too large or will be mostly used inside other classes, etc.
The description of your app does not fit the usual meaning of "realtime". Realtime is commonly used for software with a maximum latency in milliseconds or less.
You have a requirement of responsiveness to the user, meaning you could probably tolerate an incidental delay of 500 ms or more. 100 ms won't be noticed.
Luckily for you, the GC won't cause delays that long. And if it did, you could use the server (background) version of the GC, though I know little about the details.
But if your "user experience" does suffer, it probably won't be the GC.
IMHO, if the performance of your application is being affected noticeably by the GC, something is wrong. The GC is designed to work without intervention and without significantly affecting your application. In other words, you shouldn't have to code with the details of the GC in mind.
I would examine the structure of your application and see where the bottlenecks are, maybe using a profiler. Maybe there are places where you could reduce the number of objects that are being created and destroyed.
If parts of your application really need to be real-time, perhaps they should be written in another language that is designed for that sort of thing.
Another trick is to use GC.RegisterForFullGCNotification on the back end.
Say you have a load-balancing server and N application servers. When the load balancer receives notice of a possible full GC on one of the servers, it can forward requests to the other servers for a while, so the SLA will not be affected by the GC (which is especially useful on x64 boxes, where more than 4 GB can be addressed).
Updated
No, unfortunately I don't have code of my own, but there is a very simple example at MSDN, with dummy methods like RedirectRequests and AcceptRequests, which can be found here: Garbage Collection Notifications
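A condensed sketch of that pattern; the 10/10 thresholds are illustrative, the redirect/accept delegates stand in for the MSDN sample's dummy methods, and full-GC notifications require concurrent GC to be disabled:

    using System;
    using System.Threading;

    static class FullGcWatcher
    {
        public static void Start(Action redirectRequests, Action acceptRequests)
        {
            // Thresholds are 1-99; larger values ask for earlier notification.
            GC.RegisterForFullGCNotification(10, 10);

            var monitor = new Thread(() =>
            {
                while (true)
                {
                    if (GC.WaitForFullGCApproach() == GCNotificationStatus.Succeeded)
                        redirectRequests(); // e.g. ask the balancer to drain us
                    if (GC.WaitForFullGCComplete() == GCNotificationStatus.Succeeded)
                        acceptRequests();   // resume taking traffic
                }
            }) { IsBackground = true };
            monitor.Start();
        }
    }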

Why does running out of memory depend on intermediate calls to GC.GetTotalMemory?

A memory intensive program that I wrote ran out of memory: threw an OutOfMemory exception. During attempts to reduce memory usage, I started calling GC.GetTotalMemory(true) (to write the total memory usage to debug file), which triggers a garbage collect.
For some reason, when calling this function I don't get an out of memory exception anymore. If I remove the calls again (keeping everything else the same), the exception gets thrown again. In my understanding, calls are automatically made to collect garbage when memory pressure increases, so I don't understand this behavior.
Can anyone explain why the out of memory exception is only thrown when there are no calls to GC.Collect?
Update:
I'm using VS 2010, but I'm down-targeting the application to framework 3.5. I believe that fragmentation is indeed causing my problems.
I did some tests: when the exception is thrown, a call to GC.GetTotalMemory tells me I am using ~800 * 10^6 bytes. However, Task Manager tells me the application is using 1700 MB, a rather large discrepancy. I'm now planning to allocate memory only once, and to never deallocate any large arrays but reuse them. Luckily, my program allows me to accomplish this without too much fuss.
I solved the problem by doing some smarter memory management. In particular by using a CustomList according to the suggestions on http://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
Is your app running at full CPU? I'm pretty sure automatic garbage collection only occurs when the application is idle. Otherwise, you have to run a manual cycle.
I'm fairly sure that running out of memory does not force a garbage collection. That probably sounds incredibly unintuitive to you but I think this was done for a good reason. It prevents the program from entering a death-spiral where it constantly tries to find more space and getting all objects firmly lodged into gen #2. From which it is very hard to recover again.
The true argument you pass to GetTotalMemory() forces a full garbage collection. I would guess that this happens to free up enough space in the Large Object Heap to satisfy the memory allocation. This will of course work only once. If your program just keeps running, gobbling up memory beyond the 1.5 gigabytes or so it has already consumed, then OOM is just around the corner again. This time without any way to recover. Surviving an OOM requires drastic measures.
You'll need a good memory profiler to find out what's really going on. Unmanaged C++ in your project is always a fertile source of memory leaks. The unmanaged kind, always hard to trouble-shoot.
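To see the effect the asker describes, compare the two forms of the call; a minimal sketch:

    using System;

    static class MemoryReport
    {
        public static void Dump()
        {
            long withoutGc = GC.GetTotalMemory(false); // estimate, no collection
            long withGc    = GC.GetTotalMemory(true);  // forces a full collection
            Console.WriteLine("Before/after full collection: {0:N0} / {1:N0} bytes",
                              withoutGc, withGc);
        }
    }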
GC.Collect is merely a "suggestion" to free unused memory - it does not guarantee its release.
[Edit]
It appears that, while this was once true back when I was learning the JVM years ago, it may not be the case in .NET anymore. The MSDN Library says that GC.Collect "Forces an immediate garbage collection of all generations." Good stuff (for me, anyway) about this here.
If you have unmanaged resources that occupy a lot of memory, the garbage collector won't really recognize that memory pressure. If you clean up those resources in finalizers, then forcing a collection will result in those unmanaged resources being freed, while if you don't force a collection, the garbage collector might not realize that it needs to be collecting.
If you are performing large unmanaged allocations, you can use GC.AddMemoryPressure to tell the GC that, so it can take it into account when deciding whether to run a collection or not.
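A minimal sketch of pairing AddMemoryPressure/RemoveMemoryPressure around an unmanaged allocation; the 64 MB size is illustrative:

    using System;
    using System.Runtime.InteropServices;

    static class NativeBuffer
    {
        public static void UseOnce()
        {
            const long size = 64L * 1024 * 1024;     // illustrative 64 MB
            IntPtr buffer = Marshal.AllocHGlobal(new IntPtr(size));
            GC.AddMemoryPressure(size);              // tell the GC about the allocation
            try
            {
                // ... use the unmanaged buffer ...
            }
            finally
            {
                Marshal.FreeHGlobal(buffer);
                GC.RemoveMemoryPressure(size);       // always pair the two calls
            }
        }
    }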
I was browsing around the Microsoft Connect site and I am seeing bug reports where people are making the same claim you are: that an OutOfMemoryException is occurring which can be resolved by periodically calling GC.Collect. I saw one report where the lead engineer from the garbage collector team responded and said a bug was fixed in .NET 4.0 that should resolve a fragmentation issue with the large object heap. That is why I asked what version you were using.
It is certainly possible that you have stumbled upon a bug in the garbage collector. As with all GC related issues this could be very version dependent.
My advice would be to:
make sure you have the latest patches and service pack
refactor the code so that it is not as memory intensive
reuse LOH objects as much as possible instead of creating new ones
continue using GC.Collect at strategic points if necessary as a workaround

When is it OK to catch an OutOfMemoryException and how to handle it?

Yesterday I took part in a discussion on SO devoted to OutOfMemoryException and the pros and cons of handling it (C# try {} catch {}).
My pros for handling it were:
The fact that OutOfMemoryException was thrown doesn't generally mean that the state of a program was corrupted;
According to the documentation, "the following Microsoft intermediate language (MSIL) instructions throw OutOfMemoryException: box, newarr, newobj", which just (usually) means that the CLR attempted to find a block of memory of a given size and was unable to do that; it does not mean that not a single byte is left at our disposal;
But not everyone agreed with that; some speculated about the unknown program state after this exception, and about an inability to do anything useful since it would require even more memory.
Therefore my question is: what are the serious reasons not to handle OutOfMemoryException and immediately give up when it occurs?
Edited: Do you think that OOME is as fatal as ExecutionEngineException?
IMO, since you can't predict what you can/can't do after an OOM (so you can't reliably process the error), or what else did/didn't happen when unrolling the stack to where you are (so the BCL hasn't reliably processed the error), your app must now be assumed to be in a corrupt state. If you "fix" your code by handling this exception you are burying your head in the sand.
I could be wrong here, but to me this message says BIG TROUBLE. The correct fix is to figure out why you have chomped through memory, and address that (for example, have you got a leak? could you switch to a streaming API?). Even switching to x64 isn't a magic bullet here; arrays (and hence lists) are still size-limited, and the increased reference size means you can fit numerically fewer references within the 2 GB object cap.
If you need to chance processing some data, and are happy for it to fail: launch a second process (an AppDomain isn't good enough). If it blows up, tear down the process. Problem solved, and your original process/AppDomain is safe.
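A hedged sketch of that isolation pattern; RiskyWorker.exe and its exit-code convention are hypothetical:

    using System.Diagnostics;

    static class Isolated
    {
        public static bool TryProcess(string inputPath)
        {
            var psi = new ProcessStartInfo
            {
                FileName = "RiskyWorker.exe",          // hypothetical helper
                Arguments = "\"" + inputPath + "\"",
                UseShellExecute = false
            };
            using (Process worker = Process.Start(psi))
            {
                worker.WaitForExit();
                // Hypothetical convention: non-zero exit means the child blew
                // up (e.g. with an OOM); the parent process stays healthy.
                return worker.ExitCode == 0;
            }
        }
    }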
We all write different applications. In a WinForms or ASP.NET app I would probably just log the exception, notify the user, try to save state, and shut down/restart. But as Igor mentioned in the comments, this could very well come from an image editing application, where loading the 100th 20 MB RAW image pushes the app over the edge. Do you really want the user to lose all of their work when something as simple as saying "Sorry, unable to load more images at this time" would do?
Another common instance where it can be useful to catch out-of-memory exceptions is in back-end batch processing. You could have a standard model of loading multi-megabyte files into memory for processing, but then one day, out of the blue, a multi-gigabyte file arrives. When the out-of-memory occurs, you could log the message to a user notification queue and then move on to the next file, as sketched below.
Yes, it is possible that something else could blow up at the same time, but that too would be logged and notified if possible. If the GC is ultimately unable to obtain more memory, the application is going to go down hard anyway. (The GC runs on an unprotected thread.)
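A sketch of that batch pattern; ProcessFile and the notification queue are hypothetical stand-ins:

    using System;
    using System.Collections.Generic;
    using System.IO;

    static class Batch
    {
        static readonly Queue<string> notifications = new Queue<string>();

        public static void Run(string inputDirectory)
        {
            foreach (string path in Directory.EnumerateFiles(inputDirectory))
            {
                try
                {
                    ProcessFile(path); // loads the whole file into memory
                }
                catch (OutOfMemoryException)
                {
                    // Log, notify, and move on to the next file.
                    notifications.Enqueue("Skipped " + path + ": too large to load.");
                }
            }
        }

        static void ProcessFile(string path) { /* placeholder */ }
    }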
Don't forget we all develop different types of applications. And unless you are on older, constrained machines you will probably never get an OutOfMemoryException for typical business apps... but then again not all of us are business tool developers.
To your edit...
Out-of-memory may be caused by unmanaged memory fragmentation and pinning. It can also be caused by large allocation requests. If we were to put up a white flag and draw a line in the sand over such simple issues, nothing would ever get done in large data processing projects. Now compare that to a fatal ExecutionEngineException: there is nothing you can do at the point the runtime falls over dead under your code. Hopefully you are able to log why your code fell on its face (but probably not) so you can prevent it in the future. But, more importantly, hopefully your code is written in a manner that allows safe recovery of as much data as you can. Maybe even recover the last known good state in your application, and possibly skip the offending corrupt data and allow it to be manually processed and recovered.
Yet at the same time it is just as possible to have data corruption caused by SQL injection, out-of-sync versions of software, pointer manipulation, buffer overruns, and many other problems. Avoiding an issue just because you think you may not recover from it is a great way to give users error messages as constructive as "Please contact your system administrator."
Some commenters have noted that there are situations when OOM could be the immediate result of attempting to allocate a large number of bytes (a graphics application, allocating a large array, etc.). Note that for that purpose you can use the MemoryFailPoint class, which raises an InsufficientMemoryException (itself derived from OutOfMemoryException). That can be caught safely, as it is raised before the actual attempt to allocate the memory has been made. However, this can only really reduce the likelihood of an OOM, never fully prevent it.
It all depends on the situation.
Quite a few years ago now, I was working on a real-time 3D rendering engine. At the time we loaded all the geometry for the model into memory on start-up, but only loaded the texture images when we needed to display them. This meant that when the day came that our customers were loading huge (2 GB) models, we were able to cope. The geometry occupied less than 2 GB, but when all the textures were added it would be > 2 GB. By trapping the out-of-memory error that was raised when we tried to load the texture, we were able to carry on displaying the model, just as plain geometry.
We still had a problem if the geometry was > 2GB, but that was a different story.
Obviously, if you get an out of memory error with something fundamental to your application then you've got no choice but to shut down - but do that as gracefully as you can.
See Christopher Brumme's comment in "Framework Design Guidelines", p. 238 (7.3.7, OutOfMemoryException):
At one end of the spectrum, an OutOfMemoryException could be the result of a failure to obtain 12 bytes for implicitly autoboxing, or a failure to JIT some code that is required for critical backout. These cases are catastrophic failures and ideally would result in termination of the process. At the other end of the spectrum, an OutOfMemoryException could be the result of a thread asking for a 1 GB byte array. The fact that we failed this allocation attempt has no impact on the consistency and viability of the rest of the process.
The sad fact is that CLR 2.0 cannot distinguish among any points on this spectrum. In most managed processes, all OutOfMemoryExceptions are considered equivalent and they all result in a managed exception being propagated up the thread. However, you cannot depend on your backout code being executed, because we might fail to JIT some of your backout methods, or we might fail to execute static constructors required for backout.
Also, keep in mind that all other exceptions can get folded into an OutOfMemoryException if there isn't enough memory to instantiate those other exception objects. Also, we will give you a unique OutOfMemoryException with its own stack trace if we can. But if we are tight enough on memory, you will share an uninteresting global instance with everyone else in the process.
My best recommendation is that you treat OutOfMemoryException like any other application exception. You make your best attempts to handle it and remain consistent. In the future, I hope the CLR can do a better job of distinguishing catastrophic OOM from the 1 GB byte array case. If so, we might provoke termination of the process for the catastrophic cases, leaving the application to deal with the less risky ones. By treating all OOM cases as the less risky ones, you are preparing for that day.
Marc Gravell has already provided an excellent answer; seeing as how I partly "inspired" this question, I would like to add one thing:
One of the core principles of exception handling is never to throw an exception inside an exception handler. (Note - re-throwing a domain-specific and/or wrapped exception is OK; I am talking about an unexpected exception here.)
There are all sorts of reasons why you need to prevent this from happening:
At best, you mask the original exception; it becomes impossible to know for sure where the program originally failed.
In some cases, the runtime may simply be unable to handle an unhandled exception in an exception handler (say that 5 times fast). In ASP.NET, for example, installing an exception handler at certain stages of the pipeline and failing in that handler will simply kill the request - or crash the worker process, I forget which.
In other cases, you may open yourself up to the possibility of an infinite loop in the exception handler. This may sound like a silly thing to do, but I have seen cases where somebody tries to handle an exception by logging it, and when the logging fails... they try to log the failure. Most of us probably wouldn't deliberately write code like this, but depending on how you structure your program's exception handling, you can end up doing it by accident.
So what does this have to do with OutOfMemoryException specifically?
An OutOfMemoryException doesn't tell you anything about why the memory allocation failed. You might assume that it was because you tried to allocate a huge buffer, but maybe it wasn't. Maybe some other rogue process on the system has literally consumed all of the available address space and you don't have a single byte left. Maybe some other thread in your own program went awry and went into an infinite loop, allocating new memory on each iteration, and that thread has long since failed by the time the OutOfMemoryException ends up on your current stack frame. The point is that you don't actually know just how bad the memory situation is, even if you think you do.
So start thinking about this situation now. Some operation just failed at an unspecified point deep in the bowels of the .NET framework and propagated up an OutOfMemoryException. What meaningful work can you perform in your exception handler that does not involve allocating more memory? Write to a log file? That takes memory. Display an error message? That takes even more memory. Send an alert e-mail? Don't even think about it.
If you try to do these things - and fail - then you'll end up with non-deterministic behaviour. You'll possibly mask the out-of-memory error and get mysterious bug reports with mysterious error messages bubbling up from all kinds of low-level components you wrote that aren't supposed to be able to fail. Fundamentally, you've violated your own program's invariants, and this is going to be a nightmare to debug if your program ever does end up running under low-memory conditions.
One of the arguments presented to me before was that you might catch an OutOfMemoryException and then switch to lower-memory code, like a smaller buffer or a streaming model. However, using exceptions for this kind of expected control flow is a well-known anti-pattern. If you know you're about to chew up a huge amount of memory and aren't sure whether the system can handle it, then check the available memory, or better yet, just refactor your code so that it doesn't need so much memory all at once. Don't rely on the OutOfMemoryException to do it for you, because, who knows, maybe the allocation will just barely succeed and trigger a bunch of out-of-memory errors immediately after your exception handler (possibly in some completely different component).
So my simple answer to this question is: Never.
My weasel-answer to this question is: It's OK in a global exception handler, if you're really really careful. Not in a try-catch block.
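For completeness, a minimal sketch of that global-handler weasel-answer, doing as little as possible on the way down:

    using System;

    static class LastChance
    {
        public static void Install()
        {
            AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            {
                if (e.ExceptionObject is OutOfMemoryException)
                {
                    // Do as little as possible and avoid allocating;
                    // FailFast writes an event-log entry and kills the process.
                    Environment.FailFast("Out of memory");
                }
            };
        }
    }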
One practical reason for catching this exception is to attempt a graceful shutdown, with a friendly error message instead of an exception trace.
The problem is larger than .NET. Almost any application written from the fifties to now has big problems if no memory is available.
With virtual address spaces the problem has been sort-of salvaged, but NOT solved, because even address spaces of 2 GB or 4 GB may become too small. There are no commonly available patterns for handling out-of-memory. There could be an out-of-memory warning method, a panic method, etc. that is guaranteed to still have memory available.
If you receive an OutOfMemoryException from .NET almost anything may be the case. 2 MB still available, just 100 bytes, whatever. I wouldn't want to catch this exception (except to shutdown without a failure dialog). We need better concepts. Then you may get a MemoryLowException where you CAN react to all sorts of situations.
The problem is that, in contrast to other exceptions, you are usually in a low-memory situation when the exception occurs (except when the memory to be allocated was huge, but you don't really know that when you catch the exception).
Therefore, you must be very careful not to allocate memory when handling this exception. And while this sounds easy, it's not; actually, it's very hard to avoid any memory allocation and still do something useful. Therefore, catching it is usually not a good idea IMHO.
Write code, don't hijack the VM. When the VM is humbly telling you that a memory allocation request failed, your best bet is to discard the application state to avert corrupting application data. Even if you decide to catch OOM, you should only try to gather diagnostic information, like dumping a log, a stack trace, etc. Please do not try to initiate a backout procedure, as you are not sure whether it will get a chance to execute or not.
Real-world analogy: you are travelling in a plane and all engines fail. What would you do after catching an AllEngineFailureException? Best bet is to grab the mask and prepare for a crash.
When in OOM, dump!!
