Are CER needed to merely protect shared managed states within an AppDomain? - c#

I do have an operation that must be reliably performed as a whole or not be performed at all.
The goal is only to preserve the consistency of some in-memory managed shared states.
Those states are contained within an application domain. They are not visible outside of this domain.
I therefore do not have to react when the domain or the process is torn down.
I am writing a class library and the user may call my code from anywhere. However my code does not call any user code, not even virtual methods.
The CLR may be hosted.
To my understanding I do not need constrained execution regions (CER) since:
CER are only needed against the infamous OutOfMemoryException, ThreadAbortException and StackOverflowException.
My code does not make any allocation, so I do not care about OutOfMemory (anyway allocations must not be done within a CER).
If a stack overflow occurs the process will be torn down anyway (or the domain in some hosted scenarios).
Thread aborts are already delayed until the end of a finally block and my code is already within one.
Am I correct on those points? Do you see other reasons why I should need CER?

I finally found at least one reason why a CER is still needed: even if my code does not do any allocation, the JIT compiler may have to allocate memory on the first execution.
Therefore putting a CER is required to force the runtime to JIT everything beforehand and prevent a possible OOM.
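For illustration, here is a minimal sketch of the pattern discussed above, assuming the .NET Framework (on .NET Core and later these APIs are obsolete and have no effect). The class name SharedCounters and its fields are hypothetical. The empty try block with all work in the finally block relies on thread aborts being delayed until the finally block ends, and PrepareConstrainedRegions asks the runtime to prepare the finally block ahead of time.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

// Hypothetical shared state that must be updated as a whole.
public static class SharedCounters
{
    private static int _first;
    private static int _second;

    // A promise to the runtime (not something it verifies) that this method
    // always succeeds and never leaves state inconsistent.
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    private static void WriteBoth(int first, int second)
    {
        _first = first;
        _second = second;
    }

    public static void SetBoth(int first, int second)
    {
        // Must immediately precede the try block; the finally block below
        // becomes the constrained region and is prepared before it runs.
        RuntimeHelpers.PrepareConstrainedRegions();
        try
        {
        }
        finally
        {
            // No allocations and no calls into user code in here.
            WriteBoth(first, second);
        }
    }

    public static int Sum()
    {
        return _first + _second;
    }
}
```

This sketch says nothing about concurrent readers; synchronization between threads is a separate concern from the asynchronous exceptions a CER addresses.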

Related

Can there be a scenario when garbage collector fails to run due to an exception?

Just out of curiosity, I was wondering whether there is a scenario in which the garbage collector fails to run, or doesn't run at all (possibly due to an exception).
If so, there would most probably be an OutOfMemoryException or StackOverflowException. In that case, can we identify the core issue of the GC failing to run just by looking at the exception message, stack trace, etc.?
As others have mentioned, numerous things can prevent the GC from running. FailFast fails fast; it doesn't stop to take out the trash before the building is demolished. But you asked specifically about exceptions.
An uncaught exception produces implementation-defined behaviour, so it is implementation-defined whether finally blocks run, whether garbage collection runs, and whether the finalizer queue objects are finalized when there is an uncaught exception. An implementation of the CLR is permitted to do anything when that happens, and "anything" includes both "run the GC" and "do not run the GC". And in fact implementations of the CLR have changed their behaviour over time; in v1.x of the CLR an uncaught exception on the finalizer thread was caught, the error was logged, and finalizers kept on running, whereas since v2.0 an uncaught exception on the finalizer thread takes out the process.
There are four questions of interest:
Can something cause the program to die entirely, without the garbage-collector getting a chance to run
Can something prevent the garbage-collector from running without causing the system to die entirely
Can something prevent objects' finalizers from running without causing the system to die entirely
Can an exception make an object uncollectable for an arbitrary period of time
With regard to the first one, the answer is "definitely". There are so many ways that could potentially happen, that there's no need to list them here.
With regard to the second question, the answer is "generally no", since failure of the garbage collector would cripple a program; there may be some cases, however, in which portions of a program which do not use GC-managed memory may be able to keep running even though the portions that use managed objects could be blocked indefinitely.
With regard to the third question, it used to be in .net that an exception in a finalizer could interfere with the action of other finalizers without killing the entire application; such behavior has been changed since .net 2.0 so that uncaught exceptions thrown from finalizers will usually kill the whole program. It is possible, however, that an exception which is thrown and caught within a poorly-written finalizer might result in its failing to clean up everything it was supposed to, leading to question #4.
With regard to the fourth question, it is quite common for objects to establish long-lived (possibly static) references to themselves when they are created, and for them to destroy such references as part of clean-up code. If an exception prevents that clean-up code from running as expected, it may cause the objects to become uncollectable even if they are no longer useful.
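The fourth case can be sketched as follows. This is only an illustration; TrackedResource and LeakDemo are hypothetical names. The object registers itself in a static list when created and removes itself in Dispose, so an exception that skips Dispose leaves it reachable, and therefore uncollectable, for as long as the static list lives.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical object that keeps a long-lived static reference to itself.
public sealed class TrackedResource : IDisposable
{
    private static readonly List<TrackedResource> Live = new List<TrackedResource>();

    public static int LiveCount
    {
        get { lock (Live) { return Live.Count; } }
    }

    public TrackedResource()
    {
        lock (Live) { Live.Add(this); }
    }

    // Clean-up code: drops the static reference so the GC can collect the object.
    public void Dispose()
    {
        lock (Live) { Live.Remove(this); }
    }
}

public static class LeakDemo
{
    // Dispose runs even if work throws, so the static reference is always removed.
    public static void UseSafely(Action<TrackedResource> work)
    {
        using (var resource = new TrackedResource())
        {
            work(resource);
        }
    }

    // If work throws, Dispose is skipped and the object stays rooted by the static list.
    public static void UseUnsafely(Action<TrackedResource> work)
    {
        var resource = new TrackedResource();
        work(resource);
        resource.Dispose();
    }
}
```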
Yes, in Java there used to be the situation where the program could stop without the GC being run for the last time. In most cases this is OK, as all the memory is cleared up when the program's heap is destroyed, but you can have the problem of objects not having their finalisers run. This may or may not be a problem for you, depending on what those finalisers would do.
I doubt you'll be able to determine the GC failure, as the program will be as dead as a parrot, in a non-clean manner, so you probably won't even get a stacktrace. You might be able to post-mortem debug it (if you've turned on the right dbg settings, .NET is sh*te when it comes to working nicely with the excellent Windows debugging tools).
There are certain edge cases where a finally block will not execute - calling FailFast is one case, and see the question here for others.
Given this, I would imagine there are cases (especially in using statements / IDisposable objects) where the resource cleanup that should occur in a finally block is not executed.
More explicitly, something like this:
try
{
    // new up an expensive object, maybe one that uses native resources
    Environment.FailFast(string.Empty);
}
finally
{
    Console.WriteLine("never executed");
}

Chronic inappropriate System.OutOfMemoryException occurrences

Here's a problem with what should be a continuously-running, unattended console app: I'm seeing too-frequent app exits from System.OutOfMemoryException being thrown from a wide variety of methods deep in the call stack -- often System.String.ToCharArray(), or System.String.CtorCharArrayStartLength(), or System.Xml.XmlTextReaderImpl.InitTextReaderInput(), but sometimes down in a System.String.Concat() call in a MongoCollection.Save() call's stack, and other unlikely places.
For what it's worth, we're using parallel tasks, but this is essentially the only app running on the server, and the app's total thread count never gets over 60. In some of these cases I know of a reason for some other exception to be thrown, but OutOfMemoryException makes no sense in these contexts, and it creates problems:
According to TaskManager and Perfmon logs, the system has had a minimum of 65% out of 8GB free memory when this has happened, and
While exception handlers sometimes fire & log the exception, they do not prevent an app crash, and
There's no continuing from this exception without user interaction (unless you suppress windows error reporting, which isn't what we want system-wide, or run the app as a service, which is possible but sub-optimal for our use-case)
So I'm aware of the workarounds mentioned above, but what I'd really like is some explanation -- and ideally a code-based handler -- for the unexpected OOM exceptions, so that we can engage appropriate continuation logic. Any ideas?
Getting that exception when using under 3GB of memory suggests that you are running a 32-bit app. Build it as a 64-bit app and it will be able to use as much memory as is available (close to 8GB).
As to why it's failing in the first place... how large is the data you are working with? If it's not very large, have you looked for references to data being kept around much longer than necessary (i.e. a memory leak), thus preventing proper GC?
You need to profile the application, but the most common reason for these exceptions is excessive string creation. Also, excessive serialization can cause this and excessive Xslt transformations.
Do you have a lot of objects of 85,000 bytes or more? Every such object goes to the Large Object Heap, which is not compacted. That is, unlike the Small Object Heap, the GC will not move objects around to fill the memory holes, which can lead to fragmentation, a potential problem for long-lived applications.
As of .NET 4, this is still the case, but it seems they made some improvements in .NET 4.5.
A quick-and-dirty workaround is to make sure the application can use all the available memory by building it as "x64" or "Any CPU", but the real solution would be to minimize repeated allocation/deallocation cycles of large objects (i.e. use object pooling or avoid large objects altogether, if possible).
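As a rough sketch of the pooling idea, one option is ArrayPool<byte> from System.Buffers. This is an assumption on my part, not something the original answer names; on the .NET Framework it requires the System.Buffers NuGet package. The helper name PooledBuffers is hypothetical. Renting and returning a large buffer avoids allocating a fresh Large Object Heap array on every call.

```csharp
using System;
using System.Buffers;

public static class PooledBuffers
{
    // Fills a rented buffer with a value and sums it, standing in for real work
    // that needs a large temporary array.
    public static long SumBytes(int count, byte value)
    {
        // Rent may return an array longer than requested; only the first
        // 'count' elements are used here.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(count);
        try
        {
            long sum = 0;
            for (int i = 0; i < count; i++)
            {
                buffer[i] = value;
                sum += buffer[i];
            }
            return sum;
        }
        finally
        {
            // Hand the array back so the next caller can reuse it.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

A call such as PooledBuffers.SumBytes(100000, 2) uses a buffer above the 85,000-byte threshold without necessarily allocating a new one each time.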
You might also want to look at this.

Corrupted state exceptions (CSE) across AppDomain

For some background info, .NET 4.0 no longer catches CSEs by default: http://msdn.microsoft.com/en-us/magazine/dd419661.aspx
I'm working on an app that executes code in a new AppDomain. If that code throws a CSE, the exception bubbles up to the main code if it's not handled. My question is, can I safely assume that a CSE on the second AppDomain won't corrupt the state in the main AppDomain, and thus exit the second AppDomain and continue running the main AppDomain?
In the context of a corrupted state exception, in general, you cannot assume anything to be true anymore. The point of these exceptions is that something has happened, usually due to buggy unmanaged code, that has violated some core assumption that Windows or the CLR makes about the structure of memory. That means that, in theory, the very structures that the CLR uses to track which app domains exist in memory could be corrupted. The kinds of things that cause CSEs are generally indicative that things have gone catastrophically wrong.
Having said all that, off-the-record, in some cases, you may be able to make a determination that it is safe to continue from a particular exception. An EXCEPTION_STACK_OVERFLOW, for example, is probably recoverable, and an EXCEPTION_ACCESS_VIOLATION usually indicates that Windows caught a potential bug before it had a chance to screw anything up. It's up to you if you're willing to risk it, depending on how much you know about the code that is throwing the CSEs in the first place.
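If you do decide to take that risk, the .NET Framework 4 opt-in mechanism looks roughly like this. It is a sketch only: CseGuard and TryRun are hypothetical names, and the attribute is ignored on .NET Core and later, where corrupted state exceptions cannot be caught this way. What to do after a failure (for example, unloading the second AppDomain) is left to the caller.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Security;

public static class CseGuard
{
    // On .NET Framework 4+, this attribute lets the catch block below see
    // corrupted state exceptions such as an access violation raised by native code.
    [HandleProcessCorruptedStateExceptions]
    [SecurityCritical]
    public static bool TryRun(Action action, out Exception corruption)
    {
        corruption = null;
        try
        {
            action();
            return true;
        }
        catch (AccessViolationException ex)
        {
            // Process state may be damaged at this point; do as little as possible.
            corruption = ex;
            return false;
        }
    }
}
```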

GC Behavior and CLR Thread Hijacking

I was reading about the GC in the book CLR via C#, specifically about when the CLR wants to start a collection. I understand that it has to suspend the threads before a collection occurs, but it mentions that it has to do this when the thread instruction pointer reaches a safe point. In the cases where it's not in a safe point, it tries to get to one quickly, and it does so by hijacking the thread (inserting a special function pointer in the thread stack). That's all fine and dandy, but I thought managed threads by default were safe?
I had initially thought it might have been referring to unmanaged threads, but the CLR lets unmanaged threads continue executing because any object being used should have been pinned anyway.
So, what is a safe point in a managed thread, and how can the GC determine what that is?
EDIT:
I don't think I was being specific enough. According to this MSDN article, even when Thread.Suspend is called, the thread will not actually be suspended until a safe point is reached. It goes on to state that a safe point is a point in a thread's execution at which a garbage collection can be performed.
I think I was unclear in my question. I realize that a Thread can only be suspended at a safe point and they have to be suspended for a GC, but I can't seem to find a clear answer as to what a safe point is. What determines a point in code as being safe?
'Safe Points' are where we are:
1. Not in a catch block.
2. Not inside a finally block.
3. Not inside a lock.
4. Not inside a p/invoke'd call (in managed code). Not running unmanaged code in the CLR.
5. The memory tree is walkable.
Point #5 is a bit confusing, but there are times when the memory tree will not be walkable. For example, after optimization, the CLR may new an Object and not assign it directly to a variable. According to the GC, this object would be a dead object ready to be collected. The compiler will instruct the GC when this happens to not run GC yet.
Here's a blog post on msdn with a little bit more information: http://blogs.msdn.com/b/abhinaba/archive/2009/09/02/netcf-gc-and-thread-blocking.aspx
EDIT: Well, sir, I was WRONG about #4. See here in the 'Safe Point' section. If we are inside a p/invoke (unmanaged) code section then it is allowed to run until it comes back out to managed code again.
However, according to this MSDN article, if we are in an unmanaged portion of CLR code, then it is not considered safe and they will wait until the code returns to managed. (I was close, at least).
Actually none of the answers I have found so far on SO explains the 'why', i.e. what makes a certain point in code unsafe. From what I've read in "Pro .NET Memory Management", the answer seems to be: in principle every point in code can be a safe point, as long as the JIT generates GCInfo that fully describes the GC roots at that point.
However, generating a safe point for every instruction is both impractical (think about the memory overhead; we're talking about GCInfo for every CPU instruction) and unnecessary (what really matters is the "time-to-safe-point" (TTSP), so it is sufficient to generate safe points with a granularity that keeps this latency small).
Therefore, the JIT compiler uses heuristics to decide how often safe points are generated, so that it can trade off memory overhead (not too many safe points) against GC latency due to TTSP delay (not too few). Most of the time it is sufficient to rely on method call sites to act as safe points, as they occur frequently enough to make the TTSP delay very small. One exception is a tight loop within which no method calls are made, in which case the JIT may decide to inject safe points at the loop iteration boundary.
So to sum it up, nothing fundamentally makes a particular point in code "unsafe" for GC. It's only a matter of tradeoff by JIT to decide how often safe-points are inserted.

When is it OK to catch an OutOfMemoryException and how to handle it?

Yesterday I took part in a discussion on SO devoted to OutOfMemoryException and the pros and cons of handling it (C# try {} catch {}).
My pros for handling it were:
The fact that OutOfMemoryException was thrown doesn't generally mean that the state of a program was corrupted;
According to the documentation, "the following Microsoft intermediate (MSIL) instructions throw OutOfMemoryException: box, newarr, newobj", which (usually) just means that the CLR attempted to find a block of memory of a given size and was unable to do so; it does not mean that not a single byte is left at our disposal;
But not everyone agreed with that; some speculated about an unknown program state after this exception and an inability to do anything useful, since doing so would require even more memory.
Therefore my question is: what are the serious reasons not to handle OutOfMemoryException and immediately give up when it occurs?
Edited: Do you think that OOME is as fatal as ExecutionEngineException?
IMO, since you can't predict what you can/can't do after an OOM (so you can't reliably process the error), or what else did/didn't happen when unrolling the stack to where you are (so the BCL hasn't reliably processed the error), your app must now be assumed to be in a corrupt state. If you "fix" your code by handling this exception you are burying your head in the sand.
I could be wrong here, but to me this message says BIG TROUBLE. The correct fix is to figure out why you have chomped through memory, and address that (for example, have you got a leak? could you switch to a streaming API?). Even switching to x64 isn't a magic bullet here; arrays (and hence lists) are still size limited; and the increased reference size means you can fit numerically fewer references in the 2GB object cap.
If you need to chance processing some data, and are happy for it to fail: launch a second process (an AppDomain isn't good enough). If it blows up, tear down the process. Problem solved, and your original process/AppDomain is safe.
We all write different applications. In a WinForms or ASP.Net app I would probably just log the exception, notify the user, try to save state, and shut down/restart. But as Igor mentioned in the comments, this could very well come from building some form of image editing application, where loading the 100th 20MB RAW image could push the app over the edge. Do you really want the user to lose all of their work when you could simply say, "Sorry, unable to load more images at this time"?
Another common instance that it could be useful to catch out of memory exceptions is in back end batch processing. You could have a standard model of loading multi-mega-byte files into memory for processing, but then one day out of the blue a multi-giga-byte file is loaded. When the out-of-memory occurs you could log the message to a user notification queue and then move on to the next file.
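A minimal sketch of that batch-processing approach might look like the following. BatchProcessor and the processFile delegate are hypothetical, and recording the skipped path can itself allocate, so under a genuinely exhausted heap this handler may fail too.

```csharp
using System;
using System.Collections.Generic;

public static class BatchProcessor
{
    // Processes each file in turn; a file whose processing runs out of memory
    // is recorded and skipped so the rest of the batch can continue.
    public static List<string> ProcessAll(IEnumerable<string> paths, Action<string> processFile)
    {
        var skipped = new List<string>();
        foreach (string path in paths)
        {
            try
            {
                processFile(path);
            }
            catch (OutOfMemoryException)
            {
                // Assumption: the failed allocation was the oversized file itself,
                // so enough memory is free again once its buffers are unreachable.
                skipped.Add(path);
            }
        }
        return skipped;
    }
}
```

The returned list can then be pushed to whatever notification queue the application uses.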
Yes it is possible that something else could blow at the same time, but those too would be logged and notified if possible. If finally the GC is unable to process any more memory the application is going to go down hard anyway. (The GC runs in an unprotected thread.)
Don't forget we all develop different types of applications. And unless you are on older, constrained machines you will probably never get an OutOfMemoryException for typical business apps... but then again not all of us are business tool developers.
To your edit...
Out-of-memory may be caused by unmanaged memory fragmentation and pinning. It can also be caused by large allocation requests. If we were to put up a white flag and draw a line in the sand over such simple issues, nothing would ever get done in large data processing projects. Now comparing that to a fatal Engine exception, well there is nothing you can do at the point the runtime falls over dead under your code. Hopefully you are able to log (but probably not) why your code fell on its face so you can prevent it in the future. But, more importantly, hopefully your code is written in a manner that could allow for safe recovery of as much data as you can. Maybe even recover the last known good state in your application and possibly skip the offending corrupt data and allow it to be manually processed and recovered.
Yet at the same time it is just as possible to have data corruption caused by SQL injection, out-of-sync versions of software, pointer manipulation, buffer over runs, and many other problems. Avoiding an issue just because you think you may not recover from it is a great way to give users error messages as constructive as Please contact your system administrator.
Some commenters have noted that there are situations where an OOM could be the immediate result of attempting to allocate a large number of bytes (graphics application, allocating a large array, etc.). Note that for that purpose you could use the MemoryFailPoint class, which raises an InsufficientMemoryException (itself derived from OutOfMemoryException). That can be caught safely, as it is raised before the actual attempt to allocate the memory has been made. However, this can only reduce the likelihood of an OOM, never fully prevent it.
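As a sketch of how MemoryFailPoint might be used, assuming the caller can estimate the operation's memory needs in megabytes (GuardedWork and TryRun are hypothetical names):

```csharp
using System;
using System.Runtime;

public static class GuardedWork
{
    // Runs 'work' only if the runtime believes roughly 'estimatedMegabytes'
    // of memory can be obtained; returns false without running it otherwise.
    public static bool TryRun(int estimatedMegabytes, Action work)
    {
        try
        {
            // The constructor throws InsufficientMemoryException before any
            // of the real work has allocated anything.
            using (new MemoryFailPoint(estimatedMegabytes))
            {
                work();
                return true;
            }
        }
        catch (InsufficientMemoryException)
        {
            return false;
        }
    }
}
```

The estimate must be a positive number of megabytes, and a successful check is not a reservation: another thread can still consume the memory before the work allocates it.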
It all depends on the situation.
Quite a few years ago now I was working on a real-time 3D rendering engine. At the time we loaded all the geometry for the model into memory on start up, but only loaded the texture images when we needed to display them. This meant when the day came our customers were loading huge (2GB) models we were able to cope. The geometry occupied less than 2GB, but when all the textures were added it would be > 2GB. By trapping the out of memory error that was raised when we tried to load the texture we were able to carry on displaying the model, but just as the plain geometry.
We still had a problem if the geometry was > 2GB, but that was a different story.
Obviously, if you get an out of memory error with something fundamental to your application then you've got no choice but to shut down - but do that as gracefully as you can.
I suggest reading Christopher Brumme's comment in "Framework Design Guidelines", p. 238 (7.3.7 OutOfMemoryException):
At one end of the spectrum, an OutOfMemoryException could be the result of a failure to obtain 12 bytes for implicitly autoboxing, or a failure to JIT some code that is required for critical backout. These cases are catastrophic failures and ideally would result in termination of the process. At the other end of the spectrum, an OutOfMemoryException could be the result of a thread asking for a 1 GB byte array. The fact that we failed this allocation attempt has no impact on the consistency and viability of the rest of the process.
The sad fact is that CLR 2.0 cannot distinguish among any points on this spectrum. In most managed processes, all OutOfMemoryExceptions are considered equivalent and they all result in a managed exception being propagated up the thread. However, you cannot depend on your backout code being executed, because we might fail to JIT some of your backout methods, or we might fail to execute static constructors required for backout.
Also, keep in mind that all other exceptions can get folded into an OutOfMemoryException if there isn't enough memory to instantiate those other exception objects. Also, we will give you a unique OutOfMemoryException with its own stack trace if we can. But if we are tight enough on memory, you will share an uninteresting global instance with everyone else in the process.
My best recommendation is that you treat OutOfMemoryException like any other application exception. You make your best attempts to handle it and remain consistent. In the future, I hope the CLR can do a better job of distinguishing catastrophic OOM from the 1 GB byte array case. If so, we might provoke termination of the process for the catastrophic cases, leaving the application to deal with the less risky ones. By treating all OOM cases as the less risky ones, you are preparing for that day.
Marc Gravell has already provided an excellent answer; seeing as how I partly "inspired" this question, I would like to add one thing:
One of the core principles of exception handling is never to throw an exception inside an exception handler. (Note - re-throwing a domain-specific and/or wrapped exception is OK; I am talking about an unexpected exception here.)
There are all sorts of reasons why you need to prevent this from happening:
At best, you mask the original exception; it becomes impossible to know for sure where the program originally failed.
In some cases, the runtime may simply be unable to handle an unhandled exception in an exception handler (say that 5 times fast). In ASP.NET, for example, installing an exception handler at certain stages of the pipeline and failing in that handler will simply kill the request - or crash the worker process, I forget which.
In other cases, you may open yourself up to the possibility of an infinite loop in the exception handler. This may sound like a silly thing to do, but I have seen cases where somebody tries to handle an exception by logging it, and when the logging fails... they try to log the failure. Most of us probably wouldn't deliberately write code like this, but depending on how you structure your program's exception handling, you can end up doing it by accident.
So what does this have to do with OutOfMemoryException specifically?
An OutOfMemoryException doesn't tell you anything about why the memory allocation failed. You might assume that it was because you tried to allocate a huge buffer, but maybe it wasn't. Maybe some other rogue process on the system has literally consumed all of the available address space and you don't have a single byte left. Maybe some other thread in your own program went awry and went into an infinite loop, allocating new memory on each iteration, and that thread has long since failed by the time the OutOfMemoryException ends up on your current stack frame. The point is that you don't actually know just how bad the memory situation is, even if you think you do.
So start thinking about this situation now. Some operation just failed at an unspecified point deep in the bowels of the .NET framework and propagated up an OutOfMemoryException. What meaningful work can you perform in your exception handler that does not involve allocating more memory? Write to a log file? That takes memory. Display an error message? That takes even more memory. Send an alert e-mail? Don't even think about it.
If you try to do these things - and fail - then you'll end up with non-deterministic behaviour. You'll possibly mask the out-of-memory error and get mysterious bug reports with mysterious error messages bubbling up from all kinds of low-level components you wrote that aren't supposed to be able to fail. Fundamentally, you've violated your own program's invariants, and this is going to be a nightmare to debug if your program ever does end up running under low-memory conditions.
One of the arguments presented to me before was that you might catch an OutOfMemoryException and then switch to lower-memory code, like a smaller buffer or a streaming model. However, this "Expection Handling" is a well-known anti-pattern. If you know you're about to chew up a huge amount of memory and aren't sure whether or not the system can handle it, then check the available memory, or better yet, just refactor your code so that it doesn't need so much memory all at once. Don't rely on the OutOfMemoryException to do it for you, because - who knows - maybe the allocation will just barely succeed and trigger a bunch of out-of-memory errors immediately after your exception handler (possibly in some completely different component).
So my simple answer to this question is: Never.
My weasel-answer to this question is: It's OK in a global exception handler, if you're really really careful. Not in a try-catch block.
One practical reason for catching this exception is to attempt a graceful shutdown, with a friendly error message instead of an exception trace.
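A graceful-shutdown handler along those lines could be sketched like this. OomShutdown is a hypothetical name, and the handler is deliberately tiny: it uses a constant message so nothing is formatted or concatenated at failure time, although writing to the console may still allocate internally and is not guaranteed to succeed.

```csharp
using System;

public static class OomShutdown
{
    // A constant literal, so the handler builds no new strings itself.
    public const string FriendlyMessage = "The application ran out of memory and must close.";

    // Returns the friendly message for an out-of-memory failure, or null for anything else.
    public static string MessageFor(object exceptionObject)
    {
        return exceptionObject is OutOfMemoryException ? FriendlyMessage : null;
    }

    // Call once at startup to register the last-chance handler.
    public static void Install()
    {
        AppDomain.CurrentDomain.UnhandledException += OnUnhandled;
    }

    private static void OnUnhandled(object sender, UnhandledExceptionEventArgs e)
    {
        string message = MessageFor(e.ExceptionObject);
        if (message != null)
        {
            Console.Error.WriteLine(message);
            // Exit with a non-zero code instead of letting the default crash handling run.
            Environment.Exit(1);
        }
    }
}
```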
The problem is larger than .NET. Almost any application written from the fifties to now has big problems if no memory is available.
With virtual address spaces the problem has been sort-of salvaged but NOT solved because even address spaces of 2GB or 4GB may become too small. There are no commonly available patterns to handle out-of-memory. There could be an out-of-memory warning method, a panic method etc. that is guaranteed to still have memory available.
If you receive an OutOfMemoryException from .NET almost anything may be the case. 2 MB still available, just 100 bytes, whatever. I wouldn't want to catch this exception (except to shutdown without a failure dialog). We need better concepts. Then you may get a MemoryLowException where you CAN react to all sorts of situations.
The problem is that - in contrast to other Exceptions - you usually have a low memory situation when the exception occurs (except when the memory to be allocated was huge, but you don't really know when you catch the exception).
Therefore, you must be very careful not to allocate memory when handling this exception. And while this sounds easy it's not, actually it's very hard to avoid any memory allocation and do something useful. Therefore, catching it is usually not a good idea IMHO.
Write code, don't hijack the JVM. When VM is humbly telling you that a memory allocation request failed your best bet is to discard the state of application to avert corrupting application data. Even if you decide to catch OOM you should only try to gather diagnostic information like dumping log, stacktrace etc. Please do not try to initiate a backout procedure as you are not sure whether it will get a chance to execute or not.
Real-world analogy: You are traveling in a plane and all engines fail. What would you do after catching an AllEngineFailureException? Best bet is to grab the mask and prepare for a crash.
When in OOM, dump!!
