For creating my own memory management in C#, I need a way to intercept the new operator before it returns null or throws an exception. When new is used, I want to call the original handler first. If this handler fails to return a block of memory, I want to tell all my mappable objects to write themselves to disk and free memory.
In C++ there was a way to intercept new by installing a different new-handler. In C# I couldn't find anything that offers the same behaviour.
Has anyone seen a way to do this?
Thanks
Martin
You can't do what you're after in C#, or in any managed language. Nor should you try. The .NET runtime manages allocations and garbage collection. It's impossible for you to instruct your objects to free memory, as you have no guarantee when (or, technically, even if) a particular object will be collected once it's no longer rooted. Even eliminating all references and manually calling GC.Collect() is not an absolute guarantee. If you're looking for granular memory management, you need to be using a lower-level environment.
As an important point, it is not possible for the new operator to return a null reference. It can only return either a reference to the specified type or throw an exception.
If you want to do your own management of how and when objects are allocated, you'll have to use something along the lines of a factory pattern.
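A minimal sketch of what such a factory might look like, assuming a hypothetical MappableCache.FlushAllToDisk() that writes your mappable objects out and drops their references (neither name is a real API):

using System;

public static class MappableFactory
{
    public static T Create<T>() where T : new()
    {
        try
        {
            return new T();
        }
        catch (OutOfMemoryException)
        {
            // Ask our own caches (hypothetical) to write themselves to disk and
            // drop their references, then let the GC reclaim that memory.
            MappableCache.FlushAllToDisk();
            GC.Collect();
            GC.WaitForPendingFinalizers();
            return new T(); // retry once; if this throws, memory is truly exhausted
        }
    }
}

As the next answer explains, though, catching OutOfMemoryException is fragile: the recovery path itself needs memory, so this only works when the failed allocation was large and plenty of small-object headroom remains.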
I think you're approaching this from the wrong angle; the whole point of using a runtime with managed memory is so that you don't have to worry about memory. The tradeoff is that you can't do this type of low-level trickery.
As an aside, you can 'override new' for a limited class of objects (those descending from ContextBoundObject) by creating a custom ProxyAttribute, though this likely does not address what you're intending.
I believe that you are not understanding the side-effects of what you're asking for. Even in C++, you can't really do what you think you can do. The reason is simple: if you have run out of memory, you can't even serialize your objects to disk, because you have no memory left to accomplish that. By the time memory is exhausted, the only real thing you can do is either discard memory (without saving or doing anything else first) or abend the program.
Now, what you're talking about will still work 95% of the time because your memory allocation will likely be sufficiently large that when it fails, you have a little room to play with, but you can't guarantee that this will be the case.
Example: If you have only 2MB of memory left, and you try to allocate 10MB, then it will fail, and you still have 2MB to play with to try and free up some memory, which will allow you to allocate small chunks of memory needed to serialize objects to disk.
But, if you only have 10 bytes of memory left, then you don't even have enough memory to create a new exception object (unless it comes from a reserved pool). So, in essence, you're creating a very poor situation that will likely crash at some point.
Even in C++, low-memory conditions are almost impossible to get right, and it's almost impossible to recover from every case unless you have very carefully planned and pre-allocated memory for your recovery routines.
Now, when you're talking about a garbage-collected runtime, you have no control over how memory is allocated or freed. At best, all you can do is give hints. There is very little you can reliably do here, by the very nature of garbage collection: it's non-deterministic.
Related
I need to dispose of an object so it can release everything it owns, but it doesn't implement IDisposable, so I can't use it in a using block. How can I make the garbage collector collect it?
You can force a collection with GC.Collect(). Be very careful using this, since a full collection can take some time. The best practice is to just let the GC determine when the best time to collect is.
Does the object contain unmanaged resources but does not implement IDisposable? If so, it's a bug.
If it doesn't, it shouldn't matter if it gets released right away, the garbage collector should do the right thing.
If it "owns" anything other than memory, you need to fix the object to use IDisposable. If it's not an object you control this is something worth picking a different vendor over, because it speaks to the core of how well your vendor really understands .Net.
If it does just own memory, even a lot of it, all you have to do is make sure the object goes out of scope. Don't call GC.Collect() — it's one of those things that if you have to ask, you shouldn't do it.
You can't perform garbage collection on a single object. You could request a garbage collection by calling GC.Collect(), but this will affect all objects subject to cleanup. It is also highly discouraged, as it can have a negative effect on the performance of later collections.
Also, calling Dispose on an object does not clean up its memory. It only allows the object to release references to unmanaged resources. For example, calling Dispose on a StreamWriter closes the stream and releases the Windows file handle. The memory for the object on the managed heap does not get reclaimed until a subsequent garbage collection.
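To make the distinction concrete, a small sketch (the file name is arbitrary):

using System.IO;

using (var writer = new StreamWriter("log.txt"))
{
    writer.WriteLine("hello");
} // Dispose runs here: the OS file handle is released immediately,
  // but the StreamWriter's managed memory waits for a future GC.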
Chris Sells also discussed this on .NET Rocks. I think it was during his first appearance but the subject might have been revisited in later interviews.
http://www.dotnetrocks.com/default.aspx?showNum=10
This article by Francesco Balena is also a good reference:
When and How to Use Dispose and Finalize in C#
http://www.devx.com/dotnet/Article/33167/0/page/1
Garbage collection in .NET is non-deterministic, meaning you can't really control when it happens. You can suggest, but that doesn't mean the GC will listen.
Tell us a little bit more about the object and why you want to do this; we can make some suggestions based on that. Code always helps. And depending on the object, there might be a Close method or something similar, in which case the intended usage is to call that. If there is no Close or Dispose type of method, you probably don't want to rely on that object, as you will probably get memory leaks if it does in fact hold resources that need to be released.
If the object goes out of scope and has no external references, it will be collected rather quickly (likely on the next collection).
BEWARE of fragmentation: in many cases GC.Collect() or IDisposable is not very helpful, especially for large objects. (The LOH holds objects of roughly 85 KB and up, performs no compaction, and is subject to high levels of fragmentation in many common use cases.) Fragmentation will then lead to out-of-memory (OOM) issues even with potentially hundreds of MB free. As time marches on things get bigger, though perhaps not the LOH threshold itself. High degrees of parallelism exacerbate the issue, simply because more objects (likely varying in size) are instantiated and released in less time.
Arrays are the usual suspects for this problem (it's also often hard to identify, due to non-specific exceptions and assertions from the runtime; something like a "high % of large object heap fragmentation" warning would be swell). The prognosis for code suffering from this problem is to implement an aggressive re-use strategy.
A class in System.Collections.Concurrent.ObjectPool from the Parallel Extensions beta 1 samples helps (unfortunately there is no simple ubiquitous pattern that I have seen, like maybe some attached properties/extension methods?). It is simple enough to drop in or re-implement for most projects: you assign a generator Func<> and use Get/Put helper methods to re-use your previous objects and forgo the usual garbage collection. It is usually sufficient to focus on arrays and not the individual array elements.
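For illustration, a minimal re-implementation of the idea (not the actual sample class): a generator Func<> plus Get/Put over a ConcurrentBag:

using System;
using System.Collections.Concurrent;

public class ObjectPool<T>
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly Func<T> _generator;

    public ObjectPool(Func<T> generator)
    {
        if (generator == null) throw new ArgumentNullException("generator");
        _generator = generator;
    }

    // Hand out a pooled instance if one is available, otherwise build a new one.
    public T Get()
    {
        T item;
        return _items.TryTake(out item) ? item : _generator();
    }

    // Return an instance for re-use instead of letting it become LOH garbage.
    public void Put(T item)
    {
        _items.Add(item);
    }
}

Typical use for the array case: var pool = new ObjectPool<byte[]>(() => new byte[85 * 1024]); then pool.Get() before each use and pool.Put(buffer) after.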
It would be nice if .NET 4 updated all of the .ToArray() methods everywhere to include an overload like .ToArray(T[] target).
Getting the hang of using SOS/windbg (.loadby sos clr for CLR v4) to analyze this class of issue can help. Thinking about it, the current garbage collection system is more like garbage re-cycling (using the same physical memory again); ObjectPool is analogous to garbage re-using. If anybody remembers the 3 R's, reducing your memory use is a good idea too, for performance's sake ;)
Our system keeps hold of lots of large objects for performance. However, when running low on memory, we want to drop some of them. The objects are prioritized, so I know which ones to drop. Is there a simple way of determining when to free memory? Also, dropping one object may not be enough, so I guess I need a loop: drop, check, drop again if necessary, etc. But in C#, I won't necessarily see the effect of dropping an object immediately, so how do I avoid kicking too much stuff out?
I guess it's just a simple function of used vs total physical & virtual memory. But what function?
Edit: Some clarifications
"Large objects" was misleading. I meant logical "package" of objects (the objects should be small enough individually to avoid the LOB - that's the intention certainly) that together are large (~ 100MB?)
A request can come in which requires the use of one such package. If it is in memory, the response is rapid. If not, it needs to be reconstructed, which is very slow. So I want to keep stuff in memory as long as possible, but can ditch the least requested ones when necessary.
We have no sensible way to serialize these packages. We should probably do that, but it's a lot of work and there's a lot of resistance to doing so.
Our original simple approach is to periodically compare the following to a configurable threshold.
// ComputerInfo lives in Microsoft.VisualBasic.Devices (reference Microsoft.VisualBasic.dll)
var c = new ComputerInfo();
// Cast to double first: both properties are ulong, so integer division would always yield 0
return (double)c.AvailablePhysicalMemory / c.TotalPhysicalMemory;
There are a lot of different topics in this question, and I think it's best to clarify them before actually answering.
First off, you say your app gets hold of a lot of "large objects". Define large object: anything larger than about 85 KB goes onto the LOH, which only gets collected as part of a generation 2 collection (the most expensive of them all); anything smaller than that, even if you think of it as a "big" object, is not, and is treated like any other kind of object.
Secondly, there are two problems in terms of "managing memory".
One is managing the amount of space you're using inside your virtual address space; that is, on 32-bit systems, making sure you can address all the memory you're asking for, which on 32-bit Windows tends to be around 1.5 GB of usable space.
The second is disposing of that memory when it's needed, which is part of the garbage collector's work, so that it triggers when there's a shortage of memory (although that doesn't mean you can't get an OutOfMemoryException if you don't give the GC enough time to do its job).
With that said, I think you should forget about taking the place of the GC... just let it do its job and, if you're worried, find the critical paths that may fail (on a memory request) and protect yourself against OutOfMemoryExceptions.
There are a lot of different patterns for handling the case you're posting, and most of them really depend on your business scenario. One example is having a state machine that can go into an "OutOfMemory" state, in which case the system switches to freeing memory before doing anything else (that includes disposing of old objects and invoking the GC to clean everything up, all while you patiently wait for it to happen).
Other techniques involve saving the data to the disk and then manually swapping in and out objects based on some algorithm when you reach certain levels. That means stopping all your threads (or some, depending on business) and moving the data back and forth.
If the creation of your large objects is centralized, you can also declare a facade over it, so that the facade can check whether it needs to free objects based on the amount of memory (virtual memory) your process is using. BTW, use the PerformanceInfo API call as quoted in the other answer, since it includes the amount of memory used by unmanaged code, which is nonetheless located inside your process's virtual address space.
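A sketch of the facade idea (Package, PackageCache and its methods are hypothetical, and Process.VirtualMemorySize64 stands in for the GetPerformanceInfo call mentioned elsewhere):

using System.Diagnostics;

public static class PackageFacade
{
    // Assumed budget for a 32-bit process; tune to your environment.
    private const long VirtualBudget = 1200L * 1024 * 1024;

    public static Package Get(string key)
    {
        using (var self = Process.GetCurrentProcess())
        {
            // VirtualMemorySize64 counts unmanaged allocations too.
            while (self.VirtualMemorySize64 > VirtualBudget && PackageCache.Count > 0)
            {
                PackageCache.EvictLowestPriority();
                self.Refresh(); // re-read the counters after each eviction
            }
        }
        return PackageCache.GetOrReconstruct(key);
    }
}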
Don't worry too much about "real" memory, as the operating system will make sure the most appropriate pages are located in memory.
Then there are hundreds of other optimizations that completely depend on your business scenario. For example, databases "know" how to bring data into memory depending on the query, predicting the data you're going to use in advance so it is ready, and they do evict objects that are not used... but that's another topic.
Edit: Based on your edits to the question.
Checking memory in the facade will not add a significant overhead in terms of performance.
If you start getting low on memory, you should decide how many objects / how much space you're going to free. Don't do it one at a time; take a bunch of them and free enough memory so that you don't have to collect again soon.
If you go with the previous approach, you can service the request after you've freed enough space and continue cleaning in the background.
One of the fastest ways of handling memory / disk swapping is by using memory mapped files.
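A minimal sketch using System.IO.MemoryMappedFiles (the file name is arbitrary); the OS pages the data in and out on demand, so the whole package never has to live on the managed heap:

using System.IO.MemoryMappedFiles;

using (var mmf = MemoryMappedFile.CreateFromFile(@"packages\package1.bin"))
using (var view = mmf.CreateViewAccessor())
{
    int header = view.ReadInt32(0); // touching the view pages data in lazily
    view.Write(0, header + 1);      // writes go back out through the page cache
}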
Use GC.GetTotalMemory, and if this exceeds your expectation, then you can null out the objects that you want to release and call GC.Collect.
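A sketch of that suggestion (cache and the threshold are placeholders):

long threshold = 500L * 1024 * 1024;

if (GC.GetTotalMemory(false) > threshold) // false: don't force a full collection first
{
    cache = null; // drop the root so the objects become unreachable
    GC.Collect();
}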
Have a look at the accepted answer to this question. It uses the GetPerformanceInfo Windows API to determine memory consumption of all sorts; Task Manager uses the same information. This should help you write a class that observes memory consumption periodically.
Once memory runs low you can fill a FIFO queue with soon-to-be deleted tasks.
The observer will delete the first object in the queue and maybe call GC.Collect() manually; I'm not too sure about this.
Give the collection some time before you recheck the memory consumption of your application. If there is still not enough free memory, delete the next object from the queue, and so on...
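Putting those steps together, a rough sketch (Droppable, MemoryIsLow(), ReleaseReferences() and objectsByAscendingPriority are stand-ins for your own types and checks):

using System;
using System.Collections.Generic;
using System.Threading;

var dropQueue = new Queue<Droppable>(objectsByAscendingPriority);

while (MemoryIsLow() && dropQueue.Count > 0)
{
    dropQueue.Dequeue().ReleaseReferences(); // drop the next candidate
    GC.Collect();                            // optionally prompt a collection
    Thread.Sleep(1000);                      // give the GC time before rechecking
}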
I have a business app that I have written that effectively recurses through a directory structure looking for specific Excel files and stores their paths. It then loops through these files and parses them by creating a DocumentParser object for each file; this is done one at a time, not asynchronously. The software seems to be very stable, so much so that the business would like to run it over a massive directory containing upwards of 10,000 relevant Excel files.
My question is: as I am creating a new DocumentParser object each time, will the GC be effective enough to discard each of the objects when they go out of scope, i.e. when that Excel sheet has been parsed, or is there a way I can monitor this and, where necessary, manually trigger a GC? I've never had to deal with such large amounts of data before; generally I've only tested on a maximum of 40-50 Excel files at a time.
Thanks.
The GC is a very complex piece of software, and it is the only one that knows when a garbage collection is necessary. So my advice is to leave the GC on its own.
Additionally: the GC will handle these masses of objects. Perhaps you will notice a decrease in performance; if that becomes a problem, you can try to optimize your code. But don't do so prematurely.
I would leave the GC to its business. 10,000 objects is not really much work for the GC. And it's likely the cost of the GC work will be much lower than the cost of the Excel work. So it's not worth complicating your design to tweak things for the GC. If you end up with so many files to process that your application can't finish in time, it's most likely going to be the speed of the Excel processing holding you up.
However one note which may be relevant: if the DocumentParser is using unmanaged memory in its work with the Excel file, you can use GC.Add/RemoveMemoryPressure to indicate to the GC the real added cost when opening the file. If you didn't write the DocumentParser yourself, the author may already be doing this.
The issue here is that you may have a managed object that costs something in the order of 100 bytes, which allocates a large amount of unmanaged memory when it does Excel work. The GC has no way of knowing this, so these methods notify the GC that there is more memory pressure than it was aware of. This may change its behaviour in how and when it decides to collect, which may lead to the application maintaining a lower memory footprint. If the application's memory usage balloons over time, then you may start seeing slowdowns from lengthy garbage collections and possibly paging on the machine (depending on how much memory you have). You'll want to keep an eye on its memory usage to make sure it's not leaking memory as it processes; a memory profiler may be helpful there.
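A sketch of how such a wrapper might report its unmanaged cost (DocumentParserHandle is a made-up wrapper, and the 50 MB figure is an assumption, not something from the question):

using System;

public class DocumentParserHandle : IDisposable
{
    private const long UnmanagedBytes = 50L * 1024 * 1024;

    public DocumentParserHandle()
    {
        // The managed wrapper is tiny, so make the GC feel the real cost.
        GC.AddMemoryPressure(UnmanagedBytes);
    }

    public void Dispose()
    {
        GC.RemoveMemoryPressure(UnmanagedBytes);
        GC.SuppressFinalize(this);
    }

    ~DocumentParserHandle()
    {
        // Safety net if Dispose was never called.
        GC.RemoveMemoryPressure(UnmanagedBytes);
    }
}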
You don't need to manually call the GC unless you are holding some very large resource, which is not the case in your situation. The GC tunes itself with every collection, and if you trigger it manually you will just disrupt its internal profiling data.
BTW, the GC can collect an object not only when it goes out of scope but also after its last use (i.e. while it is still in scope, if the variable is not used anymore).
Yes and no - the GC is effective enough to release memory when it needs to, but you can't generally be sure when that is.
There is a way to force a GC collection, but it's generally considered bad practice in production code, because the cost of forcing a stack walk when it's not required is worse than using a bit of extra memory until the GC decides it needs to free resources to allocate more objects.
In C++ it is easily possible to have a permanent memory leak - just allocate memory and don't release it:
new char; //permanent memory leak guaranteed
and that memory stays allocated for the lifetime of the heap (usually the same as program runtime duration).
Is the same possible in a C# program (a case that will lead to a specific unreferenced object never being released while the memory management mechanisms are working properly)?
I've carefully read this question and the answers to it. It mentions some cases which lead to higher memory consumption than expected, or IMO rather extreme cases like deadlocking the finalizer thread. But can a permanent leak be formed in a C# program with normally functioning memory management?
It depends on how you define a memory leak. In an unmanaged language, we typically think of a memory leak as a situation where memory has been allocated, and no references to it exist, so we are unable to free it.
That kind of leak is pretty much impossible to create in .NET (unless you call out into unmanaged code, or unless there's a bug in the runtime).
However, you can get another "weaker" form of leaks: when a reference to the memory does exist (so it is still possible to find and reset the reference, allowing the GC to free the memory normally), but you thought it didn't, so you assumed the object being referenced would get GC'ed. That can easily lead to unbounded growth in memory consumption, as you're piling up references to objects that are no longer used, but which can't be garbage collected because they're still referenced somewhere in your app.
So what is typically considered a memory leak in .NET is simply a situation where you forgot that you have a reference to an object (for example because you failed to unsubscribe from an event). But the reference exists, and if you remember about it, you can clear it and the leak will go away.
You can write unmanaged code in .NET if you wish; you have to enclose your block of code with the unsafe keyword. So if you are writing unsafe code, are you not back to the problem of managing memory yourself, and couldn't you then get a memory leak?
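In that direction, the clearest case is allocating unmanaged memory directly; a sketch:

using System;
using System.Runtime.InteropServices;

IntPtr buffer = Marshal.AllocHGlobal(1024 * 1024); // 1 MB the GC knows nothing about
// ... use the buffer ...
Marshal.FreeHGlobal(buffer); // forget this line and the 1 MB is leaked for good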
It's not exactly a memory leak, but if you're communicating with hardware drivers directly (i.e. not through a properly-written .net extension of a set of drivers) then it's fairly possible to put the hardware into a state where, although there may or may not be an actual memory leak in your code, you can no longer access the hardware without rebooting it or the PC...
Not sure if this is a useful answer to your question, but I felt it was worth mentioning.
GCs usually delay the collection of unreachable memory to some later time, when an analysis of the references shows that the memory is unreachable. (In some restricted cases, the compiler may help the GC and flag a memory zone as unreachable as soon as it becomes so.)
Depending on the GC algorithm, unreachable memory is detected as soon as a collection cycle is run, or it may stay undetected for a certain number of collection cycles (generational GCs show this behavior, for instance). Some techniques even have blind spots which are never collected (the use of reference-counted pointers, for instance); some deny those the name of GC algorithms, and they are probably unsuitable in a general-purpose context.
Proving that a specific zone will be reclaimed depends on the algorithm and on the memory allocation pattern. For a simple algorithm like mark-and-sweep, it is easy to give a bound (say, by the next collection cycle); for more complex algorithms the matter is harder (under a scheme which uses a dynamic number of generations, the conditions under which a full collection happens are not meaningful to someone unfamiliar with the details of the algorithm and the precise heuristics used).
A simple answer is that classic memory leaks are impossible in GC environments: classically, a leaked block is unreferenced, so there's no way for the software to find it in order to clean it up.
On the other hand, a memory leak is any situation where the memory usage of a program has unbounded growth. This definition is useful when analyzing how software might fail when run as a service (where services are expected to run, perhaps for months at a time).
As such, any growable data structure that continues to hold references to unneeded objects could cause service software to effectively fail because of address-space exhaustion.
Easiest memory leak:
public static class StaticStuff
{
    public static event Action SomeStaticEvent;
}

public class Listener
{
    public Listener()
    {
        StaticStuff.SomeStaticEvent += DoSomething;
    }

    void DoSomething() { }
}
Instances of Listener will never be collected: the static event holds a reference to every subscriber.
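For completeness, the usual fix is to unsubscribe; for example, by making Listener disposable:

public class Listener : IDisposable
{
    public Listener()
    {
        StaticStuff.SomeStaticEvent += DoSomething;
    }

    public void Dispose()
    {
        // Removing the handler removes the static root,
        // so the instance becomes collectable again.
        StaticStuff.SomeStaticEvent -= DoSomething;
    }

    void DoSomething() { }
}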
If we define a memory leak as a condition where memory that could be used for creating objects cannot be used, or where memory that could be released is not released, then memory leaks can happen in:
Events in WPF, where weak events need to be used; this can especially happen with attached properties.
Large objects
Large Object Heap Fragmentation
http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
So, I already know about memory management in Objective-C, and I never had to know about it while programming in .NET (C#). But I still have some questions about how everything is done.
-Why does the code leak in Objective-C if we allocate an object and don't release it?
-Why doesn't this leak in C#?
-What are some advantages and disadvantages of automatic garbage collection?
-Why not use autorelease on every allocated object (Objective-C)?
-Is it possible to take care of the memory manually (C#)? Say I instantiate an object, and when I'm done I want to release it; I don't want to wait for the garbage collector to do it.
It leaks in Objective-C because Objective-C doesn’t take any action on it. It relies on you doing all the work. It doesn’t leak in C# (more precisely, in .NET) because it employs a garbage collector which cleans up objects that are no longer used.
The main advantage of garbage collection is the above: you have far fewer memory leaks. (It's still possible to have a memory leak, e.g. by filling a list indefinitely, but that's harder to do accidentally.) It used to be thought that garbage collection has a disadvantage in that it could slow down the program, because it keeps doing the garbage collection in the background and you have little control over it. In reality, however, the difference is negligible: there are other background tasks on your computer (e.g. device drivers) running all the time, and the garbage collector doesn't break the camel's back either.
Auto-deallocation (as it is employed in C++ when a non-pointer variable goes out of scope) is dangerous because it opens the possibility to have a reference to it still in existence even after the object has been disposed. If your code then tries to access the object, the process goes kaboom big time.
Yes, it is possible to tell C# to release memory by invoking the garbage collector directly (GC.Collect()). However, I have yet to see a case where this is at all necessary. If you actually run out of memory, the garbage collector will already kick in automatically and free as much as it can.
Objective-C isn't a garbage-collected language, so it has no way of knowing that an object won't be used anymore unless you tell it. That's the purpose of the .NET garbage collector: it checks to see which objects can no longer be used by the program, and- at some point- gets rid of them to free up memory. There are no guarantees as to when, or if, it will ever free any given abandoned object; it's just trying to keep memory usage from going out of control.
C# can't release an object without the garbage collector. If you released an object that's still being referenced, your program would crash when you tried to use it; that is always the risk of manual memory management, and like all "memory-managed languages", C# is trying to prevent you from making exactly that mistake. If you want to explicitly shut down an object's operation, implement the IDisposable interface for that object's type and call the Dispose() method on the object - essentially a destructor. Be sure you're done with it, of course, and that the object will behave correctly (by throwing exceptions) if something tries to use it after it's been Dispose()d of.
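A sketch of that pattern (Connection is a made-up resource type):

using System;

public class Connection : IDisposable
{
    private bool _disposed;

    public void Send(string message)
    {
        if (_disposed)
            throw new ObjectDisposedException("Connection");
        // ... send the message ...
    }

    public void Dispose()
    {
        if (_disposed) return;
        // ... release the underlying resource here ...
        _disposed = true;
    }
}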
Objective-C is reference-counted. When an object is out of references, it deletes itself. It's not a bad solution to the "is someone still using this object?" problem, except for data structures that refer to themselves; circular data structures will hang around forever unless carefully handled. .NET isn't a reference counter, so it will get rid of circular data structures that can't be reached from running code.
Autorelease is just a "release later", for returning a value that should self-destruct if the code that grabs it doesn't immediately want to hold onto it, as far as I understand. (I'm not an Objective-C programmer, though.) It gets around the "who releases this object?" problem for calls that return an object, without destroying it before the function is finished. It's a special use case, though, and it doesn't make sense in most other cases.
The advantage of automatic garbage collection is that you do not have to explicitly free/release your objects as you said. The disadvantage is you cannot be sure when (or even if) any given object instance will be released.
C# has other mechanisms to release resources like files or DB connections that have to be released deterministically. For example, the using statement allows you to make sure that IDisposable.Dispose is called on an object for sure.
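For example (the connection string is a placeholder):

using System.Data.SqlClient;

using (var connection = new SqlConnection("Data Source=.;Integrated Security=true"))
{
    connection.Open();
    // ... run commands ...
} // connection.Dispose() runs here, deterministically, even if an exception was thrown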
Garbage collected systems tend to have more memory in use at any given time than a well tuned manual implementation. Of course, you do not have memory leaks.
The quality and performance of garbage collectors can vary quite a bit which is something you may not have a lot of control over. For example, there may be a noticable lag when the GC runs. In .NET, you can call GC.Collect() to tell the GC that now would be a good time but it may or may not listen.
These days Objective-C is also garbage collected on the Mac. On the iPhone it is reference counted though so I assume that is where you are running into the issue.
On garbage-collected systems, you can still run into the issue where an object hangs onto a reference to an object that you would expect to be garbage collected. The garbage collector will not clean this up, and so the memory is essentially leaked.
In reference counted systems, you do not have to keep track of every object that points at your object instances but you do have to specify that you want them released.
EDIT: I guess I did not say this explicitly - you cannot manually control memory allocation in C#.