Our system keeps hold of lots of large objects for performance. However, when running low on memory, we want to drop some of the objects. The objects are prioritized, so I know which ones to drop. Is there a simple way of determining when to free memory? Also, dropping 1 object may not be enough, so I guess I need a loop to drop, check, drop again if necessary, etc. But in c#, I won't necessarily see the effect immediately of dropping an object, so how do I avoid kicking too much stuff out?
I guess it's just a simple function of used vs total physical & virtual memory. But what function?
Edit: Some clarifications
"Large objects" was misleading. I meant logical "package" of objects (the objects should be small enough individually to avoid the LOB - that's the intention certainly) that together are large (~ 100MB?)
A request can come in which requires the use of one such package. If it is in memory, the response is rapid. If not, it needs to be reconstructed, which is very slow. So I want to keep stuff in memory as long as possible, but can ditch the least requested ones when necessary.
We have no sensible way to serialize these packages. We should probably do that, but it's a lot of work and there's a lot of resistance to doing so.
Our original simple approach is to periodically compare the following to a configurable threshold.
var c = new ComputerInfo();
return c.AvailablePhysicalMemory / c.TotalPhysicalMemory;
There're a lot of different topics on this questions and I think is best to clarify them before actually answering.
First of, you say your app does get a hold of a lot of "large objects". Define large object. Anything larger than about 85K goes into the LOH which only gets collected as part of a generation 2 collection (the most expensive of them all), anything smaller than that, even if you think is a "big" object, is not and it's treated as any other kind of object.
Secondly there're two problems in terms of "managing memory"
One is managing the amount of space you're using inside your virtual memory space. That is, in 32 bit systems making sure you can address all the memory you're asking for, which in Windows 32 bit uses to be around 1,5 GB.
Secondly is managing disposing of that memory when it's needed, which is a part of the garbage collector work so that it triggers when there's a shortage on memory (although that doesn't mean you can't get an OutOfMemoryException if you don't give the GC time enough to do its job).
With that said, I think you should forget about taking the place of the GC... just let it do its job and, if you're worried then find the critical paths that may fail (on memory request) and protect yourself against OutOfMemoryExceptions.
There're a lot of different patterns for handling the case you're posting and most of them really depend on your business scenario. One example is having a state machine that can actually go to an "OutOfMemory" state, in which case the system switches to freeing memory before doing anything else (that includes disposing old objects and invoking the GC to clean everything up, all while you patiently wait for it to happen).
Other techniques involve saving the data to the disk and then manually swapping in and out objects based on some algorithm when you reach certain levels. That means stopping all your threads (or some, depending on business) and moving the data back and forth.
If your large objects are all controlled in terms of location you can also declare a facade over their creation, so that the facade can check whether it needs to free objects or not based on the amount of memory (virtual memory) your process is using. BTW, use the PerformanceInfo API call as quoted in the other answer as this will include the amount of memory used by unmanaged code, which is, nonetheless, located inside the virtual memory space of your process.
Don't worry too much about "real" memory, as the operating system will make sure the most appropriate pages are located in memory.
Then there're hundreds of other optimizations that completely depend on your business scenario. For example databases "know" to bring data to memory depending on the query and predicting the data you're going to use in advance so the data is ready and they do remove objects that are not used... but that's another topic.
Edit: Based on your edits to the question.
Checking memory in the facade will not add a significant overhead in terms of performance.
If you start getting low on memory you should take a decision of how many objects / how much space are you going to free. Don't do it one at a time, take a bunch of them and free enough memory so that you don't have to collect again.
If you go with the previous approach you can service the request after you've freed enough space and continue cleaning in background.
One of the fastest ways of handling memory / disk swapping is by using memory mapped files.
Use GC.GetTotalMemory and if this exceeds your expectation then you can nullify the objects that you want to release and call GC.Collect.
Have a look at the accepted answer to this question. It uses the GetPerformanceInfo Windows API to determine memory consumption of all sorts. Task Manager is using the same information. This should help you writing a class that observes memory consumption periodically.
Once memory runs low you can fill a FIFO queue with soon-to-be deleted tasks.
The observer will delete the first object in the queue and maybe call GCCollect manually, I'm not too sure about this.
Give the collection some time before you recheck the mem consumption for your application. If there is still not enough free mem, delete the next object from the queue and so on...
Related
I need an application that will run smoothly. I have many serial chunks of computations I need to consecutively perform in short periods of time each, so I don't mind the GC doing it's job and I even can take more frequent collections but what I need to minimize the length of each GC collection.
I would like (if possible) to have 1 milli max pause of thread activity due to the GC each time.
what is the best way to acheive this in .NET (I know that .NET it not the technology for such demands but if it will meet my demands when optimized the save of development hours and flexibility for future specs is good incentive to try it out)?
Right from the MSDN page:
https://msdn.microsoft.com/en-us/library/ms973837.aspx
The .NET garbage collector provides a high-speed allocation service
with good use of memory and no long-term fragmentation problems,
however it is possible to do things that will give you much less than
optimal performance. To get the best out of the allocator you should
consider practices such as the following:
Allocate all of the memory (or as much as possible) to be used with a given data structure at the same time. Remove temporary allocations
that can be avoided with little penalty in complexity.
Minimize the number of times object pointers get written, especially those writes made to older objects.
Reduce the density of pointers in your data structures.
Make limited use of finalizers, and then only on "leaf" objects, as much as possible. Break objects if necessary to help with this.
A regular practice of reviewing your key data structures and conducting memory usage profiles with tools like Allocation Profiler
will go a long way to keeping your memory usage effective and having
the garbage collector working its best for you.
As Ron mentioned in his comment. You have to be extra smart with .NET if you want a lot of control over the GC.
Is it better to pre-allocate (for example) 100KB of memory (in the heap) but then only go on to use 60KB, or is it better to allocate each byte as you need it?
My question arises from reading this blog:
http://deplinenoise.wordpress.com/2012/10/20/toollibrary-memory-management-youre-doing-it-wrong/
This really depends on intricate memory details of your application. However, the guy's fundamental point is absolutely accurate- pre-allocation and memory regions are obscenely efficient. new and delete are the most general tools possible, and if you have a more specific problem, you can find a much more efficient solution. Fixed-size object pools are another example.
It is. The operating system does not actually give you all that space anyway in some cases. Take Linux for example. Java tends to request large amounts of memory and never use it so what actually happens is that the OS keeps tracks of these ranges you requested but never maps them into the page tables (and therefore never allocates a frame for it) until you use it. So in terms of virtual memory it looks like you're using a lot but really you're only using the pages that you ever access (the 40kb in your example that you actually used). You can see this in the difference between virtual and physical usage of memory (assuming your processes aren't swapping out).
I have a business app that I have written, that effectively recurses through a directory structure looking for specific Excel files, and stores their addresses. It then loops through these files and parses them by creating a DocumentParser object for each file, this is done one at a time, and not async. The software seems to be very stable, so much so that the business would like to run it to recurse through a massive directory containing upwards of 10000 relevant Excel files.
My question is, as I am creating a new DocumentParser object each time, will the GC be effective enough to discard each of the objects when they go out of scope, ie when that Excel sheet has been parsed, or is there a way I can monitor this and where necessary manually do a GC? I've never had to deal with such large amounts of data before, generally only testing it on a maximum of 40-50 Excel files at a time.
Thanks.
The GC is a very complex piece of software. And the GC is at least the only one that knows when garbage collection is necessary. So my advice is to leave the GC on it's own.
Additionally: The GC will handle these masses objects. Perhaps you will recognize a decrease of performance. If this is a problem you can try to optimize your code. But not premature.
I would leave the GC to its business. 10,000 objects is not really much work for the GC. And it's likely the cost of the GC work will be much lower than the cost of the Excel work. So it's not worth complicating your design to tweak things for the GC. If you end up with so many files to process that your application can't finish in time, it's most likely going to be the speed of the Excel processing holding you up.
However one note which may be relevant: if the DocumentParser is using unmanaged memory in its work with the Excel file, you can use GC.Add/RemoveMemoryPressure to indicate to the GC the real added cost when opening the file. If you didn't write the DocumentParser yourself, the author may already be doing this.
The issue here is that you may have a managed object that costs something in the order of 100 bytes, which allocates a large amount of unmanaged memory when it does Excel work. The GC will have no way of knowing this, so these methods help notify the GC that there is more memory pressure than it was aware of. This may change its behaviour in how/when it decides to collect, which may lead to the application maintaining a lower memory footprint. If the application's memory usage balloons out over time, then you may start seeing some slow downs from length garbage collection and possibly paging on the machine (depending on how much memory you have). You'll want to keep an eye on its memory usage to make sure it's not leaking memory as it processes - a memory profiler may be helpful there.
You don't need to manually call the GC unless you are holding some very large resource which is not the case in your situation. The GC will tweak itself with every call and if you call it manually you will just disrupt its internal profiling data.
BTW GC can collect stuff not only when it goes out of scope but also after its last usage (i.e. while it is still in scope but the variable is not used anymore).
Yes and no - The GC is effective enough to release when it needs to, but you can't generally be sure when that is.
There is a way to force a GC collection but it's generally considered to be bad practise in production code because of the effects of forcing a stack walk when it's not required is worse then using a bit of extra memory until the GC decides it needs to free resources to allocate more objects.
for creating my own memory management in C# I need to have a possibility to intercept the new command before it returns a null or fires an exception. When using the new command I want to call the original handler first. If this handler fails to return a block of memory, I want to inform all my mappable objects to be written to disk and to free memory.
In C++ there has been a possibility to intercept the new command by assigned a different new handler. In C# I couldn't find anything which shows the same behaviour.
Has anyone seen a possibility to do this.
Thanks
Martin
You can't do what you're after in C#, or in any managed language. Nor should you try. The .NET runtime manages allocations and garbage collection. It's impossible for you to instruct your objects to free memory, as you have no guarantee when (or, technically, even if) a particular object will be collected once it's no longer rooted. Even eliminating all references and manually calling GC.Invoke() is not an absolute guarantee. If you're looking for granular memory management, you need to be using a lower-level environment.
As an important point, it is not possible for the new operator to return a null reference. It can only return either a reference to the specified type or throw an exception.
If you want to do your own management of how and when objects are allocated, you'll have to use something along the lines of a factory pattern.
I think you're approaching this from the wrong angle; the whole point of using a runtime with managed memory is so that you don't have to worry about memory. The tradeoff is that you can't do this type of low-level trickery.
As an aside, you can 'override new' for a limited class of objects (those descending from ContextBoundObject) by creating a custom ProxyAttribute, though this likely does not address what you're intending.
I believe that you are not understanding the side-effects of what you're asking for. Even in C++, you can't really do what you think you can do. The reason is simple, if you have run out of memory, you can't even make your objects serialize to disk because you have no memory to accomplish that. By the time memory is exhausted, the only real thing you can do is either discard memory (without saving or doing anything else first) or abend the program.
Now, what you're talking about will still work 95% of the time because your memory allocation will likely be sufficiently large that when it fails, you have a little room to play with, but you can't guarantee that this will be the case.
Example: If you have only 2MB of memory left, and you try to allocate 10MB, then it will fail, and you still have 2MB to play with to try and free up some memory, which will allow you to allocate small chunks of memory needed to serialize objects to disk.
But, if you only have 10 bytes of memory left, then you don't even have enough memory to create a new exception object (unless it comes from a reserved pool). So, in essence, you're creating a very poor situation that will likely crash at some point.
Even in C++ low memory conditions are almost impossible to get right, and it's almost impossible to recover from every case unless you have very carefully planned, and pre-allocated memory for your recovery routines.
Now, when you're talking about a garbage collected OS, you have no control over how memory is allocated or freed. At best, all you can do is give hints. There is very little you can reliably do here by the nature of garbage collection. It's non-deterministic.
I have a fewSortedList<>andSortedDictionary<>structures in my simulation code and I add millions of items in them over time. The problem is that the garbage collector does not deallocate quickly enough memory so there is a huge hit on the application's performance. My last option was to engage theGC.Collect()method so that I can reclaim that memory back. Has anyone got a different idea? I am aware of theFlyweightpattern which is another option but I would appreciate other suggestions that would not require huge refactoring of my code.
You are fighting the "There's no free lunch" principle. You cannot assume that stuffing millions of items in a list isn't going to affect perf. Only the SortedList<> should be a problem, it is going to start allocating memory in the Large Object Heap. That allocation isn't going to be freed soon, it takes a gen #2 collection to chuck stuff out of the LOH again. This delay should not otherwise affect the perf of your program.
One thing you can do is avoiding the multiple of copies of the internal array that SortedList<> will jam into the LOH when it keeps growing. Try to guess a good value for Capacity so it pre-allocates the large array up front.
Next, use Perfmon.exe or TaskMgr.exe and looks at the page fault delta of your program. It should be quite busy while you're allocating. If you see low values (100 or less) then you might have a problem with the paging file being fragmented. A common scourge on older machines that run XP. Defragging the disk and using SysInternals' PageDefrag utility can do extraordinary wonders.
I think the SortedList uses a array as backing field, which means that large SortedList get allocated on the Large object heap. The large object heap can get defragmentated, which can cause an out of memory exception while in principle there is still enough memory available.
See this link.
This might be your problem, as intermediate calls to GC.collect prevent the LOH from getting badly defragmented in some scenarios, which explains why calling it helps you reduce the problem.
The problem can be mitigated by splitting large objects into smaller fragments.
I'd start with doing some memory profiling on your application to make sure that the items you remove from those lists (which I assume is happening from the way your post is written) are actually properly released and not hanging around places.
What sort of performance hit are we talking and on what operating system? If I recall, GC will run when it's needed, not immediately or even "soon". So task manager showing high memory allocated to your application is not necessarily a problem. What happens if you put the machine under higher load (e.g. run several copies of your application)? Does memory get reclaimed faster in that scenario or are you starting to run out of memory?
I hope answers to these questions will help point you in a right direction.
Well, if you keep all of the items in those structures, the GC will never collect the resources because they still have references to them.
If you need the items in the structures to be collected, you must remove them from the data structure.
To clear the entire data structure try using Clear() and setting the data structure reference to null. If the data is still not getting collected fast enough, call CC.Collect().