How to create a large(10gb+) persistent cache in .NET

How to create a large(10gb+) persistent cache in .NET - c#

Spin-off from my other question.
.NET Garbagecollector trouble. Blocks for 15-40 mins
I want to create a simple persistent cache. For simplicity, everything that goes in stays in. I have currently implemented this with a ImmutableDictionary<int,DataItem> but I have problem with the garbage collector.
It seems to think I use lots of data which is true it as it contains 10 000-100 000 complex objects, and then it begins to think its a good idea to scan the cache very often. With blocking Generation 2. At least that's what I believe it does. "%Time in gc" is 90%+ and my application is blazing slow.
Can I somehow mark the cache as untouchable or let my app use more memory before GC thinks it should do a full collect? I have loads of free memory on the server.
Might switching to NCache or Redis for Windows (MSOpenTech) be a better solution?

You could use GC.TryStartNoGCRegion to strictly limit the times at which GC occurs. Though in line with my previous comment, I believe this is not a good approach.
Have you considered examining how the cached objects are cleaning themselves up? Perhaps when objects exceed the 4 hour ttl, cleaning them up takes too long?
Are the objects disposable? Are they finalizable? If yes to either one, whats the time cost of a Dispose or Finalize operation on an individual object? If they are Finalizable, do they need to be?

Related

Simple algorithm to determine when to free some memory .Net

Our system keeps hold of lots of large objects for performance. However, when running low on memory, we want to drop some of the objects. The objects are prioritized, so I know which ones to drop. Is there a simple way of determining when to free memory? Also, dropping 1 object may not be enough, so I guess I need a loop to drop, check, drop again if necessary, etc. But in c#, I won't necessarily see the effect immediately of dropping an object, so how do I avoid kicking too much stuff out?
I guess it's just a simple function of used vs total physical & virtual memory. But what function?
Edit: Some clarifications
"Large objects" was misleading. I meant logical "package" of objects (the objects should be small enough individually to avoid the LOB - that's the intention certainly) that together are large (~ 100MB?)
A request can come in which requires the use of one such package. If it is in memory, the response is rapid. If not, it needs to be reconstructed, which is very slow. So I want to keep stuff in memory as long as possible, but can ditch the least requested ones when necessary.
We have no sensible way to serialize these packages. We should probably do that, but it's a lot of work and there's a lot of resistance to doing so.
Our original simple approach is to periodically compare the following to a configurable threshold.
var c = new ComputerInfo();
return c.AvailablePhysicalMemory / c.TotalPhysicalMemory;

There're a lot of different topics on this questions and I think is best to clarify them before actually answering.
First of, you say your app does get a hold of a lot of "large objects". Define large object. Anything larger than about 85K goes into the LOH which only gets collected as part of a generation 2 collection (the most expensive of them all), anything smaller than that, even if you think is a "big" object, is not and it's treated as any other kind of object.
Secondly there're two problems in terms of "managing memory"
One is managing the amount of space you're using inside your virtual memory space. That is, in 32 bit systems making sure you can address all the memory you're asking for, which in Windows 32 bit uses to be around 1,5 GB.
Secondly is managing disposing of that memory when it's needed, which is a part of the garbage collector work so that it triggers when there's a shortage on memory (although that doesn't mean you can't get an OutOfMemoryException if you don't give the GC time enough to do its job).
With that said, I think you should forget about taking the place of the GC... just let it do its job and, if you're worried then find the critical paths that may fail (on memory request) and protect yourself against OutOfMemoryExceptions.
There're a lot of different patterns for handling the case you're posting and most of them really depend on your business scenario. One example is having a state machine that can actually go to an "OutOfMemory" state, in which case the system switches to freeing memory before doing anything else (that includes disposing old objects and invoking the GC to clean everything up, all while you patiently wait for it to happen).
Other techniques involve saving the data to the disk and then manually swapping in and out objects based on some algorithm when you reach certain levels. That means stopping all your threads (or some, depending on business) and moving the data back and forth.
If your large objects are all controlled in terms of location you can also declare a facade over their creation, so that the facade can check whether it needs to free objects or not based on the amount of memory (virtual memory) your process is using. BTW, use the PerformanceInfo API call as quoted in the other answer as this will include the amount of memory used by unmanaged code, which is, nonetheless, located inside the virtual memory space of your process.
Don't worry too much about "real" memory, as the operating system will make sure the most appropriate pages are located in memory.
Then there're hundreds of other optimizations that completely depend on your business scenario. For example databases "know" to bring data to memory depending on the query and predicting the data you're going to use in advance so the data is ready and they do remove objects that are not used... but that's another topic.
Edit: Based on your edits to the question.
Checking memory in the facade will not add a significant overhead in terms of performance.
If you start getting low on memory you should take a decision of how many objects / how much space are you going to free. Don't do it one at a time, take a bunch of them and free enough memory so that you don't have to collect again.
If you go with the previous approach you can service the request after you've freed enough space and continue cleaning in background.
One of the fastest ways of handling memory / disk swapping is by using memory mapped files.

Use GC.GetTotalMemory and if this exceeds your expectation then you can nullify the objects that you want to release and call GC.Collect.

Have a look at the accepted answer to this question. It uses the GetPerformanceInfo Windows API to determine memory consumption of all sorts. Task Manager is using the same information. This should help you writing a class that observes memory consumption periodically.
Once memory runs low you can fill a FIFO queue with soon-to-be deleted tasks.
The observer will delete the first object in the queue and maybe call GCCollect manually, I'm not too sure about this.
Give the collection some time before you recheck the mem consumption for your application. If there is still not enough free mem, delete the next object from the queue and so on...

C# Garbage Collector behavior

We have an application in C# that controls one of our device and reacts to signal this device gives us.
Basically, the application creates threads, treat operations (access to a database, etc) and communicates with this device.
In the life of the application, it creates objects and release them and so far, we're letting the Garbage Collector taking care of our memory. i've read that it is highly recommanded to let the GC do its stuff without interfering.
Now the problem we're facing is that the process of our application grows ad vitam eternam, growing by step. Example:
It seems to have "waves" when the application is growing and all of a sudden, the application release some memory but seems to leave memory leaks at the same time.
We're trying to investigate the application with some memory profiler but we would like to understand deeply how the garbage Collector works.
I've found an excellent article here : The Danger of Large Objects
I've also found the official documentation here : MSDN
Do you guys know another really really deep documentation of GC?
Edit :
Here is a screenshot that illustrates the behavior of the application :
You can clearly see the "wave" effect we're having on a very regular pattern.
Subsidiary question :
I've seen that my GC Collection 2 Heap is quite big and following the same pattern as the total bytes used by my application. I guess it's perfectly normal because most of our objects will survive at least 2 garbage collections (for example Singleton classes, etc)... What do you think ?

The behavior you describe is typical of problems with objects created on Large Object Heap (LOH). However, your memory consumption seems to return to some lower value later on, so check twice whether it is really a LOH issue.
You are obviously aware of that, but what is not quite obvious is that there is an exception to the size of the objects on LOH.
As described in documentation, objects above 85000 bytes in size end up on LOH. However, for some reason (an 'optimization' probably) arrays of doubles which are longer than 1000 elements also end up there:
double[999] smallArray = ...; // ends up in 'normal', Gen-0 heap
double[1001] bigArray = ...; // ends up in LOH
These arrays can result in fragmented LOH, which requires more memory, until you get an Out of memory exception.
I was bitten by this as we had an app which received some sensor readings as arrays of doubles which resulted in LOH defragmentation since every array slightly differed in length (these were readings of realtime data at various frequencies, sampled by non-realtime process). We solved the issue by implementing our own buffer pool.

I did some research on a class I was teaching a couple of years back. I don't think the references contain any information regarding the LoH but I thought it was worthwhile to share them either way (see below). Further, I suggest performing a second search for unreleased object references before blaming the garbage collector. Simply implementing a counter in the class finalizer to check that these large objects are being dropped as you believe.
A different solution to this problem, is simply to never deallocate your large objects, but instead reuse them with a pooling strategy. In my hubris I have many times before ended up blaming the GC prematurely for the memory requirements of my application growing over time, however this is more often than not a symptom of faulty implementation.
GC References:
http://blogs.msdn.com/b/clyon/archive/2007/03/12/new-in-orcas-part-3-gc-latency-modes.aspx
http://msdn.microsoft.com/en-us/library/ee851764.aspx
http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx
http://blogs.msdn.com/b/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx
Eric Lippert's blog is especially interesting, when it comes to understanding anything C# in detail!

Here is an update with some of my investigations :
In our application, we're using a lot of thread to make different tasks. Some of these threads have higher priority.
1) We're using a GC that is concurrent and we tried to switch it back to non-concurrent.
We've seen a dramatic improvment :
The Garbage collector is being called much often and it seems that, when called more often, it's releasing much better our memory.
I'll post a screenshot as soon as I have a good one to illustrate this.
We've found a really good article on the MSDN. We also found an interesting question on SO.
With the next Framework 4.5, 4 possibilities will be available for GC configuration.
Workstation - non-concurrent
Workstation - concurrent
Server - non-concurrent
Server - concurrent
We'll try and switch to the "server - non-concurrent" and "serveur - concurrent" to check if it's giving us better performance.
I'll keep this thread updated with our findings.

Undesirable Garbage Collection

In a title "Forcing a Garbage Colection" from book "C# 2010 and the .NET 4 Platform" by Andrew Troelsen written:
"Again, the whole purpose of the .NET garbage collector is to manage memory on our behalf. However, in some very rare circumstances, it may be beneficial to programmatically force a garbage collection using GC.Collect(). Specifically:
• Your application is about to enter into a block of code that you don’t want interrupted by a possible garbage collection.
...
"
But stop! Is there a such case when Garbage Collection is undesirable? I never saw/read something like that (because of my little development experience of course). If while your practice you have done something like that, please share. For me it's very interesting point.
Thank you!

Yes, there's absolutely a case when garbage collection is undesirable: when a user is waiting for something to happen, and they have to wait longer because the code can't proceed until garbage collection has completed.
That's Troelsen's point: if you have a specific point where you know a GC isn't problematic and is likely to be able to collect significant amounts of garbage then it may be a good idea to provoke it then, to avoid it triggering at a less opportune moment.

I run a recipe related website, and I store a massive graph of recipes and their ingredient usage in memory. Due to the way I pivot this information for quick access, I have to load several gigs of data into memory when the application loads before I can organize the data into a very optimized graph. I create a huge amount of tiny objects on the heap that, once the graph is built, become unreachable.
This is all done when the web application loads, and probably takes 4-5 seconds to do. After I do so, I call GC.Collect(); because I'd rather re-claim all that memory now rather than potentially block all threads during an incoming HTTP request while the garbage collector is freaking out cleaning up all these short lived objects. I also figure it's better to clean up now since the heap is probably less fragmented at this time, since my app hasn't really done anything else so far. Delaying this might result in many more objects being created, and the heap needing to be compressed more when GC runs automatically.
Other than that, in my 12 years of .NET programming, I've never come across a situation where I wanted to force the garbage collector to run.

The recommendation is that you should not explicitly call Collect in your code. Can you find circumstances where it's useful?
Others have detailed some, and there are no doubt more. The first thing to understand though, is don't do it. It's a last resort, investigate other options, learn how GC works look at how your code is impacted, follow best practices for your designs.
Calling Collect at the wrong point will make your performance worse. Worse still, to rely on it makes your code very fragile. The rare conditions required to make a call to Collect beneficial, or at last not harmful, can be utterly undone with a simple change to the code, which will result unexpected OOMs, sluggish performamnce and such.

I call it before performance measurements so that the GC doesn't falsify the results.
Another situation are unit-tests testing for memory leaks:
object doesItLeak = /*...*/; //The object you want to have tested
WeakReference reference = new WeakRefrence(doesItLeak);
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
Assert.That(!reference.IsAlive);
Besides those, I did not encounter a situation in which it would actually be helpful.
Especially in production code, GC.Collect should never be found IMHO.

It would be very rare, but GC can be a moderately expensive process so if there's a particular section that's timing sensitive, you don't want that section interupted by GC.

Your application is about to enter into a block of code that you don’t
want interrupted by a possible garbage collection. ...
A very suspect argument (that is nevertheless used a lot).
Windows is not a Real Time OS. Your code (Thread/Process) can always be pre-empted by the OS scheduler. You do not have a guaranteed access to the CPU.
So it boils down to: how does the time for a GC-run compare to a time-slot (~ 20 ms) ?
There is very little hard data available about that, I searched a few times.
From my own observation (very informal), a gen-0 collection is < 40 ms, usually a lot less. A full gen-2 can run into ~100 ms, probably more.
So the 'risk' of being interrupted by the GC is of the same order of magnitude as being swapped out for another process. And you can't control the latter.

C# Production Server, Do I collect the garbage?

I know there's tons of threads about this. And I read a few of them.
I'm wondering if in my case it is correct to GC.Collect();
I have a server for a MMORPG, in production it is online day and night. And the server is restarted every other day to implement changes to the production codebase. Every twenty minutes the server pauses all other threads, and serializes the current game state. This usually takes 0.5 to 4 seconds
Would it be a good idea to GC.Collect(); after serialization?
The server is, obviously, constantly creating and destroying game items.
Would I have a notorious gain in performance or memory optimization / usage?
Should I not manually collect?
I've read about how collecting can be bad if used in the wrong moments or too frequently, but I'm thinking these saves are both a good moment to collect, and not that frequent.
The server is in framework 4.0
Update in answer to a comment:
We are randomly experiencing server freezes, sometimes, unexpectedly, the server memory usage will raise increasingly until it reaches a point when the server takes way too long to handle any network operation. Thus, I'm considering a lot of different approaches to solve the issue, this is one of them.

The garbage collector knows best when to run, and you shouldn't force it.
It will not improve performance or memory optimization. CLR can tell GC to collect object which are no longer used if there is a need to do that.
Answer to an updated part:
Forcing the collection is not a good solution to the problem. You should rather have a look a bit deeper into your code to find out what is wrong. If memory usage grows unexpectedly you might have an issue with unmanaged resources which are not properly handled or even a "leaky code" within managed code.
One more thing. I would be surprise if calling GC.Collect fixed the problem.

Every twenty minutes the server pauses
all other threads, and serializes the
current game state. This usually takes
0.5 to 4 seconds
If all your threads are suspended already anyway you might as well call the garbage collection, since it should be fairly fast at this point. I suspect doing this will only mask your real problem though, not actually solve it.
We are randomly experiencing server
freezes, sometimes, unexpectedly, the
server memory usage will raise
increasingly until it reaches a point
when the server takes way too long to
handle any network operation. Thus,
I'm considering a lot of different
approaches to solve the issue, this is
one of them.
This sounds more like you actually are still referencing all these objects that use the memory - if you weren't the GC would run due to the memory pressure and try to release those objects. You might be looking at an actual bug in your production code (i.e. objects that are still subscribed to events or otherwise are being referenced when they shouldn't be) rather than something you can fix by manually taking out the garbage.
If possible in this scenario you should run a performance analysis to see where your bottlenecks are and what part of your code is causing the brunt of the memory allocations.

Could the memory increase be an "attack" by a player with a fake/modified game-client? Is a lot of memory allocated by the server when it accepts a new client connection? Does the server handle bogus incoming data well?

How to avoid garbage collection in real time .NET application?

I'm writting a financial C# application which receive messages from the network, translate them into different object according to the message type and finaly apply the application business logic on them.
The point is that after the business logic is applied, I'm very sure I will never need this instance again. Rather than to wait for the garbage collector to free them, I'd like to explicitly "delete" them.
Is there a better way to do so in C#, should I use a pool of object to reuse always the same set of instance or is there a better strategy.
The goal being to avoid the garbage collection to use any CPU during a time critical process.

Don't delete them right away. Calling the garbage collector for each object is a bad idea. Normally you really don't want to mess with the garbage collector at all, and even time critical processes are just race conditions waiting to happen if they're that sensitive.
But if you know you'll have busy vs light load periods for your app, you might try a more general GC.Collect() when you reach a light period to encourage cleanup before the next busy period.

Look here: http://msdn.microsoft.com/en-us/library/bb384202.aspx
You can tell the garbage collector that you're doing something critical at the moment, and it will try to be nice to you.

You hit in yourself -- use a pool of objects and reuse those objects. The semantics of the calls to those object would need to be hidden behind a factory facade. You'll need to grow the pool in some pre-defined way. Perhaps double the size everytime it hits the limit -- a high water algorithm, or a fixed percentage. I'd really strongly advise you not to call GC.Collect().
When the load on your pool gets low enough you could shrink the pool and that will eventually trigger a garbage collection -- let the CLR worry about it.

Attempting to second-guess the garbage collector is generally a very bad idea. On Windows, the garbage collector is a generational one and can be relied upon to be pretty efficient. There are some noted exceptions to this general rule - the most common being the occurrence of a one-time event that you know for a fact will have caused a lot of old objects to die - once objects are promoted to Gen2 (the longest lived) they tend to hang around.
In the case you mention, you sound as though you are generating a number of short-lived objects - these will result in Gen0 collections. These happen relatively often anyway, and are the most efficient. You could avoid them by having a reusable pool of objects, if you prefer, but it is best to ascertain for certain if GC is a performance problem before taking such action - the CLR profiler is the tool for doing this.
It should be noted that the garbage collector is different on different .NET frameworks - on the compact framework (which runs on the Xbox 360 and on mobile platforms) it is a non-generational GC and as such you must be much more careful about what garbage your program generates.

Forcing a GC.Collect() is generally a bad idea, leave the GC to do what it does best. It sounds like the best solution would be to use a pool of objects that you can grow if necessary - I've used this pattern successfully.
This way you avoid not only the garbage collection but the regular allocation cost as well.
Finally, are you sure that the GC is causing you a problem? You should probably measure and prove this before implementing any perf-saving solutions - you may be causing yourself unnecessary work!

"The goal being to avoid the garbage collection to use any CPU during
a time critical process"
Q: If by time critical, you mean you're listening to some esoteric piece of hardware, and you can't afford to miss the interrupt?
A: If so then C# isn't the language to use, you want Assembler, C or C++ for that.
Q: If by time Critical, you mean while there are lots of messages in the pipe, and you don't want to let the Garbage collector slow things down?
A: If so you are worrying needlessly. By the sounds of things your objects are very short lived, this means the garbage collector will recycle them very efficiently, without any apparent lag in performance.
However, the only way to know for sure is test it, set it up to run overnight processing a constant stream of test messages, I'll be stunned if you your performance stats can spot when the GC kicks in (and even if you can spot it, I'll be even more surprised if it actually matters).

Get a good understanding and feel on how the Garbage Collector behaves, and you will understand why what you are thinking of here is not recommended. unless you really like the CLR to spend time rearranging objects in memory alot.
http://msdn.microsoft.com/en-us/magazine/bb985010.aspx
http://msdn.microsoft.com/en-us/magazine/bb985011.aspx

How intensive is the app? I wrote an app that captures 3 sound cards (Managed DirectX, 44.1KHz, Stereo, 16-bit), in 8KB blocks, and sends 2 of the 3 streams to another computer via TCP/IP. The UI renders an audio level meter and (smooth) scrolling title/artist for each of the 3 channels. This runs on PCs with XP, 1.8GHz, 512MB, etc. The App uses about 5% of the CPU.
I stayed clear of manually calling GC methods. But I did have to tune a few things that were wasteful. I used RedGate's Ant profiler to hone in on the wasteful portions. An awesome tool!
I wanted to use a pool of pre-allocated byte arrays, but the managed DX Assembly allocates byte buffers internally, then returns that to the App. It turned out that I didn't have to.

If it is absolutely time critical then you should use a deterministic platform like C/C++. Even calling GC.Collect() will generate CPU cycles.
Your question starts off with the suggestion that you want to save memory but getting rid of objects. This is a space critical optimization. You need to decide what you really want because the GC is better at optimizing this situation than a human.

From the sound of it, it seems like you're talking about deterministic finalization (destructors in C++), which doesn't exist in C#. The closest thing that you will find in C# is the Disposable pattern. Basically you implement the IDisposable interface.
The basic pattern is this:
public class MyClass: IDisposable
{
private bool _disposed;
public void Dispose()
{
Dispose( true );
GC.SuppressFinalize( this );
}
protected virtual void Dispose( bool disposing )
{
if( _disposed )
return;
if( disposing )
{
// Dispose managed resources here
}
_disposed = true;
}
}

You could have a limited amount of instances of each type in a pool, and reuse the already done with instances. The size of the pool would depend on the amount of messages you'll be processing.

Instead of creating a new instance of an object every time you get a message, why don't you reuse objects that have already been used? This way you won't be fighting against the garbage collector and your heap memory won't be getting fragmented.**
For each message type, you can create a pool to hold the instances that are not in use. Whenever you receive a network message, you look at the message type, pull a waiting instance out of the appropriate pool and apply your business logic. After that, you put that instance of the message object back into it's pool.
You will most likely want to "lazy load" your pool with instances so your code scales easily. Therefore, your pool class will need to detect when a null instance has been pulled and fill it up before handing it out. Then when the calling code puts it back in the pool it's a real instance.
** "Object pooling is a pattern to use that allows objects to be reused rather than allocated and deallocated, which helps to prevent heap fragmentation as well as costly GC compactions."
http://geekswithblogs.net/robp/archive/2008/08/07/speedy-c-part-2-optimizing-memory-allocations---pooling-and.aspx

In theory the GC shouldn't run if your CPU is under heavy load or unless it really needs to. But if you have to, you may want to just keep all of your objects in memory, perhaps a singleton instance, and never clean them up unless you're ready. That's probably the only way to guarantee when the GC runs.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.