How to purge expired items from cache?

I've got a nice little class built that acts as a cache. Each item has an expiration TimeSpan or DateTime. Each time an attempt to access an item in the cache is made, the item's expiration is checked, and if it's expired, the item is removed from the cache and nothing is returned.
That's great for objects that are accessed frequently, but if an item is put in the cache and never accessed again, it's never removed, even though it's expired.
What's a good methodology for expiring such items from the cache?
Should I have a background thread infinitely enumerating every item in the cache to check if it's expired?

The best code is no code. Use the ASP.NET cache instead. You can reference it as System.Web.HttpRuntime.Cache in any application, not just web applications.

In my experience, maintaining a custom caching mechanism became more trouble than it was worth. There are several libraries out there that have already solved these problems, and I would suggest using one of them. A popular one in .NET is the Enterprise Library, although I have limited experience with its caching abilities.
If you must use a custom caching mechanism, then I see no problem with the watchful thread idea you suggested. That is, if your application is a server-based application and not a web app. If it's a web app, you already have built-in sliding expiration in the ASP.NET cache. You can then just wrap it in a strongly typed wrapper to avoid referencing cache items by key each time.
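For illustration, here is a minimal sketch of the "watchful thread" idea using a System.Threading.Timer instead of a dedicated thread that loops forever. The ExpiringCache type and its member names are hypothetical (they are not from the question), and a single lock is assumed to be acceptable for the expected load.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical cache type: every name here is illustrative, not from the question.
public sealed class ExpiringCache<TKey, TValue> : IDisposable
{
    private sealed class Entry
    {
        public TValue Value;
        public DateTime ExpiresUtc;
    }

    private readonly object _gate = new object();
    private readonly Dictionary<TKey, Entry> _items = new Dictionary<TKey, Entry>();
    private readonly Timer _sweeper;

    public ExpiringCache(TimeSpan sweepInterval)
    {
        // The timer callback runs on a thread-pool thread at each interval,
        // so no thread sits in an infinite loop enumerating the cache.
        _sweeper = new Timer(_ => PurgeExpired(DateTime.UtcNow), null, sweepInterval, sweepInterval);
    }

    public int Count
    {
        get { lock (_gate) { return _items.Count; } }
    }

    public void Set(TKey key, TValue value, TimeSpan timeToLive)
    {
        lock (_gate)
        {
            _items[key] = new Entry { Value = value, ExpiresUtc = DateTime.UtcNow + timeToLive };
        }
    }

    // Same check-on-access behaviour as in the question.
    public bool TryGet(TKey key, out TValue value)
    {
        lock (_gate)
        {
            Entry entry;
            if (_items.TryGetValue(key, out entry))
            {
                if (entry.ExpiresUtc > DateTime.UtcNow)
                {
                    value = entry.Value;
                    return true;
                }
                _items.Remove(key); // expired: drop it and report a miss
            }
        }
        value = default(TValue);
        return false;
    }

    // Removes every entry whose expiry is at or before nowUtc; returns how many were removed.
    public int PurgeExpired(DateTime nowUtc)
    {
        lock (_gate)
        {
            // Collect keys first: removing while enumerating a Dictionary is not safe on all runtimes.
            var expired = new List<TKey>();
            foreach (var pair in _items)
            {
                if (pair.Value.ExpiresUtc <= nowUtc) expired.Add(pair.Key);
            }
            foreach (var key in expired) _items.Remove(key);
            return expired.Count;
        }
    }

    public void Dispose()
    {
        _sweeper.Dispose();
    }
}
```

The sweep interval trades memory held by dead entries against time spent holding the lock; for a large cache a sweep every minute or so is usually plenty.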

You can implement an LRU (Least Recently Used) strategy: keep your items sorted by access time, and when a new item is inserted into a full cache, evict the item that is last in that list (the least recently used one). See Cache algorithms at Wikipedia.
If you want to expire items at their exact expiration time, I would still only do that when things are accessed, i.e. when the cache object is accessed and its time has expired, refetch it.
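A rough sketch of the LRU idea, assuming a fixed capacity and single-threaded access (add a lock if several threads share it). LruCache and its members are names made up for this example. A dictionary gives O(1) lookup, and a linked list keeps the access order with the most recently used item at the front.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical LRU cache; not thread-safe as written.
public sealed class LruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    // Front of the list = most recently used, back = least recently used.
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public LruCache(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public int Count
    {
        get { return _map.Count; }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_map.TryGetValue(key, out node))
        {
            // Touching an item moves it to the front of the access order.
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    public void Set(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> existing;
        if (_map.TryGetValue(key, out existing))
        {
            // Overwriting an existing key: drop the old node, re-add below.
            _order.Remove(existing);
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            // Cache is full: evict the least recently used item (back of the list).
            LinkedListNode<KeyValuePair<TKey, TValue>> oldest = _order.Last;
            _order.RemoveLast();
            _map.Remove(oldest.Value.Key);
        }
        _map[key] = _order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
    }
}
```

Note that LRU bounds the cache by size, not by age, so it complements the expiration check on access; it does not replace it.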

You could also, on any change to the cache, (re-)start a Timer with the Interval set to the time remaining until the closest expiry timestamp. This will not be accurate to the millisecond and depends on a message pump running, but it is not very resource-demanding.
Harald Scheirich's answer is better though, if you don't mind that objects are hanging around forever, when the cache is not updated.

You could clear suitably old items out of the cache on the first access that happens at least one minute after the last time items were cleared.
private DateTime nextFlush;
public object getItem(object key)
{
    DateTime now = DateTime.Now;
    if (now > nextFlush)
    {
        // At most once a minute, sweep out expired items before serving the request.
        Flush();
        nextFlush = now.AddMinutes(1);
    }
    return fetchItem(key);
}

Related

MemoryCache - prevent expiration of items

In my application I use MemoryCache, but I don't expect items to expire. Items are therefore inserted into the cache with the default policy, without AbsoluteExpiration or SlidingExpiration being set.
Recently, under high server load, I experienced problems with the cache: it returned null in place of values that had been inserted earlier. It turned out that items eligible to expire (those with an expiration explicitly set) are not the only ones removed from the cache. Under memory pressure, when the values of CacheMemoryLimit and/or PhysicalMemoryLimit are exceeded, MemoryCache removes other elements as well.
How can I prevent this? How can I be sure that once an element has been put into the cache, it can be safely fetched from it again?
I considered setting the PollingInterval to some huge value, but this only delays the potential problem (and the documentation describes the polling interval as a maximum time, not an exact or minimum time). Setting PhysicalMemoryLimitPercentage to 100% does not solve the problem either, since it refers to the physically installed memory and not to the whole available virtual memory. Or am I wrong and it would indeed help?

CacheItemPolicy has a Priority property which can be set to NotRemovable.
You do need to be aware of how much data you are adding to the cache with this setting, though. Continuously adding data to the cache and never removing it will eventually cause memory or overflow issues.
A cache is typically used where it's acceptable for an item to no longer exist in the cache, in which case the value is retrieved again from persistent storage (a database or file, for example).
In your case, it sounds like your code requires the item to exist, which may suggest looking for another approach (a static ConcurrentDictionary as mentioned in the comments, for example).
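As a small sketch of the Priority setting mentioned above, using System.Runtime.Caching: the PinnedCache helper and its method name are made up for this example, and the default MemoryCache instance is assumed.

```csharp
using System;
using System.Runtime.Caching;

// Hypothetical helper; only the MemoryCache and CacheItemPolicy calls are framework API.
public static class PinnedCache
{
    public static void AddPinned(string key, object value)
    {
        var policy = new CacheItemPolicy
        {
            // No AbsoluteExpiration or SlidingExpiration is set, so the entry does not expire
            // on a timer, and NotRemovable asks the cache not to trim it under memory pressure.
            Priority = CacheItemPriority.NotRemovable
        };
        MemoryCache.Default.Set(key, value, policy);
    }
}
```

The entry still disappears when the process or AppDomain goes away, and an explicit Remove or a Set with the same key replaces it, so code that truly cannot tolerate a miss is probably better served by the dictionary approach.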

Can MemoryCache serve expired item till fetching of new data does not succeed?

I am using .NET MemoryCache class for caching purposes.
I have the following requirements:
Expire a cache entry after 'x' amount of time, but don't evict it.
If a request comes for the cache entry after this 'x' amount of time, try fetch new data.
However, if fetching of new data fails, serve the expired cache item (which is still in cache as it has not been evicted).
If the fetching of new data succeeds, then update the item in cache and reset its expiration timer.
Is this possible using MemoryCache?

Is this possible using MemoryCache?
No, I am afraid that the MemoryCache class built into the framework doesn't offer the functionality you are looking for. You might need to implement it yourself. With MemoryCache, when you set an item to expire x amount of time from now, you will not be able to get the cached item after that time has passed, because it will have been evicted. If you try to get the item before x has elapsed, you might succeed or not: the item might already have been evicted from the cache, for example if your application has been consuming lots of memory.
So the bottom line is this: if you have placed an item into a MemoryCache, there is absolutely no guarantee that you will be able to find it there at a later stage. And by the way, that's the whole point of a cache: you should only store data in this type of cache that you have other means of retrieving when it is not available in the cache. Usually those other means consist of making a somewhat more expensive database or remote HTTP call.
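If you do implement it yourself, one possible shape is sketched below. It does not use MemoryCache at all: entries are kept in a ConcurrentDictionary and are never evicted, only marked stale. StaleTolerantCache, its constructor parameters, and the fetch delegate are all assumptions made up for this sketch.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical wrapper: keeps expired entries around and serves them if a refresh fails.
public sealed class StaleTolerantCache<TKey, TValue>
{
    private sealed class Entry
    {
        public TValue Value;
        public DateTime FreshUntilUtc;
    }

    private readonly ConcurrentDictionary<TKey, Entry> _entries = new ConcurrentDictionary<TKey, Entry>();
    private readonly TimeSpan _freshFor;
    private readonly Func<TKey, TValue> _fetch;

    public StaleTolerantCache(TimeSpan freshFor, Func<TKey, TValue> fetch)
    {
        if (fetch == null) throw new ArgumentNullException(nameof(fetch));
        _freshFor = freshFor;
        _fetch = fetch;
    }

    public TValue Get(TKey key)
    {
        Entry existing;
        bool haveEntry = _entries.TryGetValue(key, out existing);
        if (haveEntry && existing.FreshUntilUtc > DateTime.UtcNow)
        {
            return existing.Value; // still fresh, no fetch needed
        }

        try
        {
            // Missing or stale: try to fetch new data and restart the freshness timer.
            TValue fresh = _fetch(key);
            _entries[key] = new Entry { Value = fresh, FreshUntilUtc = DateTime.UtcNow + _freshFor };
            return fresh;
        }
        catch (Exception)
        {
            // Fetch failed: fall back to the stale value if there is one.
            if (haveEntry) return existing.Value;
            throw;
        }
    }
}
```

Two caveats: concurrent callers may each trigger a fetch for the same stale key, and since nothing is ever evicted the memory use is unbounded unless you add a size limit.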

When is it appropriate to use CacheItemRemovedCallback?

I have a large data set that is updated once a day. I am caching the results of an expensive query on that data but I want to update that cache each day. I am considering using CacheItemRemovedCallback to reload my cache on a daily interval, but I had the following concerns:
Isn't it possible that the CacheItemRemovedCallback could be called before my expiration (in the case of running out of memory)? Which means reloading it immediately doesn't seem like a good idea.
Does the CacheItemRemovedCallback get called before or after the item is actually removed? If it is after, doesn't this theoretically leave a period of time where the cache would be unavailable?
Are these concerns relevant and if using CacheItemRemovedCallback to reload your cache is a bad idea, then when is it useful?

If you're going to reload, be sure to check the CacheItemRemovedReason. I recently had to debug an issue where a developer decided they should immediately re-populate the cache in this method, and under low memory conditions, it basically sat chewing up CPU while it got stuck in a loop of building the cache objects, adding them to the cache, expiring, repeat.
The callback is fired after the item is removed.

From everyone's responses and from further reading I have come to the following conclusion:
My concerns are valid. Using CacheItemRemovedCallback to refresh cached items is not a good idea. The only practical use for this callback seems to be logging information about when your cache is removed.
It seems that CacheItemUpdateCallback is the more appropriate way of refreshing your cache on a regular interval.
Ultimately, I have decided not to use either of these calls. Instead I will write a service action so the database import job can notify my application when it needs to refresh its data. This avoids using a timed refresh altogether.

Yes, there is a chance that the method could be fired for a variety of reasons. Whether you reload the cache immediately or wait depends on what is best for the typical use case in your application.
CacheItemRemovedCallback does indeed fire after the item is removed from the cache. Right before the item is to be removed, you can use CacheItemUpdateCallback to determine whether or not you want to flush the cache at that time. There may be good reasons to delay flushing the cache, such as users currently being in your application when rebuilding the cache takes a long time.
Generally speaking, the best practice is to test that your cached item actually exists in the cache before using its data. If the data doesn't exist, you can rebuild the cache at that time (causing a slightly longer response for the user) or choose to do something else.

This really isn't so much a cache of individual values as it is a snapshot of an entire dataset. As such, you don't benefit from using the Cache class here.
I'd recommend loading a static collection on startup and replacing it every 24 hours by setting a timer. The idea would be to create a new collection and atomically assign it, as the old one may still be in use and we want it to remain self-consistent.
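A minimal sketch of that idea follows. SnapshotHolder, the load delegate, and the refresh interval are assumptions for illustration. The new collection is built completely before the reference is swapped, so readers holding the old snapshot keep a self-consistent view.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical holder for a whole-dataset snapshot that is replaced on a timer.
public sealed class SnapshotHolder<T> : IDisposable
{
    private readonly Func<IReadOnlyList<T>> _load;
    private readonly Timer _timer;
    private IReadOnlyList<T> _current;

    public SnapshotHolder(Func<IReadOnlyList<T>> load, TimeSpan refreshEvery)
    {
        if (load == null) throw new ArgumentNullException(nameof(load));
        _load = load;
        _current = load(); // initial load on startup
        _timer = new Timer(_ => TryRefresh(), null, refreshEvery, refreshEvery);
    }

    // Readers grab the reference once and keep using that snapshot for the whole request.
    public IReadOnlyList<T> Current
    {
        get { return Volatile.Read(ref _current); }
    }

    // Builds the replacement fully, then publishes it with a single atomic reference swap.
    public void Refresh()
    {
        IReadOnlyList<T> next = _load();
        Interlocked.Exchange(ref _current, next);
    }

    private void TryRefresh()
    {
        try { Refresh(); }
        catch (Exception) { /* load failed: keep serving the previous snapshot */ }
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

For the daily case you would pass something like TimeSpan.FromHours(24) as the interval; the same Refresh method could also be called by an external notification instead of the timer.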

C# HttpRuntime.Cache.Insert() Not holding cached value

I'm trying to cache a price value using HttpRuntime.Cache.Insert(), but it only appears to hold the value for a couple of hours or so before clearing it out. What am I doing wrong? I want the value to stay in the cache for 3 days.
HttpRuntime.Cache.Insert(CacheName, Price, null, DateTime.Now.AddDays(3), TimeSpan.Zero);

Short answer
Your application pool or website is being shut down too soon. Extend the idle timeout on the site and extend the lifetime of the application pool running the site. Raise the memory allocation and request limits.
Full answer
If you want to know when and why something is being removed from the cache, you need to log the item removal using the CacheItemRemovedCallback option on the insertion... Then you can log the reason using the CacheItemRemovedReason argument. You can thus log the reason as one of the four listed reasons:
Removed The item is removed from the cache by a Remove method call or by an Insert method call that specified the same key.
Expired The item is removed from the cache because it expired.
Underused The item is removed from the cache because the system removed it to free memory.
DependencyChanged The item is removed from the cache because the cache dependency associated with it changed.
Typically, you will find Expired and Underused being the reasons for items that don't have explicit Remove calls made against the cache and don't have dependencies.
You will likely find out, while tracing through this fun stuff, that your items are not being expired or underused. Rather, I suspect you'll find that the AppDomain is getting unloaded.
One way this can happen is when the web.config (or bin directory, or .aspx, etc.) files get changed. For more information as to when this occurs see the Application Restarts section of this page. When that happens, the currently pending requests are drained, the cache is emptied, and the AppDomain is unloaded. You can detect this situation by calling AppDomain.CurrentDomain.IsFinalizingForUnload() and logging the result during the callback.
Another reason for the AppDomain to recycle is when IIS decides to recycle the AppPool for any of the reasons it has been configured with. Examples of that are xxx memory has been allocated over the lifetime, yyy seconds of runtime for the AppPool, ttt scheduled recycle time, or iiii idle time (no requests incoming). For further details check this article for IIS6 or this article for IIS7
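To make the logging suggestion concrete, here is a rough sketch against System.Web.Caching (so it assumes the .NET Framework with a reference to System.Web). The PriceCache class, the Describe helper, and the use of Trace for output are assumptions for this example; swap in whatever logger you already use.

```csharp
using System;
using System.Diagnostics;
using System.Web;
using System.Web.Caching;

// Hypothetical helper that inserts an item and logs why it later leaves the cache.
public static class PriceCache
{
    public static void InsertWithLogging(string key, object value)
    {
        HttpRuntime.Cache.Insert(
            key,
            value,
            null,                        // no cache dependency
            DateTime.Now.AddDays(3),     // absolute expiration, as in the question
            Cache.NoSlidingExpiration,
            CacheItemPriority.Normal,
            OnRemoved);                  // called after the item has been removed
    }

    // Builds the log line; kept separate so it can be checked without a running cache.
    public static string Describe(string key, CacheItemRemovedReason reason, bool appDomainUnloading)
    {
        return string.Format(
            "Cache item '{0}' removed. Reason: {1}. AppDomain unloading: {2}",
            key, reason, appDomainUnloading);
    }

    private static void OnRemoved(string key, object value, CacheItemRemovedReason reason)
    {
        // Do not re-insert the item here; this callback is only for diagnostics.
        bool unloading = AppDomain.CurrentDomain.IsFinalizingForUnload();
        Trace.WriteLine(Describe(key, reason, unloading));
    }
}
```

If the log shows Underused, memory pressure is the culprit; if the entries only ever vanish around an unload or you see no callback at all before the value is gone, look at the AppPool recycle settings instead.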

The Cache object doesn't guarantee that it will hold onto cached objects at all, much less for the full amount of time that you suggest.
If you want to more strongly encourage it to do so, you can set CacheItemPriority.High or CacheItemPriority.NotRemovable when you insert an item into the Cache. With the default Normal priority, the runtime has a fairly aggressive policy of letting go of objects when memory pressure increases.
On top of that, by default the IIS AppPool will recycle once/day or so, which will clear everything in the Cache.

The docs http://msdn.microsoft.com/en-us/library/4y13wyk9.aspx say that Cache.NoSlidingExpiration must be used if using an absolute expiration.
HttpRuntime.Cache.Insert(CacheName, Price, null, DateTime.Now.AddDays(3), Cache.NoSlidingExpiration);
This may not be your problem though; I just found that Cache.NoSlidingExpiration should be the same as TimeSpan.Zero.
Next I would check that your app pool isn't recycling, and check how much cache you are using. If it's a high-traffic site using a lot of memory (i.e. memory cache), then it will expire cache items as the memory is needed for other things.
Also check the last comment here http://bytes.com/topic/net/answers/717129-c-asp-net-page-cache-getting-removed-too-soon where someone seems to have found a solution to your problem.

Check the recycle time on your App Pool.

By default, items added to the cache have no set expiration, so this is definitely something outside the cache. I agree with Josh, you should check the recycle time on your App Pool.
Check out this page to see an example of how you can add a delegate to let you know exactly when your item is being removed from the cache. This might help you in troubleshooting if it's not your App Pool:
http://msdn.microsoft.com/en-us/library/system.web.caching.cache.add.aspx

Strategies for Cache Access During Refresh?

I’m looking for some strategies for accessing cached data that resides in an internal company web service. More precisely, for preventing access to the cached data while the cache is being refreshed.
We have a .Net 3.5 C# web service running on a web farm that maintains a cache of a half-dozen or so datasets. This data is configuration associated items that are referenced by the ‘real’ business logic domain that is also running in this web service as well as being returned for any client uses. Probably talking a total of dozen or so tables with a few thousand records in them.
We implemented a caching mechanism using the MS Enterprise Library 4.1. No huge reason for using this over the ASP.Net cache except that we were already using Enterprise Library for some other things and we liked the cache expiration handling. This is the first time that we have implemented some caching here so maybe I’m missing something fundamental…
This configuration data doesn’t get changed too often – probably a couple of times a day. When this configuration data does change we update the cache on the particular server the update request went to with the new data (the update process goes through the web service). For those other servers in the web farm (currently a total of 3 servers), we have the cache expiration set to 15 minutes upon which the data is re-loaded from the single database that all servers in the farm hit. For our particular purposes, this delay between servers is acceptable (although I guess not ideal).
During this refresh process, other requests could come in that require access to the data. Since the request could come during an expiration/refresh process, there is no data currently in the cache, which obviously causes issues.
What are some strategies to resolve this? If this was going in a single domain sort of WinForm type of application we could hack something up that would prevent access during the refresh by the use of class variables/loops, threading/mutex, or some other singleton-like structure. But I’m leery on implementing something like that running on a web farm. Should I be? Is a distributed server caching mechanism the way to go instead of each server having its own cache? I would like to avoid doing that for now if I could and come up with some coding to get around this problem. Am I missing something?
Thanks for any input.
UPDATE: I was going to use the Lock keyword functionality around the expiration action that subsequently refreshes the data, but I was worried about doing this on a web server. I think that would have worked although it seems to me that there still would be a possibility (although a lesser one) that we could have grabbed data from the empty cache between the time it expired and the time the lock was entered (the expiration action occurs on another thread I think). So what we did was if there was no data in the cache during a regular request for data we assume that it is in the process of being refreshed and just grab the data from the source instead. I think this will work since we can assume that the cache should be filled at all times since the initial cache filling process will occur when the singleton class that holds the cache is created when a web service request is first made. So if the cache is empty it truly means that it is currently being filled, which normally only takes a few seconds so any requests for data from the cache during that time will be the only ones that aren't hitting the cache.
If anyone with experience would like to shed any more light on this, it would be appreciated.

It sounds to me like you are already serving out stale data. So, if that is allowed, why don't you populate a new copy of the cache when you discover it's old, and only switch to using it once it's completely populated?

It really depends on the updating logic. Where is it that you decide to update the cache? Can you propagate the update to all the servers in the farm? Then you should lock while updating. If your update process is initiated by a user action, can you let the other servers know that they should expire their cache?
