I have a large data set that is updated once a day. I am caching the results of an expensive query on that data but I want to update that cache each day. I am considering using CacheItemRemovedCallback to reload my cache on a daily interval, but I had the following concerns:
Isn't it possible that the CacheItemRemovedCallback could be called before my expiration (in the case of running out of memory)? Which means reloading it immediately doesn't seem like a good idea.
Does the CacheItemRemovedCallback get called before or after the item is actually removed? If it is after, doesn't this theoretically leave a period of time where the cache would be unavailable?
Are these concerns relevant and if using CacheItemRemovedCallback to reload your cache is a bad idea, then when is it useful?
If you're going to reload, be sure to check the CacheItemRemovedReason. I recently had to debug an issue where a developer decided they should immediately re-populate the cache in this method, and under low memory conditions it basically sat chewing up CPU, stuck in a loop of building the cache objects, adding them to the cache, expiring them, and repeating.
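To make that concrete, a minimal sketch of checking the reason before reloading (ReloadCache is a hypothetical method; the signature is the standard CacheItemRemovedCallback delegate from System.Web.Caching):

void OnCacheItemRemoved(string key, object value, CacheItemRemovedReason reason)
{
    // Underused means ASP.NET evicted the item under memory pressure;
    // re-adding it immediately would just repeat the CPU-churn loop above.
    if (reason == CacheItemRemovedReason.Expired)
        ReloadCache(key);   // hypothetical reload method
}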
The callback is fired after the item is removed.
From everyone's responses and from further reading I have come to the following conclusion:
My concerns are valid. Using CacheItemRemovedCallback to refresh cached items is not a good idea. The only practical use for this callback seems to be logging information about when your cache is removed.
It seems that CacheItemUpdateCallback is the more appropriate way of refreshing your cache on a regular interval.
Ultimately, I have decided not to use either of these calls. Instead I will write a service action so the database import job can notify my application when it needs to refresh its data. This avoids using a timed refresh altogether.
Yes, there is a chance that the method could be fired for a variety of reasons. However, whether to reload the cache immediately or wait would depend on what is best for your application's typical use case.
CacheItemRemovedCallback does indeed fire after the item is removed from the cache. Right before the item is to be removed, you can use the CacheItemUpdateCallback method to determine whether or not you want to flush the cache at that time. There may be good reasons to wait to flush the cache, such as when you currently have users in your application and rebuilding the cache takes a long time.
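For reference, a hedged sketch of a refresh via CacheItemUpdateCallback (RunExpensiveQuery is hypothetical; the delegate and Insert overload are from System.Web.Caching):

void OnCacheUpdate(string key, CacheItemUpdateReason reason,
                   out object expensiveObject, out CacheDependency dependency,
                   out DateTime absoluteExpiration, out TimeSpan slidingExpiration)
{
    dependency = null;
    absoluteExpiration = DateTime.Now.AddDays(1);
    slidingExpiration = Cache.NoSlidingExpiration;
    // The callback runs before removal; handing back a fresh object keeps
    // the entry in the cache with no gap. Setting it to null lets it expire.
    expensiveObject = RunExpensiveQuery();   // hypothetical
}

// Registration:
HttpRuntime.Cache.Insert("dailyData", RunExpensiveQuery(), null,
    DateTime.Now.AddDays(1), Cache.NoSlidingExpiration, OnCacheUpdate);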
Generally speaking, the best practice is to test that your cached item actually exists in the cache before using its data. If the data doesn't exist, you can rebuild the cache at that time (causing a slightly longer response for the user) or choose to do something else.
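In code, that test-then-rebuild pattern might look like this (the key name and query method are illustrative):

var data = HttpRuntime.Cache["expensiveQuery"] as DataSet;
if (data == null)
{
    // Cache miss: rebuild now, accepting the slower response for this user.
    data = RunExpensiveQuery();                      // hypothetical
    HttpRuntime.Cache.Insert("expensiveQuery", data);
}
// ... use data ...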
This really isn't so much a cache of individual values as it is a snapshot of an entire dataset. As such, you don't benefit from using the Cache class here.
I'd recommend loading a static collection on startup and replacing it every 24 hours by setting a timer. The idea would be to create a new collection and atomically assign it, as the old one may still be in use and we want it to remain self-consistent.
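A rough sketch of that approach, assuming a hypothetical Record type and query method:

public static class DataSnapshot
{
    // volatile so readers always see the most recently assigned collection;
    // reference assignment is atomic, and readers holding the old list
    // keep a self-consistent snapshot while the new one is swapped in.
    private static volatile IList<Record> _current = Load();

    private static readonly System.Threading.Timer _refresh =
        new System.Threading.Timer(_ => _current = Load(), null,
                                   TimeSpan.FromHours(24), TimeSpan.FromHours(24));

    public static IList<Record> Current { get { return _current; } }

    private static IList<Record> Load()
    {
        return Database.RunExpensiveQuery();   // hypothetical
    }
}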
I am currently using MemoryCache _cache = new MemoryCache(new MemoryCacheOptions()); for caching some data from the database that does not change often, but does change occasionally.
On create/update/delete of that data, I refresh the cache.
This works fine, but the problem is that in production we will have a few nodes, so when the method for creating a record is called, for instance, the cache will be refreshed only on that node, not on the other nodes, and they will have stale data.
My question is: can I somehow fix this using MemoryCache, or do I need to do something else? And if so, what are the possible solutions?
I think what you are looking for is distributed caching.
Using the IDistributedCache interface, you can use either Redis or SQL Server as the backing store; it supplies basic Get/Set/Remove methods. Changes made on one node will be available to the other nodes.
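A hedged sketch of the Get/Set side with the Redis provider (the Product type and key names are illustrative; registration would be something along the lines of services.AddStackExchangeRedisCache(...) at startup):

public class ProductCache
{
    private readonly IDistributedCache _cache;   // Microsoft.Extensions.Caching.Distributed
    public ProductCache(IDistributedCache cache) { _cache = cache; }

    public async Task SetAsync(Product p)
    {
        // Every node reads the same Redis entry, so an update made on one
        // node is immediately visible to the others.
        await _cache.SetStringAsync("product:" + p.Id,
            System.Text.Json.JsonSerializer.Serialize(p));
    }

    public async Task<Product> GetAsync(int id)
    {
        var json = await _cache.GetStringAsync("product:" + id);
        return json == null
            ? null
            : System.Text.Json.JsonSerializer.Deserialize<Product>(json);
    }
}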
Redis is a great way of sharing session-type data between servers in a load-balanced environment; SQL Server does not seem to be a great fit, given that you seem to be caching to avoid database calls.
It might also be worth considering whether you are actually complicating things by caching in the first place. When you have a single application you see the benefit, as keeping records in application memory saves a request over the network, but in a load-balanced scenario you have to compare retrieving those records from a distributed cache vs retrieving them from the database.
If the data is just an in memory copy of a relatively small database table, then there is probably not a lot to choose performance wise between the two. If the data is based on a complicated expensive query then the cache is the way to go.
If you are making hundreds of requests a minute for the data, then any network request may be too much, but consider the consequences of the data being a little stale. For example, if you update a record and the new record is not available immediately on every server, does your application break? Or does the change just propagate in a more phased way? In that case you could keep your in-process memory cache and just use a shorter time to live.
If you really need every change to propagate to every node straight away then you could consider using a library like Cache Manager in conjunction with Redis which can combine an in memory cache and synchronisation with a remote cache.
Somewhat dated question, but maybe still useful: I agree with what ste-fu said, well explained.
I'll only add that, on top of CacheManager, you may want to take a look at FusionCache ⚡🦥, which I recently released.
On top of supporting an optional distributed 2nd layer transparently managed for you, it also has some other nice features, like an optimization that prevents multiple concurrent factory executions for the same cache key (less load on the source database), a fail-safe mechanism, and advanced timeouts with background factory completion.
If you give it a chance, please let me know what you think.
/shameless-plug
I have a C# application that handles about 10,000 immutable objects, each between 50 KB and 1 MB in size.
The application picks about 10-100 objects for every operation. Which objects are picked depends on circumstances and user choices, but there are a few that are very frequently used.
Keeping all objects in memory all the time is way too much, but disk access time is a problem. I would like to use a popularity-based cache to reduce disk activity. The cache would contain a maximum of 300 objects, and I expect usage patterns to decide which ones should be cached. I can easily add an access counter to each object. The more popular ones get in; the less popular ones have to leave the cache. Is there an easy, ingenious way to do that without coding my butt off?
Well, you can use System.Runtime.Caching. Cache the objects that are constantly used; if the cached objects change after some time, you can specify how long the cache is valid. Once the cache is invalid, you can rebuild it in the event handler.
Make sure you use some thread synchronization mechanism when rebuilding cache.
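A rough sketch of that idea with System.Runtime.Caching (LoadObject is a hypothetical loader for your objects):

public static void Put(string key)
{
    var policy = new CacheItemPolicy
    {
        AbsoluteExpiration = DateTimeOffset.Now.AddHours(24),
        // Rebuild only on genuine expiry, not on memory-pressure eviction.
        RemovedCallback = args =>
        {
            if (args.RemovedReason == CacheEntryRemovedReason.Expired)
                Put(args.CacheItem.Key);
        }
    };
    MemoryCache.Default.Set(key, LoadObject(key), policy);   // LoadObject: hypothetical
}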
I'd go with WeakReferences: you can build a simple cache manager on top of that in a couple of minutes, and let .NET handle the actual memory management by itself.
It may not be the best solution if you need to limit the amount of memory you want your program to use, but otherwise it's definitely worth checking out.
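A bare-bones version of such a manager might look like this:

class WeakCache<TKey, TValue> where TValue : class
{
    private readonly Dictionary<TKey, WeakReference> _refs =
        new Dictionary<TKey, WeakReference>();

    public void Set(TKey key, TValue value)
    {
        lock (_refs) { _refs[key] = new WeakReference(value); }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        lock (_refs)
        {
            WeakReference wr;
            if (_refs.TryGetValue(key, out wr))
            {
                value = wr.Target as TValue;   // null if the GC collected it
                if (value != null) return true;
                _refs.Remove(key);             // prune the dead entry
            }
        }
        value = null;
        return false;
    }
}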
One ready-made solution is to use ASP.NET caching's sliding expiration window.
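For example (key and loader are illustrative):

// Objects accessed within the window stay cached; rarely-used ones fall out.
HttpRuntime.Cache.Insert(
    key,
    LoadObjectFromDisk(key),                         // hypothetical loader
    null,                                            // no dependencies
    System.Web.Caching.Cache.NoAbsoluteExpiration,
    TimeSpan.FromMinutes(10));                       // 10-minute sliding window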
Sounds like a job for MemCached! This is a free, open-source, high-performance and flexible caching solution. You can download it at http://www.memcached.org.
To get a broad overview, look at the Wikipedia page at https://en.wikipedia.org/wiki/Memcached.
Good luck!
Before I start: I couldn't find any other resources to answer my question; the closest is:
Calling a stored procedure simultaniously from multiple threads in asp.net and sql server 2005
but it fails to answer my specific issue/concern.
Basically, I have a massive .net web app that handles millions of requests a day.
Assume:
All of the sprocs concerned are simple get sprocs (e.g., SELECT [SOMETHING] FROM [SOMEWHERE] INNER JOIN [SOMETHING ELSE], etc.)
The data never changes (it does change from time to time; for the sake of my scenario, assume it doesn't)
The cache is initially empty for whatever reason.
The method in question:
I check for the existence of the object in the application cache. If it exists, I simply return it. If the object is not in cache, a sproc call is made to the database to look up this data. Once the sproc returns, this data is added to cache and then returned.
Under heavy load I have a bit of a performance issue that I'd like to clear up.
Here's my scenario:
User A comes into this method.
Data is not in cache, sproc gets called.
User B comes into this method (while the sproc is still running).
Data is not in cache, sproc gets called.
Rinse and repeat over and over.
Under heavy load, these can generate quite a lot of concurrent and redundant active spids. I'm trying to figure out the best way around this. Obviously I could drop in an sp_getAppLock, but the requests would still end up 1) dropping into the sproc and 2) having to fire the exact same query. I could lock on an object that is specific to that exact query and have that wrapped around the cache check, but if I do that, I'm potentially opening the door to some massive thread contention and deadlocking.
I have to assume that someone has dealt with this very scenario before, and I'm hopeful there is an appropriate solution. Right now the best solution I can come up with is application locking, but I'd really like to know if anyone has better options. Perhaps a combination of things, say SQL app locks and messaging (traditional or non-traditional), where after the lock succeeds, any requests that were just released try to pull down the result set (from where?) as opposed to re-executing the entire rest of the sproc.
EDIT:
So follow this... if I lock or "wait" on either the caching or the sproc call, then under heavy load it's possible that, when an element is not cached, the method (or sproc) that generates the to-be-cached object ends up taking longer than expected. While that is spinning away, threads are going to have to wait, and the only ways to wait (at least that I know of) are to lock or to spin.
Isn't it then possible to exhaust the thread pool, or lock up all available request threads and force incoming requests to be queued? This is my fear, and the thing that drove me to look into moving this layer away from the application and into the database. The last time we attempted to lock around the caching, we suffered severe CPU spikes on our web box because the threads sat in a lock state for so long. Though I believe at the time we did not use Monitor.Enter/Monitor.Exit (or just lock(){}). Either way, does anyone have any details or experience in this area? I know it's typically bad form to lock around long-running processes for this very reason. I would rather tolerate loading duplicate content into the cache than prevent user requests from being served because I'm all out of threads or all active requests are locked.
Or, maybe it's just late and I'm over thinking this. I had started my day with an almost brilliant, "ah-ha" moment. But now I just keep second guessing myself.
Your cache is most likely protected by a lock, so you are already serializing the threads.
Your suggested solution is the best: have a lock around the query. Once the cache is populated the performance difference will be negligible, and you'll avoid multiple (and expensive) database queries.
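If a single global lock worries you (the thread-contention concern from the question), a per-key variant is possible. Here is a hedged sketch using ConcurrentDictionary and Lazy<T> (.NET 4+; CallProc is the hypothetical expensive call): callers for the same key wait on each other, while callers for different keys proceed independently.

private static readonly ConcurrentDictionary<string, Lazy<object>> _pending =
    new ConcurrentDictionary<string, Lazy<object>>();

object GetData(string cacheKey)
{
    var cached = HttpRuntime.Cache[cacheKey];
    if (cached != null) return cached;

    // ExecutionAndPublication guarantees the factory runs once per key;
    // concurrent callers for the same key block until it completes.
    var lazy = _pending.GetOrAdd(cacheKey, k =>
        new Lazy<object>(() => CallProc(k), LazyThreadSafetyMode.ExecutionAndPublication));
    try
    {
        var result = lazy.Value;
        HttpRuntime.Cache.Insert(cacheKey, result);
        return result;
    }
    finally
    {
        Lazy<object> removed;
        _pending.TryRemove(cacheKey, out removed);   // allow a rebuild after expiry
    }
}

Once the item lands in HttpRuntime.Cache, subsequent calls hit the fast path and the Lazy entry is discarded.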
In the past I had this problem, when the cache was flushed and slow queries took my DB down.
Here is a solution for this heavy problem using locking; ignore the Hebrew explanation and look at the code:
http://blogs.microsoft.co.il/blogs/moshel/archive/2009/07/11/cache.aspx
You may want to look into cache optimization if you haven't done so already.
If you are running through a cachemanager anyway, can it not be made smart enough to know that the proc has already been called and it should wait for it to complete?
private static readonly object _sync = new object();
private static object _cache;

object GetData()
{
    if (_cache != null) return _cache;   // fast path: already cached
    lock (_sync)
    {
        // Re-check inside the lock: another thread may have populated the
        // cache while we waited. Only one caller runs the proc; the rest
        // block here until it completes, then return the cached result.
        if (_cache == null)
            _cache = CallProc();
        return _cache;
    }
}
I know that most people recommend using HttpRuntime.Cache because it has more flexibility... etc. But what if you want the object to persist in the cache for the life of the application? Is there any big downside to using the Application[] object to cache things?
As long as you don't abuse the application state, I don't see a problem in using it for items that you don't want to expire.
Alternatively, I would probably use a static variable near the code that uses it. That way you avoid going through HttpApplicationState and being forced to have a reference to System.Web just to access the data.
But be sure to think through how you use the object(s) that you store in HttpApplicationState. If it's a DataSet which you keep adding stuff to for each request, then at some point you end up eating up too much memory on the web-server. The same could happen if you keep adding items to HttpApplicationState when you process requests, at some point you will force the application to restart.
That's probably the advantage of using Cache in your situation. Consuming larger amounts of memory isn't as fatal, because you allow ASP.NET to release the items in your cache when memory becomes scarce.
Application is deprecated by Cache. If you need something with application scope, then you should either create it as a static member of a class or use the Cache. If you want to go the Cache route but don't ever want it to expire, you should use the CacheItemPriority.NotRemovable option when you Insert the value into the cache. Note that it is possible to use this priority and still use cache dependencies, for instance if your data depended on something in the file system. All the CacheItemPriority does is prevent the HttpRuntime.Cache from intelligently clearing the item when it feels memory pressure and uses its Least-Recently-Used algorithm to purge items that aren't seeing much use.
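For example (key and loader are illustrative):

HttpRuntime.Cache.Insert(
    "appConfig",
    LoadSettings(),                                   // hypothetical loader
    null,                                             // or a CacheDependency
    System.Web.Caching.Cache.NoAbsoluteExpiration,
    System.Web.Caching.Cache.NoSlidingExpiration,
    System.Web.Caching.CacheItemPriority.NotRemovable,
    null);                                            // no removal callback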
Use the cache when you want items to automatically expire or get reclaimed when memory is scarce. Otherwise use static variables if you can, because they will yield better performance than digging through the ApplicationState collection. I'm not exactly sure what would be a case for using ApplicationState, but there are sure to be some.
I've got a nice little class built that acts as a cache. Each item has an expiration TimeSpan or DateTime. Each time an attempt to access an item in the cache is made, the item's expiration is checked, and if it's expired, the item is removed from the cache and nothing is returned.
That's great for objects that are accessed frequently, but if an item is put in the cache and never accessed again, it's never removed, even though it's expired.
What's a good methodology for expiring such items from the cache?
Should I have a background thread infinitely enumerating every item in the cache to check if it's expired?
The best code is no code. Use the ASP.NET cache instead. You can reference it as System.Web.HttpRuntime.Cache in any application, not just web applications.
In my experience, maintaining a custom caching mechanism became more trouble than it was worth. There are several libraries out there that have already solved these problems. I would suggest using one of them. A popular one in .Net is the Enterprise Library, although I have limited experience with its caching abilities.
If you must use a custom caching mechanism, then I see no problem with the watchful-thread idea you suggested. That is, if your application is a server-based application and not a web app. If it's a web app, you already have built-in sliding expiration. You can then just wrap it in a strongly typed wrapper to avoid referencing cache items by key each time.
You can implement an LRU (Least Recently Used) strategy: keep your items sorted by access time; when a new item is inserted and the cache is full, you evict the item that is last in that list. See Cache algorithms on Wikipedia.
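If you do roll your own, a minimal (single-threaded) sketch of the bookkeeping with a dictionary plus a linked list:

public class LruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map;
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order;   // front = most recent

    public LruCache(int capacity)
    {
        _capacity = capacity;
        _map = new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>(capacity);
        _order = new LinkedList<KeyValuePair<TKey, TValue>>();
    }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_map.TryGetValue(key, out node))
        {
            _order.Remove(node);        // move to front: now the most recently used
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    public void Set(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> existing;
        if (_map.TryGetValue(key, out existing))
        {
            _order.Remove(existing);
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            _map.Remove(_order.Last.Value.Key);   // evict the least recently used
            _order.RemoveLast();
        }
        var node = new LinkedListNode<KeyValuePair<TKey, TValue>>(
            new KeyValuePair<TKey, TValue>(key, value));
        _order.AddFirst(node);
        _map[key] = node;
    }
}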
If you want to expire immediately, I would still only do that when things are accessed. I.e., when the cache object is accessed and its time has expired, refetch it.
You could also, on any change to the cache, (re)start a Timer with its Interval set to the closest expiry timestamp. This will not be accurate to the millisecond and depends on a message pump running, but it is not very resource-demanding.
Harald Scheirich's answer is better, though, if you don't mind objects hanging around forever when the cache is not updated.
You could clear suitably old items out of the cache on the first access that occurs more than a minute after the last flush:
private DateTime nextFlush;

public object getItem(object key)
{
    DateTime now = DateTime.Now;
    if (now > nextFlush)
    {
        // Sweep expired items at most once a minute, piggybacking on access.
        Flush();
        nextFlush = now.AddMinutes(1);
    }
    return fetchItem(key);
}