I'm looking for some strategies for accessing cached data that resides in an internal company web service. More specifically, for preventing access to the cached data while the cache is being refreshed.
We have a .NET 3.5 C# web service running on a web farm that maintains a cache of a half-dozen or so datasets. This data consists of configuration-related items that are referenced by the 'real' business logic running in this web service and that are also returned for client use. We're probably talking a total of a dozen or so tables with a few thousand records in them.
We implemented a caching mechanism using the MS Enterprise Library 4.1. No huge reason for using this over the ASP.Net cache except that we were already using Enterprise Library for some other things and we liked the cache expiration handling. This is the first time that we have implemented some caching here so maybe I’m missing something fundamental…
This configuration data doesn't get changed too often – probably a couple of times a day. When this configuration data does change, we update the cache with the new data on the particular server the update request went to (the update process goes through the web service). For the other servers in the web farm (currently a total of 3 servers), we have the cache expiration set to 15 minutes, at which point the data is reloaded from the single database that all servers in the farm hit. For our particular purposes, this delay between servers is acceptable (although I guess not ideal).
During this refresh process, other requests could come in that require access to the data. If a request arrives during the expiration/refresh, there is momentarily no data in the cache, which obviously causes issues.
What are some strategies to resolve this? If this were a single-app-domain WinForms-style application, we could hack something up that prevents access during the refresh using class variables/loops, threading/mutexes, or some other singleton-like structure. But I'm leery of implementing something like that on a web farm. Should I be? Is a distributed server caching mechanism the way to go instead of each server having its own cache? I would like to avoid that for now if I can and come up with some code to get around this problem. Am I missing something?
Thanks for any input.
UPDATE: I was going to use the lock keyword around the expiration action that subsequently refreshes the data, but I was worried about doing that on a web server. I think it would have worked, although it seems to me there would still be a (smaller) window in which we could grab data from the empty cache between the time it expired and the time the lock was entered (the expiration action occurs on another thread, I think).
So what we did instead: if there is no data in the cache during a regular request for data, we assume it is in the process of being refreshed and just grab the data from the source instead. I think this will work, since we can assume the cache should be filled at all times: the initial cache fill happens when the singleton class that holds the cache is created on the first web service request. So if the cache is empty it truly means it is currently being filled, which normally takes only a few seconds, and the only requests that bypass the cache are the ones arriving during that window.
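In rough outline, the fallback looks like this (a simplified sketch assuming the Enterprise Library caching block's CacheFactory/GetData API; LoadFromDatabase stands in for our real data-access call):

using System.Data;
using Microsoft.Practices.EnterpriseLibrary.Caching;

public static class ConfigDataProvider
{
    public static DataSet GetConfigData(string key)
    {
        var cache = CacheFactory.GetCacheManager();

        var cached = cache.GetData(key) as DataSet;
        if (cached != null)
            return cached;

        // An empty cache means (by the assumption above) that a refresh is in
        // progress, so serve this request straight from the database rather than wait.
        return LoadFromDatabase(key);
    }

    // Placeholder for the real data-access call.
    private static DataSet LoadFromDatabase(string key)
    {
        return new DataSet();
    }
}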
If anyone with experience would like to shed any more light on this, it would be appreciated.
It sounds to me like you are already serving out stale data. So, if that is allowed, why don't you populate a new copy of the cache when you discover it's old, and only switch to using it once it's completely populated?
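Something along these lines, as a rough sketch (all names here are placeholders; the point is that readers always see a complete snapshot and the reference is only swapped once the new copy is fully built):

using System.Collections.Generic;

public static class ConfigCache
{
    // Readers always get a complete snapshot; only the reference is ever swapped.
    private static volatile Dictionary<string, object> _snapshot =
        new Dictionary<string, object>();

    public static Dictionary<string, object> Current
    {
        get { return _snapshot; }
    }

    // Called when the cache is discovered to be old (timer, expiration callback, etc.).
    public static void Refresh()
    {
        // Build the new copy off to the side...
        var fresh = LoadAllFromDatabase();

        // ...and only publish it once it is completely populated.
        // Requests already in flight keep using the old copy.
        _snapshot = fresh;
    }

    // Placeholder for the real load.
    private static Dictionary<string, object> LoadAllFromDatabase()
    {
        return new Dictionary<string, object>();
    }
}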
It really depends on the updating logic. Where is it that you decide to update the cache? Can you propagate the update to all the servers in the farm? Then you should lock while updating. If your update process is initiated by a user action, can you let the other servers know that they should expire their cache?
I am currently using MemoryCache _cache = new MemoryCache(new MemoryCacheOptions()); for caching some data from the database that does not change often, but it does change.
On create/update/delete of that data I refresh the cache.
This works fine, but the problem is that in production we will have a few nodes, so when the method for creating a record is called, for instance, the cache will be refreshed only on that node, not on the other nodes, and they will have stale data.
My question is: can I somehow fix this using MemoryCache, or do I need to do something else? And if so, what are the possible solutions?
I think what you are looking for is Distributed Caching.
Using the IDistributedCache interface you can use either Redis or SQL Server as the backing store; it supplies basic Get/Set/Remove methods. Changes made on one node will be available to the other nodes.
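A minimal sketch of the Redis-backed route (the registration line, the Product type, the connection string and LoadProductFromDbAsync are all placeholders for your own setup):

// In ConfigureServices / Program.cs, assuming the
// Microsoft.Extensions.Caching.StackExchangeRedis package:
//   services.AddStackExchangeRedisCache(o => o.Configuration = "localhost:6379");

using System;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class ProductCache
{
    private readonly IDistributedCache _cache;

    public ProductCache(IDistributedCache cache)
    {
        _cache = cache;
    }

    public async Task<Product> GetAsync(int id)
    {
        var key = "product:" + id;

        // Whatever one node writes here is visible to every other node.
        var json = await _cache.GetStringAsync(key);
        if (json != null)
            return JsonSerializer.Deserialize<Product>(json);

        var product = await LoadProductFromDbAsync(id); // placeholder DB call
        await _cache.SetStringAsync(key, JsonSerializer.Serialize(product),
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
            });
        return product;
    }

    private Task<Product> LoadProductFromDbAsync(int id)
    {
        return Task.FromResult(new Product { Id = id, Name = "stub" });
    }
}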
Using Redis is a great way of sharing session-type data between servers in a load-balanced environment; SQL Server does not seem to be a great fit, given that you seem to be caching to avoid DB calls.
It might also be worth considering whether you are actually complicating things by caching in the first place. When you have a single application you see the benefit, as keeping the records in application memory saves a request over the network, but in a load-balanced scenario you have to compare retrieving those records from a distributed cache vs retrieving them from the database.
If the data is just an in-memory copy of a relatively small database table, then there is probably not a lot to choose, performance-wise, between the two. If the data is based on a complicated, expensive query, then the cache is the way to go.
If you are making hundreds of requests a minute for the data, then any network request may be too much, but consider what the consequences are of the data being a little stale. For example, if you update a record and the new record is not available immediately on every server, does your application break, or does the change just take effect in a more phased way? In that case you could keep your in-process memory cache and just use a shorter time to live.
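For example, with the MemoryCache you already have, a short TTL keeps the staleness window down to a few seconds (GetRecordsFromDbAsync is a placeholder for your query):

// Reads go through the in-process cache; at worst a node serves data that is
// up to 30 seconds old before reloading it from the database.
var records = await _cache.GetOrCreateAsync("records", entry =>
{
    entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(30);
    return GetRecordsFromDbAsync();
});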
If you really need every change to propagate to every node straight away, then you could consider using a library like Cache Manager in conjunction with Redis, which can combine an in-memory cache with synchronisation against a remote cache.
Somewhat dated question, but maybe still useful: I agree with what ste-fu said, well explained.
I'll only add that, on top of CacheManager, you may want to take a look at FusionCache ⚡🦥, which I recently released.
On top of supporting an optional distributed 2nd layer that is transparently managed for you, it also has some other nice features, like an optimization that prevents multiple concurrent factory executions for the same cache key (less load on the source database), a fail-safe mechanism, and advanced timeouts with background factory completion.
If you give it a chance, please let me know what you think.
/shameless-plug
Just a bit of background first. I currently have a site hosted with Windows Azure, with multiple instances and also AppFabric as my sole caching provider.
Everything was going great until my traffic spiked earlier this morning. After the instances became overloaded and stopped responding, everything came good again once the new instances started.
However, I started getting messages from AppFabric saying that I was being throttled because there were too many requests in a given hour. Which is fair enough; it certainly was giving it hell.
In order to avoid these messages in the future, I was planning on implementing an InProc cache with a very short lifespan: check InProc first, then AppFabric, then the DB.
ObjectCache cache = MemoryCache.Default;
CacheItemPolicy policy = new CacheItemPolicy();
policy.AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(5);
// items then go in with e.g. cache.Set(cacheKey, data, policy);
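Spelled out, the lookup order I have in mind is roughly this (just a sketch; GetFromDatabase is a placeholder and the AppFabric calls assume the DataCache Get/Put client API):

using System;
using System.Runtime.Caching;
using Microsoft.ApplicationServer.Caching;

public class LayeredCache
{
    private readonly DataCache _appFabric; // the existing AppFabric cache client

    public LayeredCache(DataCache appFabric)
    {
        _appFabric = appFabric;
    }

    public object Get(string key)
    {
        ObjectCache local = MemoryCache.Default;

        // 1. In-process first: absorbs bursts without touching AppFabric at all.
        object item = local.Get(key);
        if (item != null)
            return item;

        // 2. Then AppFabric (shared across instances).
        item = _appFabric.Get(key);

        // 3. Finally the database, pushing the result back out to AppFabric.
        if (item == null)
        {
            item = GetFromDatabase(key); // placeholder for the real query
            _appFabric.Put(key, item, TimeSpan.FromMinutes(30));
        }

        // Keep a short-lived local copy so the next few minutes of requests
        // on this instance never leave the process.
        var policy = new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(5) };
        local.Set(key, item, policy);
        return item;
    }

    private object GetFromDatabase(string key)
    {
        return new object(); // stub
    }
}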
The questions I have are:
Is this the best way to handle the situation?
Is this going to interfere with AppFabric Caching?
Any issues I am overlooking?
Update
I just wanted to say I chose the above method and it works well. I was using it only for general data storage and not session state. MemoryCache with session state would not work too well on Azure due to no server affinity (as mentioned by David below).
Update 16-03-2012
After realizing the obvious, I also disabled SessionState on most pages. Most of my pages don't need it, and this dramatically decreases my calls to the cache under heavy load. I also disabled ViewState for most pages, just for a slightly quicker page load time.
Are you using cache to provide SessionState storage, or general data storage by your application, or both? It's not totally clear, because InProc usually refers to SessionState, but your sample code does not look like SessionState.
Assuming that you're storing data which can be safely cached locally, then I would recommend looking into AppFabric Local Caching. It does basically what you want, and doesn't require writing any separate code (I think...).
Otherwise, using MemoryCache as you outlined is a workable scheme. I've done this in my apps, you just need to be careful to avoid cache incoherence issues.
Depending on your application, you may also want to implement a per-request cache by storing data in the HttpContext.Items collection. This is helpful when different parts of your code might request the same data during a single request.
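A per-request cache can be as simple as this (Customer and GetCustomerFromDb are placeholders for your own type and lookup):

using System.Web;

public class Customer
{
    public int Id { get; set; }
}

public static class RequestCache
{
    // The first caller in a request pays for the lookup; later calls in the same
    // request reuse the result. HttpContext.Items is discarded when the request
    // ends, so there are no coherence issues across requests.
    public static Customer GetCustomer(int id)
    {
        string key = "Customer:" + id;
        var context = HttpContext.Current;

        var cached = context.Items[key] as Customer;
        if (cached != null)
            return cached;

        var customer = GetCustomerFromDb(id); // placeholder for the expensive call
        context.Items[key] = customer;
        return customer;
    }

    private static Customer GetCustomerFromDb(int id)
    {
        return new Customer { Id = id };
    }
}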
Try this: http://msdn.microsoft.com/en-us/magazine/hh708748.aspx
One thing I have done is use HttpContext.Items. This is only a per-request cache, but depending on the nature of your system it can be useful.
I wouldn't suggest InProc, due to the fact that there's no server affinity.
One option with Windows Azure Cache, to avoid the hourly quota throttling, is to bump up the cache size. Fortunately the price doesn't scale linearly: for instance, $45 for 128MB, $55 for 256MB. So one option is to bump your cache up to the next size. You'll need to monitor compute performance, though, via perf counters, as there's no way to monitor cache usage in real time.
Another option is to move session state to SQL Azure, which is now an officially supported session state provider as of Azure 1.4 (Aug. 2011 - see this article for more info). With the latest SQL Azure pricing updates, if the db stays below 100MB it's a $4.99 monthly rate instead of the original $9.99 baseline. It's amortized daily, so even if you have transient spikes and go into the 1+ GB range, you still have quite an affordable cache repository.
Another possible solution would be to use Sticky Sessions like this example:
http://dunnry.com/blog/2010/10/14/StickyHTTPSessionRoutingInWindowsAzure.aspx
I understand that each page refresh, especially in 'AjaxLand', causes my back-end/code-behind class to be re-created from scratch... This is a problem because my class (which is a member of my System.Web.UI.Page class) contains A LOT of data that it sources from a database. So now every page refresh in AjaxLand causes large backend DB calls, rather than just reusing a class object from memory. Any fix for this? Is this where session variables come into play? Are session variables the only option I have for retaining an object in memory that is tied to a single user and a single session?
You need ASP.Net Caching.
Specifically Data Caching.
If your data is user-specific then Session would be the way to go. Be careful if you have a web farm or web garden; in that case you'll need a session state server or database for your sessions.
If your data is application-level then Application Data Cache could be the way to go. Be careful if you have limited RAM and your data is huge. The cache can empty itself at an inopportune moment.
Either way, you'll need to test how your application performs with your changes. You may even find going back to the database to be the least bad option.
In addition, you could have a look at Lazy Loading some of the data, to make it less heavy.
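For the Session route, the usual pattern in the code-behind is a lazily loaded property, something like this (BigDataSet and LoadBigDataFromDb are placeholders for your own type and query):

// In your System.Web.UI.Page subclass:
private BigDataSet CurrentData
{
    get
    {
        var data = Session["BigDataSet"] as BigDataSet;
        if (data == null)
        {
            data = LoadBigDataFromDb();   // the expensive query, run once per session
            Session["BigDataSet"] = data; // survives postbacks and Ajax calls for this user
        }
        return data;
    }
}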
Take a look at this MS article on various caching mechanisms for ASP.NET. There is a section named "Cache arbitrary objects in server memory" that may interest you.
Since you mention Ajax, I think you might want to consider the following points:
Assume this large data set is static and not transient. On the first Ajax call, your app queries the database, retrieves lots of data, and returns it to the client (i.e. the browser/JavaScript running in the browser); the client now has all of that in memory already. From then on, there's no need to go back to the server for the same data the client already has; you just use JavaScript to rebuild the DOM or whatever. Everything can be done on the client from this point on.
Now assume the data is not static but transient. Caching on the server by putting it in the session won't be the solution you want anyway: every time your client sends a request and the server just returns what's in the cache (session), that data is already stale and no different from the data the client already has in memory.
The point is: if the data is static, save the round trips to the server once you already have it in memory. If the data is transient, I'm afraid there's no cheap solution except re-querying or re-retrieving the data somehow and sending everything back to the client.
At the moment I am working on a project admin application in C# 3.5 on ASP.NET. In order to reduce hits to the database, I'm caching a lot of information using static variables. For example, a list of users is kept in memory in a static class. The class reads in all the information from the database on startup and updates the database whenever changes are made, but it never needs to read from the database again.
The class pings other webservers (if they exist) with updated information at the same time as a write to the database. The pinging mechanism is a Windows service to which the cache object registers using a random available port. It is used for other things as well.
The amount of data isn't all that great. At the moment I'm using it just to cache the users (password hashes, permissions, name, email etc.) It just saves a pile of calls being made to the database.
I was wondering if there are any pitfalls to this method and/or if there are better ways to cache the data?
A pitfall: A static field is scoped per app domain, and increased load will make the server generate more app domains in the pool. This is not necessarily a problem if you only read from the statics, but you will get duplicate data in memory, and you will get a hit every time an app domain is created or recycled.
Better to use the Cache object - it's intended for things like this.
Edit: Turns out I was wrong about AppDomains (as pointed out in comments) - more instances of the Application will be generated under load, but they will all run in the same AppDomain. (But you should still use the Cache object!)
As long as you can expect that the cache will never grow to a size greater than the amount of available memory, it's fine. Also, be sure that there will only be one instance of this application per database, or the caches in the different instances of the app could "fall out of sync."
Where I work, we have a homegrown O/RM, and we do something similar to what you're doing with certain tables which are not expected to grow or change much. So, what you're doing is not unprecedented, and in fact in our system, is tried and true.
Another pitfall you must consider is thread safety. All of your application requests run in the same AppDomain but may come in on different threads, so access to a static variable must account for it being hit from multiple threads. That's probably a bit more overhead than you are looking for; the Cache object is better for this purpose.
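To make that concrete, even a read-mostly static needs something like the following (a sketch; the User type and LoadUsersFromDatabase are placeholders), which is exactly the housekeeping the Cache object would do for you:

using System.Collections.Generic;

public class User
{
    public string Name { get; set; }
    public string PasswordHash { get; set; }
}

public static class UserCache
{
    private static readonly object _sync = new object();
    private static Dictionary<string, User> _users;

    public static User GetUser(string name)
    {
        // Every reader takes the lock; otherwise a reader could race with a writer
        // and see the dictionary in an inconsistent state.
        lock (_sync)
        {
            if (_users == null)
                _users = LoadUsersFromDatabase(); // placeholder for the startup load

            User user;
            _users.TryGetValue(name, out user);
            return user;
        }
    }

    // On writes: update the database, then the in-memory copy, under the same lock.
    public static void UpdateUser(User user)
    {
        lock (_sync)
        {
            if (_users != null)
                _users[user.Name] = user;
        }
    }

    private static Dictionary<string, User> LoadUsersFromDatabase()
    {
        return new Dictionary<string, User>(); // stub
    }
}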
Hmmm... The "classic" method would be the application cache, but provided you never update the static variables (or understand the locking issues if you do), and you understand that they can disappear at any time with an AppDomain restart, then I don't really see the harm in using a static.
I suggest you look into ways of having a distributed cache for your app. You can take a look at NCache or indeXus.Net
The reason I suggested that is because you rolled your own ad-hoc way of updating information that you're caching. Static variables/references are fine but they don't update/refresh (so you'll have to handle aging on your own) and you seem to have a distributed setup.
I'm developing a web service whose methods will be called from a "dynamic banner" that shows a sort of queue of messages read from a SQL Server table.
The banner will be under heavy load on the home pages of high-traffic sites; every time the banner is loaded, it will call my web service in order to obtain the new queue of messages.
Now: I don't want all this traffic to drive queries to the database every time the banner is loaded, so I'm thinking of using the ASP.NET cache (i.e. HttpRuntime.Cache[cacheKey]) to limit database access; I will try to have the cache refresh every minute or so.
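In code, the idea is roughly this (Message and GetMessagesFromDb stand in for my real type and query):

using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static class MessageQueueCache
{
    public static List<Message> GetMessages()
    {
        var messages = HttpRuntime.Cache["MessageQueue"] as List<Message>;
        if (messages == null)
        {
            messages = GetMessagesFromDb(); // the only place that hits SQL Server
            HttpRuntime.Cache.Insert(
                "MessageQueue",
                messages,
                null,                          // no dependency
                DateTime.UtcNow.AddMinutes(1), // absolute expiration: about 1 minute
                Cache.NoSlidingExpiration);
        }
        return messages;
    }

    // Placeholder for the real query against the messages table.
    private static List<Message> GetMessagesFromDb()
    {
        return new List<Message>();
    }
}

public class Message
{
    public string Text { get; set; }
}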
Obviously I'll try to keep the messages as small as possible, to limit traffic.
But maybe there are other ways to deal with such a scenario; for example, I could write the latest version of the queue to the file system and have the web service read that file, or something mixing the two approaches...
The stack is a C# web service, ASP.NET 3.5, SQL Server 2000.
Any hint? Other approaches?
Thanks
Andrea
It depends on a lot of things:
If there is little change in the data (think a backend with a "publish" button, or daily batches), then I would definitely use static files (updated via push from the backend). We used this solution on a couple of large sites and it worked really well.
If the data is small enough, memory caching (i.e. the Http Cache) is viable, but beware of locking issues, and also beware that the Http Cache will not work that well under heavy memory load, because items can be expired early if the framework needs memory. I have been bitten by that before! With the above caveats, the Http Cache works quite well.
I think caching is a reasonable approach and you can take it a step further and add a SQL Dependency to it.
ASP.NET Caching: SQL Cache Dependency With SQL Server 2000
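Roughly, once the polling setup from that article is in place (aspnet_regsql plus the sqlCacheDependency section in web.config), the insert looks like this, where messages is the freshly queried list and the database/table names are placeholders:

// SqlCacheDependency lives in System.Web.Caching; "BannerDb" must match the
// database entry configured in web.config, and the Messages table must have
// been enabled for change notifications with aspnet_regsql.
var dependency = new SqlCacheDependency("BannerDb", "Messages");
HttpRuntime.Cache.Insert("MessageQueue", messages, dependency);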
If you go the file route, keep this in mind.
http://petesbloggerama.blogspot.com/2008/02/aspnet-writing-files-vs-application.html
Writing a file is a better solution IMHO - it's served by IIS kernel code, without the huge ASP.NET overhead, and you can copy the file to CDNs later.
AFAIK dependency caching is not very efficient with SQL Server 2000.
Also, one way to get around the memory limitation mentioned by Skliwz is to isolate this service in its own app pool if you are using it outside of the normal application. I have seen this done before and it helps as well.
Thanks all. As the data is small in size but the underlying tables do change, I think I'll go the HttpCache route: what I actually need is a way to reduce DB access even though the data is changing (which is why I'm not using a direct SQL dependency, as suggested by #Bloodhound).
I'll do some stress testing before going public, I think.
Thanks again all.
Of course you could (should) also use the caching features in the SixPack library.
Forward (normal) cache, based on HttpCache, which works by putting attributes on your class. Simplest to use, but in some cases you have to wait for the content to actually be fetched from the database.
Pre-fetch cache, built from scratch, which after the first call will start refreshing the cache behind the scenes; in some cases you are guaranteed to have content without waiting.
More info on the SixPack library homepage. Note that the code (especially the forward cache) is load tested.
Here's an example of simple caching:
[Cached]
public class MyTime : ContextBoundObject
{
    [CachedMethod(1)]
    public DateTime Get()
    {
        Console.WriteLine("Get invoked.");
        return DateTime.Now;
    }
}