Is there a way, using Azure Caching, to determine if an object having a specific key exists in cache, without actually returning the object itself?
I'm currently doing something like
public bool IsKeyInCache(string cacheKey)
{
    DataCacheItem item = null;
    CacheRetryPolicy.ExecuteAction(() =>
    {
        item = cache.GetCacheItem(cacheKey);
    });
    return item != null;
}
But, because the object in cache is very large and expensive to deserialize, performance is horrible.
I've dug through the MSDN documentation and don't see any alternative, but maybe I'm missing something.
My best idea so far is to add a small "marker" object to cache at the same time as my large object and check for existence of the "marker" object where deserialization is inexpensive. But this isn't a robust solution, as it's entirely possible for my large object to get purged from cache while the "marker" object remains.
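For illustration, that marker idea might look like the following sketch (assuming `cache` is a `DataCache` instance; the `:marker` suffix and method names are arbitrary conventions, and as noted the two entries can be evicted independently, so a marker hit is only a hint):

```csharp
private const string MarkerSuffix = ":marker";

public void PutWithMarker(string cacheKey, object largeValue)
{
    cache.Put(cacheKey, largeValue);
    cache.Put(cacheKey + MarkerSuffix, (byte)1); // tiny, cheap to deserialize
}

public bool IsKeyProbablyInCache(string cacheKey)
{
    // Cheap probe; "probably" because the large object may have been
    // evicted while its marker survived.
    return cache.Get(cacheKey + MarkerSuffix) != null;
}
```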
I believe the API you're looking for is DataCache.Get(key):
windows azure caching - best practice for checking whether a key exist or not
http://msdn.microsoft.com/en-us/library/ff424908%28v=azure.10%29.aspx
But I still think your best bet is to NOT "micro-manage" the cache: just query, and let Azure decide.
IMHO...
So I've written a couple of wrapper methods around the System.Runtime MemoryCache, to get a general/user bound cache context per viewmodel in my ASP.NET MVC application.
At some point I noticed that my delegate just keeps getting called every time, rather than retrieving my stored object, for no apparent reason.
Oddly enough, none of my unit tests (which use simple data to check it) failed or showed a pattern explaining this.
Here's one of the wrapper methods:
public T GetCustom<T>(CacheItemPolicy cacheSettings, Func<T> createCallback, params object[] parameters)
{
    if (parameters.Length == 0)
        throw new ArgumentException("GetCustom can't be called without any parameters.");
    lock (_threadLock)
    {
        var mergedToken = GetCacheSignature(parameters);
        var cache = GetMemoryCache();
        if (cache.Contains(mergedToken))
        {
            var cacheResult = cache.Get(mergedToken);
            if (cacheResult is T)
                return (T)cacheResult;
            throw new ArgumentException(string.Format("A caching signature was passed, which duplicates another signature of different return type. ({0})", mergedToken));
        }
        var result = createCallback(); // <-- keeps landing here
        if (!EqualityComparer<T>.Default.Equals(result, default(T)))
        {
            cache.Add(mergedToken, result, cacheSettings);
        }
        return result;
    }
}
I was wondering if anyone here knows about conditions which render an object invalid for storage within the MemoryCache.
Until then I'll just strip my complex classes' properties until storage works.
Any experiences would be interesting nevertheless.
There are a couple of frequent reasons why this may happen (assuming the logic to add objects to the cache and to find the correct cache instance is correct):
An x86 (32-bit) process has a "very small" amount of memory to work with - it is relatively easy to consume too much memory outside the cache (or a particular instance of the cache), and as a result items will be immediately evicted from the cache.
ASP.NET app domain recycles, which occur for a variety of reasons, will clear out the cache too.
Notes
generally you'd store "per-user cached information" in session state so it is managed appropriately and can be persisted via SQL or other out-of-process state options.
relying on caching per-user objects may not improve performance if you need to support a larger number of users. You need to carefully measure the impact at the load level you expect to handle.
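A minimal sketch of the session-state alternative suggested above (type, key, and method names are illustrative; out-of-process modes such as StateServer/SQL require the stored type to be serializable):

```csharp
[Serializable]
public class UserSearchState
{
    public List<string> RecentQueries = new List<string>();
}

// Lazily create the per-user state inside the ASP.NET session.
public UserSearchState GetUserState(HttpSessionStateBase session)
{
    var state = session["UserSearchState"] as UserSearchState;
    if (state == null)
    {
        state = new UserSearchState();
        session["UserSearchState"] = state;
    }
    return state;
}
```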
Now that Stack Overflow uses redis, do they handle cache invalidation the same way? i.e. a list of identities hashed to a query string + name (I guess the name is some kind of purpose or object type name).
Perhaps they then retrieve individual items that are missing from the cache directly by id (which bypasses a bunch of database indexes and uses the more efficient clustered index instead perhaps). That'd be smart (the rehydration that Jeff mentions?).
Right now, I'm struggling to find a way to pivot all of this in a succinct way. Are there any examples of this kind of thing that I could use to help clarify my thinking prior to doing a first cut myself?
Also, I'm wondering where the cutoff is between using a .net cache (System.Runtime.Caching or System.Web.Caching) and going out and using redis. Or is Redis just hands down faster?
Here's the original SO question from 2009:
https://meta.stackexchange.com/questions/6435/how-does-stackoverflow-handle-cache-invalidation
A couple of other links:
https://meta.stackexchange.com/questions/69164/does-stackoverflow-use-caching-and-if-so-how/69172#69172
https://meta.stackexchange.com/questions/110320/stack-overflow-db-performance-and-redis-cache
I honestly can't decide if this is a SO question or a MSO question, but:
Going off to another system is never faster than querying local memory (as long as it is keyed); simple answer: we use both! So we use:
local memory
else check redis, and update local memory
else fetch from source, and update redis and local memory
This then, as you say, causes an issue of cache invalidation - although actually that isn't critical in most places. But for this - redis events (pub/sub) allow an easy way to broadcast keys that are changing to all nodes, so they can drop their local copy - meaning: next time it is needed we'll pick up the new copy from redis. Hence we broadcast the key-names that are changing against a single event channel name.
Tools: redis on ubuntu server; BookSleeve as a redis wrapper; protobuf-net and GZipStream (enabled / disabled automatically depending on size) for packaging data.
So: the redis pub/sub events are used to invalidate the cache for a given key from one node (the one that knows the state has changed) immediately (pretty much) to all nodes.
Regarding distinct processes (from comments, "do you use any kind of shared memory model for multiple distinct processes feeding off the same data?"): no, we don't do that. Each web-tier box is only really hosting one process (of any given tier), with multi-tenancy within that, so inside the same process we might have 70 sites. For legacy reasons (i.e. "it works and doesn't need fixing") we primarily use the http cache with the site-identity as part of the key.
For the few massively data-intensive parts of the system, we have mechanisms to persist to disk so that the in-memory model can be passed between successive app-domains as the web naturally recycles (or is re-deployed), but that is unrelated to redis.
Here's a related example that shows the broad flavour only of how this might work - spin up a number of instances of the following, and then type some key names in:
static class Program
{
    static void Main()
    {
        const string channelInvalidate = "cache/invalidate";
        using (var pub = new RedisConnection("127.0.0.1"))
        using (var sub = new RedisSubscriberConnection("127.0.0.1"))
        {
            pub.Open();
            sub.Open();
            sub.Subscribe(channelInvalidate, (channel, data) =>
            {
                string key = Encoding.UTF8.GetString(data);
                Console.WriteLine("Invalidated {0}", key);
            });
            Console.WriteLine(
                "Enter a key to invalidate, or an empty line to exit");
            string line;
            do
            {
                line = Console.ReadLine();
                if (!string.IsNullOrEmpty(line))
                {
                    pub.Publish(channelInvalidate, line);
                }
            } while (!string.IsNullOrEmpty(line));
        }
    }
}
What you should see is that when you type a key-name, it is shown immediately in all the running instances, which would then dump their local copy of that key. Obviously in real use the two connections would need to be put somewhere and kept open, so would not be in using statements. We use an almost-a-singleton for this.
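That "almost-a-singleton" for keeping the connections open might be sketched like this (assuming the BookSleeve types from the sample above; error handling and reconnection are omitted):

```csharp
public static class RedisConnections
{
    private static readonly Lazy<RedisSubscriberConnection> subscriber =
        new Lazy<RedisSubscriberConnection>(() =>
        {
            var conn = new RedisSubscriberConnection("127.0.0.1");
            conn.Open();
            return conn;
        });

    // One long-lived subscriber connection shared by the whole process.
    public static RedisSubscriberConnection Subscriber
    {
        get { return subscriber.Value; }
    }
}
```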
Can we avoid casting T to Object when placing it in a Cache?
WeakReference necessitates the use of objects. System.Runtime.Caching.MemoryCache is locked to type object.
Custom dictionaries/collections cause issues with the garbage collector, or you have to run a garbage collector of your own (a separate thread)?
Is it possible to have the best of both worlds?
I know I accepted an answer already, but using a generic WeakReference is now possible! Looks like they snuck it into .NET 4.
http://msdn.microsoft.com/en-us/library/gg712911(v=VS.96).aspx
And an old feature request for the same:
http://connect.microsoft.com/VisualStudio/feedback/details/98270/make-a-generic-form-of-weakreference-weakreference-t-where-t-class
There's nothing to stop you writing a generic wrapper around MemoryCache - probably with a constraint to require reference types:
public class Cache<T> where T : class
{
    // MemoryCache has no parameterless constructor; it requires a name
    private readonly MemoryCache cache = new MemoryCache(typeof(T).FullName);

    public T this[string key]
    {
        get { return (T) cache[key]; }
        set { cache[key] = value; }
    }

    // etc
}
Obviously it's only worth delegating the parts of MemoryCache you're really interested in.
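Usage is then strongly typed at the call site, e.g. (a `Person` type assumed for illustration):

```csharp
var people = new Cache<Person>();
people["bob"] = new Person { Name = "Bob" };
Person bob = people["bob"]; // no cast needed by the caller
```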
So you basically want to dependency-inject a cache provider that only returns certain types?
Isn't that kind of against everything OOP?
The idea of the "object" type is that anything and everything is an object, so by using a cache that caches instances of type "object" you are saying you can cache anything.
By building a cache that only caches objects of some predetermined type, you are limiting the functionality of your cache, however...
There is nothing stopping you implementing a custom cache provider that has a generic constraint so it only allows you to cache certain object types, and in theory this would save you about 2 "ticks" (not even a millisecond) per retrieval.
The way to look at this is...
What's more important to me:
Good OOP based on best practice
About 20 milliseconds over the lifetime of my cache provider
The other thing is... .NET is already geared to optimise the boxing and unboxing process to the extreme, and at the end of the day, when you "cache" something you are simply putting it somewhere it can be quickly retrieved, storing a pointer to its location for that later retrieval.
I've seen solutions that involve streaming 4GB XML files through a business process using objects that are destroyed and recreated on every call... the point is that the process flow was important, not so much the initialisation and prep work, if that makes sense.
How important is this casting time loss to you?
I would be interested to know more about scenario that requires such speed.
As a side note:
Another thing I've noticed about newer technologies like LINQ and Entity Framework is that the result of a query is important to cache when the query takes a long time, but not so much the side effects on the result.
What this means is that (for example):
If I were to cache a basic "default instance" of an object that takes a complex set of entity queries to create, I wouldn't cache the resulting object but the queries.
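One concrete reading of "cache the queries, not the result" is LINQ-to-Entities' CompiledQuery, which caches the translated query rather than materialized objects; `MyContext` and `Product` here are placeholder names, not from the original:

```csharp
static readonly Func<MyContext, int, IQueryable<Product>> ProductsByCategory =
    CompiledQuery.Compile((MyContext ctx, int categoryId) =>
        ctx.Products.Where(p => p.CategoryId == categoryId));

// Each call reuses the pre-compiled query; only the data is fetched fresh:
// var cheap = ProductsByCategory(context, 42).ToList();
```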
With Microsoft already doing the groundwork, I'd ask... what am I caching, and why?
Whenever I have to store anything in the session, I have picked up the habit of minimizing the number of times I have to access the session by doing something like this:
private List<SearchResult> searchResults;

private List<SearchResult> SearchResults
{
    get
    {
        return searchResults ?? (searchResults = Session["SearchResults"] as List<SearchResult>);
    }
    set
    {
        searchResults = value;
        Session["SearchResults"] = value;
    }
}
My reasoning is that if the object is used several times throughout a postback, it has to be retrieved from the Session less often. However, I have absolutely no idea if this actually helps performance at all, is in fact just a waste of time, or is perhaps even a bad idea. Does anyone know how computationally expensive constantly pulling an object out of the session is compared to the above approach? Or are there any best practices surrounding this?
Depends on what kind of session storage you are using (for more info, see: here).
If you're using InProc storage, then the performance difference is probably minimal unless you're accessing the object very frequently. However, the local copy doesn't really hurt either way.
It surely depends on your storage mode, but it's a good approach in either case, since it saves you from deserialization when the storage is not InProc... and even in the InProc case it saves you from boxing/unboxing... so my vote is in favour of your approach.
I see nothing wrong with your approach. The only drawback is that when some other piece of your (or somebody else's) code changes the session value after your private field has been initialized, your wrapper property will still return the old value. In other words there is no guarantee your property is actually returning the session value except for the first time.
As for performance, I think in case of InProc there is little or no gain. Probably similar to any other dictionary vs variable storage. However it might make a difference when you use other session storage modes.
And if you really want to know, you can profile your app and find out ;) You can even try something as simple as two trace writes with some looped session reads/writes between them.
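A rough sketch of such a measurement (illustrative only; the session key and iteration count are arbitrary):

```csharp
var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 100000; i++)
{
    var fromSession = Session["SearchResults"]; // repeated session lookups
}
sw.Stop();
System.Diagnostics.Trace.WriteLine("Session reads: " + sw.ElapsedMilliseconds + " ms");
```

Comparing this against the same loop reading a local field gives a concrete number for your own storage mode.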
And here's a read on session storage internals:
http://www.codeproject.com/KB/session/ASPNETSessionInternals.aspx
It depends on the size of the data to be stored, the bandwidth (internet or LAN), and the scale of the application. If the data size is small, the bandwidth is good enough (LAN), and the scale is worldwide (like Whitehouse.gov), we should store it on the client side (as a hidden form parameter). In other situations (the data size is very large, the bandwidth is very low, the scale is small - only a group of 3-4 people will use the application), we can store it on the server side (in the session). There are a lot of other factors to weigh in this decision.
Also, I would not recommend using only a single field in the session object. Create something like a Dictionary (HashMap in Java) in the session, use it as the storage, and have the user pass the key of this Dictionary to get the data. This is needed to let users open your web site in several tabs.
Example of URL, accessing needed search:
http://www.mysite.com/SearchResult.aspx?search_result=d38e8df908097d46d287f64e67ea6e1a
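A sketch of that per-tab dictionary idea (key, token format, and type names are illustrative):

```csharp
var allResults = Session["SearchResults"] as Dictionary<string, List<SearchResult>>;
if (allResults == null)
{
    allResults = new Dictionary<string, List<SearchResult>>();
    Session["SearchResults"] = allResults;
}

// Each search gets its own token, carried in the URL (?search_result=<token>),
// so separate browser tabs never clobber each other's results.
string token = Guid.NewGuid().ToString("N");
allResults[token] = currentResults;
```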
During Application_End() in Global.asax, HttpContext.Current is null. I still want to be able to access the cache - it's in memory, so I want to see if I can reference it somehow to save bits to disk.
Question - is there a way to reference the in-memory cache when HttpContext.Current is null?
Perhaps I could create a global static variable that stores a pointer to the cache, which I could update on HTTP requests (pseudo: "static <pointer X>" = HttpContext.Current) and then retrieve a reference to the cache through that pointer in Application_End()?
Is there a better way to access the Cache in memory when no HTTP request is being made?
You should be able to access it via HttpRuntime.Cache
http://www.hanselman.com/blog/UsingTheASPNETCacheOutsideOfASPNET.aspx
According to Scott - looking at Reflector HttpContext.Current.Cache just calls HttpRuntime.Cache - so you might as well always access it this way.
I'm using the following getter to return a System.Web.Caching.Cache object which is working for me.
get
{
    return (System.Web.HttpContext.Current == null)
        ? System.Web.HttpRuntime.Cache
        : System.Web.HttpContext.Current.Cache;
}
Which basically backs up James Gaunt but of course is only going to help get the cache in Application End.
Edit: I probably got this from one of the comments on the Scott Hanselman blog James linked to!
Inside the Application_End event, all cache objects are already disposed.
If you want access to a cache object before it is disposed, you need to use something like this when adding the object to the cache:
Import the namespace System.Web.Caching in the application where you add objects to the cache.
// Add the callback method to a delegate
var onRemove = new CacheItemRemovedCallback(RemovedCallback);

// Insert the object into the cache
HttpContext.Current.Cache.Insert("YourKey", YourValue, null, DateTime.Now.AddHours(12), Cache.NoSlidingExpiration, CacheItemPriority.NotRemovable, onRemove);
And when this object is about to be disposed, the following method will be called:
private void RemovedCallback(string key, object value, CacheItemRemovedReason reason)
{
    // Use your logic here
    // After this method the cache object will be disposed
}
Please let me know if this approach doesn't work for you.
Hope it helps with your question.
Best regards, Dima.