So I've written a couple of wrapper methods around the System.Runtime.Caching MemoryCache, to get a general/user-bound cache context per view model in my ASP.NET MVC application.
At some point I noticed that my delegate just keeps getting called every time, rather than retrieving my stored object, for no apparent reason.
Oddly enough, none of my unit tests (which use simple data to check the behaviour) failed or showed a pattern explaining this.
Here's one of the wrapper methods:
public T GetCustom<T>(CacheItemPolicy cacheSettings, Func<T> createCallback, params object[] parameters)
{
    if (parameters.Length == 0)
        throw new ArgumentException("GetCustom can't be called without any parameters.");

    lock (_threadLock)
    {
        var mergedToken = GetCacheSignature(parameters);
        var cache = GetMemoryCache();
        if (cache.Contains(mergedToken))
        {
            var cacheResult = cache.Get(mergedToken);
            if (cacheResult is T)
                return (T)cacheResult;
            throw new ArgumentException(string.Format("A caching signature was passed, which duplicates another signature of different return type. ({0})", mergedToken));
        }
        var result = createCallback(); // <-- keeps landing here
        if (!EqualityComparer<T>.Default.Equals(result, default(T)))
        {
            cache.Add(mergedToken, result, cacheSettings);
        }
        return result;
    }
}
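For context, a call looks something like this - the policy values, userId, and LoadUserSettings are hypothetical placeholders, not part of the original code:

var settings = GetCustom(
    new CacheItemPolicy { SlidingExpiration = TimeSpan.FromMinutes(5) },
    () => LoadUserSettings(userId),    // delegate that builds the value on a miss
    "UserSettings", userId);           // parameters merged into the cache signature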
I was wondering if anyone here knows of conditions that render an object invalid for storage within the MemoryCache.
Until then, I'll just strip my complex classes' properties until storage works.
Experiences would be interesting nevertheless.
There are a couple of frequent reasons why this may be happening (assuming the logic to actually add objects to the cache/find the correct cache instance is correct):
An x86 (32-bit) process has a "very small" amount of memory to work with - it is relatively easy to consume too much memory outside the cache (or outside a particular instance of the cache), with the result that items are immediately evicted from the cache (a diagnostic sketch for observing such removals follows these bullets).
ASP.NET app domain recycles, which happen for a variety of reasons, will clear out the cache too.
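Here is that diagnostic sketch - minimal, and assuming System.Runtime.Caching: attaching a RemovedCallback to the policy makes the cache report why an entry disappeared (Evicted points at memory pressure, Expired at the policy itself):

using System.Diagnostics;
using System.Runtime.Caching;

static class CacheDiagnostics
{
    // Logs every removal with its reason (Evicted, Expired, Removed, ...).
    public static CacheItemPolicy WithEvictionLogging(CacheItemPolicy policy)
    {
        policy.RemovedCallback = args =>
            Debug.WriteLine(string.Format("Cache entry '{0}' removed: {1}",
                args.CacheItem.Key, args.RemovedReason));
        return policy;
    }
}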
Notes
Generally you'd store "per-user cached information" in session state, so it is managed appropriately and can be persisted via SQL Server or other out-of-process state options (a small sketch follows these notes).
Relying on caching per-user objects may not improve performance if you need to support a larger number of users. You need to carefully measure the impact at the load level you expect to handle.
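A minimal sketch of that session-state option, assuming classic ASP.NET (System.Web); the helper and its names are hypothetical:

using System.Web;

public static class PerUserState
{
    // Session state is per-user by definition and, with the StateServer or
    // SQLServer mode configured in web.config, survives app-domain recycles.
    public static void Put<T>(string key, T value)
    {
        HttpContext.Current.Session[key] = value;
    }

    public static T Get<T>(string key)
    {
        var stored = HttpContext.Current.Session[key];
        return stored is T ? (T)stored : default(T);
    }
}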
Related
I've got a requirement to protect my business object properties via a list of separate authorization rules. I want my authorization rules to be suspended during various operations such as converting to DTOs and executing validation rules (validating property values the current user does not have authorization to see).
The approach I'm looking at wraps the calls in a scope object that uses a [ThreadStatic] property to determine whether the authorization rules should be run:
public class SuspendedAuthorizationScope : IDisposable
{
    [ThreadStatic]
    public static bool AuthorizationRulesAreSuspended;

    public SuspendedAuthorizationScope()
    {
        AuthorizationRulesAreSuspended = true;
    }

    public void Dispose()
    {
        AuthorizationRulesAreSuspended = false;
    }
}
Here is the IsAuthorized check (from the base class):
public bool IsAuthorized(string memberName, AuthorizedAction authorizationAction)
{
    if (SuspendedAuthorizationScope.AuthorizationRulesAreSuspended)
        return true;

    var context = new RulesContext();
    _rules.OfType<IAuthorizationRule>()
          .Where(r => r.PropertyName == memberName)
          .Where(r => r.AuthorizedAction == authorizationAction)
          .ToList().ForEach(r => r.Execute(context));
    return context.HasNoErrors();
}
Here is the ValidateProperty method demonstrating usage (from the base class):
private void ValidateProperty(string propertyName, IEnumerable<IValidationRule> rules)
{
    using (new SuspendedAuthorizationScope())
    {
        var context = new RulesContext();
        rules.ToList().ForEach(rule => rule.Execute(context));
        if (HasNoErrors(context))
            RemoveErrorsForProperty(propertyName);
        else
            AddErrorsForProperty(propertyName, context.Results);
    }
    NotifyErrorsChanged(propertyName);
}
I've got some tests around the scoping object showing that the expected/correct value of SuspendedAuthorizationScope.AuthorizationRulesAreSuspended is used, as long as the lambda resolves within the scope of the using statement.
Are there any obvious flaws to this design? Is there anything in ASP.NET that I should be concerned with as far as threading goes?
There are two concerns that I see with your proposed approach:
Failing to use using when creating a SuspendedAuthorizationScope will retain open access beyond the intended scope. In other words, an easy-to-make mistake will cause a security hole (especially when thinking in terms of future-proofing your code/design, when a new hire starts digging into unfamiliar code and misses this subtle case).
Attaching this magic flag to a [ThreadStatic] field magnifies the previous bullet: access can be left open for another page, since the thread will be reused to process another request once it is done with the current page, and its authorization flag will not have been reset. So authorization lingering longer than it should now goes not just beyond a missing call to .Dispose(), but can actually leak into another request/page, and of a completely different user.
That said, the approaches I've seen to solving this problem did essentially involve checking the authorization, setting a magic flag that allowed a bypass later on, and then resetting it.
Suggestions:
1. To at least solve the worst variant (#2 above), can you move the magic cookie to be an instance field of your base page class, so that it is only valid within the scope of that page instance and not others?
2. To solve all cases, is it possible to use functors or a similar means that you pass to the authorization function, which would then, upon successful authorization, invoke your functor to run all the logic and then guarantee cleanup? See the pseudo-code example below:
void myBizLogicFunction()
{
    DoActionThatRequiresAuthorization1();
    DoActionThatRequiresAuthorization2();
    DoActionThatRequiresAuthorization3();
}

void AuthorizeAndRun(string memberName, AuthorizedAction authorizationAction, Action privilegedFunction)
{
    if (IsAuthorized(memberName, authorizationAction))
    {
        try
        {
            AuthorizationRulesAreSuspended = true;
            privilegedFunction();
        }
        finally
        {
            // reset even if privilegedFunction throws
            AuthorizationRulesAreSuspended = false;
        }
    }
}
With the above, I think it can be thread-static, as finally is guaranteed to run, and thus authorization cannot leak beyond the call to privilegedFunction. I think this would work, though it could use validation by others...
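For illustration, a hypothetical call site (the member name and action value are placeholders, not from the original post):

AuthorizeAndRun("CreditLimit", AuthorizedAction.Write, () =>
{
    myBizLogicFunction();
});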
If you have complete control over your code and don't care about hidden dependencies on a magic static value, your approach will work. Note that you are putting a big burden on yourself/whoever supports your code to make sure there is never asynchronous processing inside the using block and that each usage of the magic value is wrapped in a proper using block.
In general it is a bad idea because:
Threads and requests are not tied one-to-one, so you can run into cases where your thread-local object changes the state of some other request. This is even more likely to happen if you use ASP.NET MVC 4+ with async handlers.
Static values of any kind are a code smell, and you should try to avoid them.
Storing request-related information should be done in HttpContext.Items, or maybe Session (though a session lasts much longer and requires more careful cleanup of state). A sketch of the HttpContext.Items approach follows.
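A minimal sketch of that HttpContext.Items variant, assuming classic ASP.NET (System.Web); the helper and key name are hypothetical:

using System.Web;

public static class RequestScopedAuthorization
{
    private const string Key = "AuthorizationRulesAreSuspended"; // hypothetical key

    // HttpContext.Items lives and dies with the current request, so the flag
    // cannot leak into another user's request the way a [ThreadStatic] field can.
    public static bool RulesAreSuspended
    {
        get
        {
            return HttpContext.Current != null
                && true.Equals(HttpContext.Current.Items[Key]);
        }
        set { HttpContext.Current.Items[Key] = value; }
    }
}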
My concern would be about the potential delay between the time you leave your using block and the time it takes the garbage collector to get around to disposing of your object. You may be in a false "authorized" state longer than you intend to be.
Is there a way, using Azure Caching, to determine if an object having a specific key exists in cache, without actually returning the object itself?
I'm currently doing something like:
public bool IsKeyInCache(string cacheKey)
{
    DataCacheItem item = null;
    CacheRetryPolicy.ExecuteAction(() =>
    {
        item = cache.GetCacheItem(cacheKey);
    });
    return item != null;
}
But, because the object in cache is very large and expensive to deserialize, performance is horrible.
I've dug through the MSDN documentation and don't see any alternative, but maybe I'm missing something.
My best idea so far is to add a small "marker" object to cache at the same time as my large object and check for existence of the "marker" object where deserialization is inexpensive. But this isn't a robust solution, as it's entirely possible for my large object to get purged from cache while the "marker" object remains.
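For what it's worth, a sketch of that marker idea, assuming the Azure Caching DataCache API (Microsoft.ApplicationServer.Caching); the "/marker" suffix and the pairing scheme are hypothetical:

public void PutLargeObject(DataCache cache, string key, object bigValue)
{
    cache.Put(key, bigValue);
    cache.Put(key + "/marker", (byte)1);   // tiny, cheap to deserialize
}

public bool IsKeyProbablyInCache(DataCache cache, string key)
{
    // Can report a false positive if the large item
    // was evicted independently of its marker.
    return cache.Get(key + "/marker") != null;
}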
I believe the API you're looking for is DataCache.Get(key):
windows azure caching - best practice for checking whether a key exist or not
http://msdn.microsoft.com/en-us/library/ff424908%28v=azure.10%29.aspx
But I still think your best bet is NOT to "micro-manage" the cache; just query, and let Azure decide.
IMHO...
Are the following assumptions valid for this code? I put some background info under the code, but I don't think it's relevant.
Assumption 1: Since this is a single application, I'm making the assumption it will be handled by a single process. Thus, static variables are shared between threads, and declaring my collection of lock objects statically is valid.
Assumption 2: If I know the value is already in the dictionary, I don't need to lock on read. I could use a ConcurrentDictionary, but I believe this one will be safe since I'm not enumerating (or deleting), and the value will exist and not change when I call UnlockOnValue().
Assumption 3: I can lock on the Keys collection, since that reference won't change, even if the underlying data structure does.
private static Dictionary<String, Object> LockList =
    new Dictionary<string, object>();

private void LockOnValue(String queryStringValue)
{
    lock (LockList.Keys)
    {
        if (!LockList.Keys.Contains(queryStringValue))
        {
            LockList.Add(queryStringValue, new Object());
        }
        System.Threading.Monitor.Enter(LockList[queryStringValue]);
    }
}

private void UnlockOnValue(String queryStringValue)
{
    System.Threading.Monitor.Exit(LockList[queryStringValue]);
}
Then I would use this code like:
LockOnValue(Request.QueryString["foo"]);
// Check cache expiry
// if expired:
//     Load new values and cache them.
// else:
//     Load cached values
UnlockOnValue(Request.QueryString["foo"]);
Background: I'm creating an app in ASP.NET that downloads data based on a single user-defined variable in the query string. The number of values will be quite limited. I need to cache the results for each value for a specified period of time.
Approach: I decided to use local files to cache the data, which is not the best option, but I wanted to try it since this is non-critical and performance is not a big issue. I used 2 files per option, one with the cache expiry date, and one with the data.
Issue: I'm not sure what the best way to do locking is, and I'm not overly familiar with threading issues in .NET (one of the reasons I chose this approach). Based on what's available, and what I read, I thought the above should work, but I'm not sure and wanted a second opinion.
Your current solution looks pretty good. The two things I would change:
1: UnlockOnValue needs to go in a finally block. If an exception is thrown, the lock will never be released.
2: LockOnValue is somewhat inefficient, since it does a dictionary lookup twice. This isn't a big deal for a small dictionary, but for a larger one you will want to switch to TryGetValue.
Also, your assumption 3 holds - at least for now. But the Dictionary contract makes no guarantee that the Keys property always returns the same object. And since it's so easy to avoid relying on this, I'd recommend against it. Whenever I need an object to lock on, I just create an object for that sole purpose. Something like:
private static Object _lock = new Object();
lock only has the scope of a single process. If you want to span processes you'll have to use primitives like a (named) Mutex.
lock is shorthand for Monitor.Enter and Monitor.Exit in a try/finally. If you also call Monitor.Enter and Monitor.Exit yourself, that's redundant.
You don't need to lock on read, but you do have to lock the "transaction" of checking whether the value exists and adding it if it doesn't. If you don't lock that series of instructions, another thread could slip in between your check for the key and your add, and add it first - resulting in an exception. The lock you're doing is sufficient for that (you don't need the additional calls to Enter and Exit - lock does that for you).
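For reference, a sketch of the same per-value locking built on ConcurrentDictionary (available since .NET 4): GetOrAdd collapses the check-then-add step into one atomic call, and the lock statement keeps the release in a guaranteed-to-run position:

using System;
using System.Collections.Concurrent;

static class KeyedLock
{
    private static readonly ConcurrentDictionary<string, object> Locks =
        new ConcurrentDictionary<string, object>();

    // Usage: KeyedLock.RunLocked(Request.QueryString["foo"], () => { /* refresh cache */ });
    public static void RunLocked(string key, Action action)
    {
        // GetOrAdd may race and build a spare object, but every caller
        // receives the single instance that won, so they all lock the same gate.
        var gate = Locks.GetOrAdd(key, _ => new object());
        lock (gate)   // Monitor.Enter/Exit wrapped in try/finally for us
        {
            action();
        }
    }
}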
Now that Stack Overflow uses redis, do they handle cache invalidation the same way? i.e. a list of identities hashed to a query string + name (I guess the name is some kind of purpose or object type name).
Perhaps they then retrieve individual items that are missing from the cache directly by id (which bypasses a bunch of database indexes and perhaps uses the more efficient clustered index instead). That'd be smart (the rehydration that Jeff mentions?).
Right now, I'm struggling to find a way to pivot all of this in a succinct way. Are there any examples of this kind of thing that I could use to help clarify my thinking prior to doing a first cut myself?
Also, I'm wondering where the cutoff is between using a .NET cache (System.Runtime.Caching or System.Web.Caching) and going out to Redis. Or is Redis just hands-down faster?
Here's the original SO question from 2009:
https://meta.stackexchange.com/questions/6435/how-does-stackoverflow-handle-cache-invalidation
A couple of other links:
https://meta.stackexchange.com/questions/69164/does-stackoverflow-use-caching-and-if-so-how/69172#69172
https://meta.stackexchange.com/questions/110320/stack-overflow-db-performance-and-redis-cache
I honestly can't decide if this is a SO question or a MSO question, but:
Going off to another system is never faster than querying local memory (as long as it is keyed); simple answer: we use both! So we use:
local memory
else check redis, and update local memory
else fetch from source, and update redis and local memory (a rough sketch of this tiered read follows)
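The sketch - broad strokes only, using System.Runtime.Caching for the local tier; the two fetch delegates are hypothetical stand-ins for the redis wrapper and the backing store:

using System;
using System.Runtime.Caching;

public static class TieredCache
{
    public static T Get<T>(string key, Func<T> fetchFromRedis, Func<T> fetchFromSource)
        where T : class
    {
        var local = MemoryCache.Default;
        var value = local.Get(key) as T;       // 1. local memory
        if (value == null)
        {
            value = fetchFromRedis()           // 2. redis
                 ?? fetchFromSource();         // 3. source of truth
            if (value != null)
                local.Set(key, value, DateTimeOffset.UtcNow.AddMinutes(5));
        }
        return value;
    }
}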
This tiering then, as you say, raises the issue of cache invalidation - although actually that isn't critical in most places. But for this, redis events (pub/sub) give an easy way to broadcast keys that are changing to all nodes, so they can drop their local copy - meaning: the next time it is needed, we'll pick up the new copy from redis. Hence we broadcast the key names that are changing against a single event channel name.
Tools: redis on Ubuntu server; BookSleeve as a redis wrapper; protobuf-net and GZipStream (enabled/disabled automatically depending on size) for packaging data.
So: the redis pub/sub events are used to propagate the invalidation of a given key from one node (the one that knows the state has changed) to all nodes, pretty much immediately.
Regarding distinct processes (from comments, "do you use any kind of shared memory model for multiple distinct processes feeding off the same data?"): no, we don't do that. Each web-tier box is only really hosting one process (of any given tier), with multi-tenancy within that, so inside the same process we might have 70 sites. For legacy reasons (i.e. "it works and doesn't need fixing") we primarily use the http cache with the site-identity as part of the key.
For the few massively data-intensive parts of the system, we have mechanisms to persist to disk so that the in-memory model can be passed between successive app-domains as the web naturally recycles (or is re-deployed), but that is unrelated to redis.
Here's a related example that shows the broad flavour only of how this might work - spin up a number of instances of the following, and then type some key names in:
using System;
using System.Text;
using BookSleeve;

static class Program
{
    static void Main()
    {
        const string channelInvalidate = "cache/invalidate";
        using (var pub = new RedisConnection("127.0.0.1"))
        using (var sub = new RedisSubscriberConnection("127.0.0.1"))
        {
            pub.Open();
            sub.Open();

            sub.Subscribe(channelInvalidate, (channel, data) =>
            {
                string key = Encoding.UTF8.GetString(data);
                Console.WriteLine("Invalidated {0}", key);
            });

            Console.WriteLine(
                "Enter a key to invalidate, or an empty line to exit");
            string line;
            do
            {
                line = Console.ReadLine();
                if (!string.IsNullOrEmpty(line))
                {
                    pub.Publish(channelInvalidate, line);
                }
            } while (!string.IsNullOrEmpty(line));
        }
    }
}
What you should see is that when you type a key-name, it is shown immediately in all the running instances, which would then dump their local copy of that key. Obviously in real use the two connections would need to be put somewhere and kept open, so would not be in using statements. We use an almost-a-singleton for this.
Can we avoid casting T to object when placing it in a cache?
WeakReference necessitates the use of object. System.Runtime.Caching.MemoryCache is locked to type object.
Custom dictionaries/collections cause issues with the garbage collector, or you have to run a "garbage collector" of your own (a separate thread)?
Is it possible to have the best of both worlds?
I know I accepted an answer already, but using a generic WeakReference is now possible! Looks like they snuck WeakReference<T> into .NET 4.5.
http://msdn.microsoft.com/en-us/library/gg712911(v=VS.96).aspx
An old feature request for the same:
http://connect.microsoft.com/VisualStudio/feedback/details/98270/make-a-generic-form-of-weakreference-weakreference-t-where-t-class
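A minimal sketch of the generic form, assuming .NET 4.5's WeakReference<T>: no object cast is needed, and TryGetTarget reports liveness and the target in one call:

using System;

class WeakReferenceDemo
{
    static void Main()
    {
        var data = new byte[1024];
        var weak = new WeakReference<byte[]>(data);

        byte[] target;
        if (weak.TryGetTarget(out target))      // true while 'data' is reachable
            Console.WriteLine("Still alive: {0} bytes", target.Length);

        data = null;                            // drop the strong reference
        GC.Collect();                           // a collection *may* now reclaim it
        Console.WriteLine(weak.TryGetTarget(out target) ? "survived" : "collected");
    }
}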
There's nothing to stop you writing a generic wrapper around MemoryCache - probably with a constraint to require reference types:
public class Cache<T> where T : class
{
    // MemoryCache has no parameterless constructor; it requires a name
    private readonly MemoryCache cache = new MemoryCache(typeof(T).FullName);

    public T this[string key]
    {
        get { return (T) cache[key]; }
        set { cache[key] = value; }
    }

    // etc
}
Obviously it's only worth delegating the parts of MemoryCache you're really interested in.
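Hypothetical usage of the wrapper above (User and LoadUser are placeholders):

var users = new Cache<User>();
users["42"] = LoadUser(42);     // stored as object internally
User cached = users["42"];      // the cast back to T is hidden in the indexer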
So you basically want to dependency-inject a cache provider that only returns certain types?
Isn't that kind of against everything OOP?
The idea of the "object" type is that anything and everything is an object, so by using a cache that caches instances of "objects" of type object you are saying you can cache anything.
By building a cache that only caches objects of some predetermined type, you are limiting the functionality of your cache. However...
There is nothing stopping you from implementing a custom cache provider that has a generic constraint so it only allows you to cache certain object types, and in theory this would save you about 2 "ticks" (not even a millisecond) per retrieval.
The way to look at this is ...
What's more important to me:
Good OOP based on best practice
About 20 milliseconds over the lifetime of my cache provider
The other thing is ... .NET is already geared to optimise the boxing and unboxing process to the extreme, and at the end of the day when you "cache" something you are simply putting it somewhere it can be quickly retrieved, storing a pointer to its location for later retrieval.
I've seen solutions that involve streaming 4GB XML files through a business process using objects that are destroyed and recreated on every call... the point is that the process flow was what mattered, not so much the initialisation and prep work, if that makes sense.
How important is this casting time loss to you?
I would be interested to know more about the scenario that requires such speed.
As a side note:
Another thing I've noticed about newer technologies like LINQ and Entity Framework is that the result of a query is what's important to cache when the query takes a long time, not so much the side effects on the result.
What this means is that (for example):
If I were to cache a basic "default instance" of an object that is built from a complex set of entity queries, I wouldn't cache the resulting object but the queries.
With Microsoft already doing the groundwork, I'd ask ... what am I caching and why?