Stack Overflow, Redis, and Cache invalidation - c#

Now that Stack Overflow uses redis, do they handle cache invalidation the same way? i.e. a list of identities hashed to a query string + name (I guess the name is some kind of purpose or object type name).
Perhaps they then retrieve individual items that are missing from the cache directly by id (which bypasses a bunch of database indexes and perhaps uses the more efficient clustered index instead). That'd be smart (the rehydration that Jeff mentions?).
Right now, I'm struggling to find a way to pull all of this together in a succinct way. Are there any examples of this kind of thing that I could use to help clarify my thinking prior to doing a first cut myself?
Also, I'm wondering where the cutoff is between using a .net cache (System.Runtime.Caching or System.Web.Caching) and going out and using redis. Or is Redis just hands down faster?
Here's the original SO question from 2009:
https://meta.stackexchange.com/questions/6435/how-does-stackoverflow-handle-cache-invalidation
A couple of other links:
https://meta.stackexchange.com/questions/69164/does-stackoverflow-use-caching-and-if-so-how/69172#69172
https://meta.stackexchange.com/questions/110320/stack-overflow-db-performance-and-redis-cache

I honestly can't decide if this is a SO question or a MSO question, but:
Going off to another system is never faster than querying local memory (as long as it is keyed); simple answer: we use both! So we use, in order (a rough sketch follows the list):
local memory
else check redis, and update local memory
else fetch from source, and update redis and local memory
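To make that lookup order concrete, here is a minimal sketch of a two-level read-through helper. This is not Stack Overflow's actual code: the MemoryCache local tier is standard .NET, but IRemoteCache and its redis-backed implementation are hypothetical placeholders for whatever client (BookSleeve, etc.) you wrap.

using System;
using System.Runtime.Caching;

// Hypothetical abstraction over the remote (redis) tier.
public interface IRemoteCache
{
    string Get(string key);                           // returns null on a miss
    void Set(string key, string value, TimeSpan ttl);
}

public class TieredCache
{
    private readonly MemoryCache _local = MemoryCache.Default; // local memory tier
    private readonly IRemoteCache _remote;                      // redis tier (assumed wrapper)

    public TieredCache(IRemoteCache remote) { _remote = remote; }

    public string Get(string key, Func<string> fetchFromSource, TimeSpan ttl)
    {
        // 1: local memory
        var value = _local.Get(key) as string;
        if (value != null) return value;

        // 2: else check redis, and update local memory
        value = _remote.Get(key);
        if (value != null)
        {
            _local.Set(key, value, DateTimeOffset.Now.Add(ttl));
            return value;
        }

        // 3: else fetch from source, and update redis and local memory
        value = fetchFromSource();
        _remote.Set(key, value, ttl);
        _local.Set(key, value, DateTimeOffset.Now.Add(ttl));
        return value;
    }
}

When an invalidation message arrives (see the pub/sub example further down), each node would simply call _local.Remove(key) so the next read falls through to redis.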
This then, as you say, causes an issue of cache invalidation - although actually that isn't critical in most places. But for this - redis events (pub/sub) allow an easy way to broadcast keys that are changing to all nodes, so they can drop their local copy - meaning: next time it is needed we'll pick up the new copy from redis. Hence we broadcast the key-names that are changing against a single event channel name.
Tools: redis on ubuntu server; BookSleeve as a redis wrapper; protobuf-net and GZipStream (enabled / disabled automatically depending on size) for packaging data.
So: the redis pub/sub events are used to propagate invalidation of a given key from one node (the one that knows the state has changed) to all nodes, (pretty much) immediately.
Regarding distinct processes (from comments, "do you use any kind of shared memory model for multiple distinct processes feeding off the same data?"): no, we don't do that. Each web-tier box is only really hosting one process (of any given tier), with multi-tenancy within that, so inside the same process we might have 70 sites. For legacy reasons (i.e. "it works and doesn't need fixing") we primarily use the http cache with the site-identity as part of the key.
For the few massively data-intensive parts of the system, we have mechanisms to persist to disk so that the in-memory model can be passed between successive app-domains as the web naturally recycles (or is re-deployed), but that is unrelated to redis.
Here's a related example that shows the broad flavour only of how this might work - spin up a number of instances of the following, and then type some key names in:
using System;
using System.Text;
using BookSleeve;

static class Program
{
    static void Main()
    {
        const string channelInvalidate = "cache/invalidate";
        using (var pub = new RedisConnection("127.0.0.1"))
        using (var sub = new RedisSubscriberConnection("127.0.0.1"))
        {
            pub.Open();
            sub.Open();

            // every running instance hears about keys published on the channel
            sub.Subscribe(channelInvalidate, (channel, data) =>
            {
                string key = Encoding.UTF8.GetString(data);
                Console.WriteLine("Invalidated {0}", key);
            });

            Console.WriteLine(
                "Enter a key to invalidate, or an empty line to exit");
            string line;
            do
            {
                line = Console.ReadLine();
                if (!string.IsNullOrEmpty(line))
                {
                    // broadcast the key-name that is changing
                    pub.Publish(channelInvalidate, line);
                }
            } while (!string.IsNullOrEmpty(line));
        }
    }
}
What you should see is that when you type a key-name, it is shown immediately in all the running instances, which would then dump their local copy of that key. Obviously in real use the two connections would need to be put somewhere and kept open, so would not be in using statements. We use an almost-a-singleton for this.
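For completeness, a minimal sketch of what such an almost-a-singleton connection holder might look like, reusing the BookSleeve RedisConnection type from the example above; the retry/recreation details (what makes it "almost" a singleton) are omitted, and the names here are illustrative only:

using System;
using BookSleeve;

public static class RedisHub   // hypothetical holder, not the actual Stack Overflow class
{
    private static readonly Lazy<RedisConnection> _connection =
        new Lazy<RedisConnection>(() =>
        {
            var conn = new RedisConnection("127.0.0.1");
            conn.Open();       // open once, keep the connection alive for the app's lifetime
            return conn;
        });

    public static RedisConnection Connection
    {
        get { return _connection.Value; }
    }
}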

Related

Memorycache won't store my object

So I've written a couple of wrapper methods around the System.Runtime MemoryCache, to get a general/user bound cache context per viewmodel in my ASP.NET MVC application.
At some point I noticed that my delegate kept getting called every time, rather than the stored object being retrieved, for no apparent reason.
Oddly enough, none of my unit tests (which use simple data to check it) failed or showed a pattern explaining this.
Here's one of the wrapper methods:
public T GetCustom<T>(CacheItemPolicy cacheSettings, Func<T> createCallback, params object[] parameters)
{
    if (parameters.Length == 0)
        throw new ArgumentException("GetCustom can't be called without any parameters.");
    lock (_threadLock)
    {
        var mergedToken = GetCacheSignature(parameters);
        var cache = GetMemoryCache();
        if (cache.Contains(mergedToken))
        {
            var cacheResult = cache.Get(mergedToken);
            if (cacheResult is T)
                return (T)cacheResult;
            throw new ArgumentException(string.Format("A caching signature was passed, which duplicates another signature of different return type. ({0})", mergedToken));
        }
        var result = createCallback(); // <-- keeps landing here
        if (!EqualityComparer<T>.Default.Equals(result, default(T)))
        {
            cache.Add(mergedToken, result, cacheSettings);
        }
        return result;
    }
}
I was wondering if anyone here knows of conditions that render an object invalid for storage within the MemoryCache.
Until then, I'll just strip my complex classes' properties until storage works.
Experiences would be interesting nevertheless.
There are a couple of frequent reasons why this may be happening (assuming the logic to actually add objects to the cache / find the correct cache instance is correct):
An x86 (32-bit) process has a "very small" amount of memory to deal with - it is relatively easy to consume too much memory outside the cache (or outside a particular instance of the cache), and as a result items will be immediately evicted from the cache.
ASP.NET app domain recycles, which happen for a variety of reasons, will clear out the cache too.
Notes
Generally you'd store "per user cached information" in session state, so it is managed appropriately and can be persisted via SQL or other out-of-process state options.
Relying on caching per-user objects may not improve performance if you need to support a larger number of users. You need to carefully measure the impact at the load level you expect to have.
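One way to confirm which of these is happening (a small diagnostic sketch, not part of the original answer) is to attach a RemovedCallback to the CacheItemPolicy and log why entries disappear; CacheEntryRemovedReason reports Evicted for memory pressure and Expired for policy expiry:

using System;
using System.Runtime.Caching;

static class CacheDiagnostics
{
    public static CacheItemPolicy WithLogging(CacheItemPolicy policy)
    {
        policy.RemovedCallback = args =>
        {
            // args.RemovedReason: Removed, Expired, Evicted, ChangeMonitorChanged, CacheSpecificEviction
            Console.WriteLine("Cache entry '{0}' removed: {1}",
                args.CacheItem.Key, args.RemovedReason);
        };
        return policy;
    }
}

Passing the wrapped policy into the GetCustom call above would then show whether entries are being evicted under memory pressure or are simply not surviving an app-domain recycle.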

ASP.NET shopping cart inventory management in a multi-user environment

Is there any need to handle locks, in terms of threading, in an inventory application?
As I understand it, ASP.NET is not thread safe.
Let's say there is a product whose available quantity is 1, and 40 users are simultaneously trying to book that particular product. Which one is going to get the product, and what happens to the rest?
I'm not even sure whether the question is well posed.
http://blogs.msdn.com/b/benchr/archive/2008/09/03/does-asp-net-magically-handle-thread-safety-for-you.aspx
I'm not sure about this - please help.
Well, technically, you're not even talking about ASP.NET here, but rather Entity Framework or whatever else you're using to communicate with SQL Server or whatever else persistent data store you're using. Relational databases will typically row-lock, so that as one client is updating the row, the row cannot be read by another client, but you can still run into concurrency issues.
You can handle this situation in one of two ways: pessimistic concurrency or optimistic concurrency. With pessimistic concurrency, you create locks, and any other thread trying to read/write the same data is simply turned away in the meantime. In a multi-threaded environment, it's far more common to use optimistic concurrency, since this allows a bit of room for failure and retry.
With optimistic concurrency, you version the data. As a simplistic example, let's say that I'm looking for the current stock of widgets in my dbo.Widgets table. I'd have a column like Version, which might initially be set to 1, alongside 100 widgets in my Stock column. Client one wants to buy a widget, so I read the row and note the version, 1. Now I want to update the row, so I do an update to set Stock to 99 and Version to 2, but I include Version = 1 in my where clause. But, between the time the row was initially read and the update was sent, another client bought a widget and updated the version of the row to 2. The first client's update fails, because Version is no longer 1. So the application reads the row fresh and tries the update again, subtracting 1 from Stock and incrementing Version by 1. Rinse and repeat. Generally, you'll want some upper limit of attempts before you give up and return an error to the user, but in most scenarios you might have one collision and the next attempt goes through fine. Your server would have to be getting slammed with people eagerly trying to buy widgets before it would be a real problem.
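A rough sketch of that versioned update in plain ADO.NET (the dbo.Widgets table, Stock and Version columns come from the example above; the Id key column, retry count, and connection handling are illustrative assumptions):

using System;
using System.Data.SqlClient;

static class WidgetStore
{
    // Returns true if a widget was reserved, false if sold out or we gave up after repeated collisions.
    public static bool TryBuyWidget(string connectionString, int widgetId, int maxAttempts = 3)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();

                // read the current stock and version
                int stock, version;
                using (var read = new SqlCommand(
                    "SELECT Stock, Version FROM dbo.Widgets WHERE Id = @id", conn))
                {
                    read.Parameters.AddWithValue("@id", widgetId);
                    using (var reader = read.ExecuteReader())
                    {
                        if (!reader.Read()) return false;   // no such widget
                        stock = reader.GetInt32(0);
                        version = reader.GetInt32(1);
                    }
                }
                if (stock <= 0) return false;               // sold out

                // the update only succeeds if nobody changed the row in the meantime
                using (var update = new SqlCommand(
                    "UPDATE dbo.Widgets SET Stock = @newStock, Version = @newVersion " +
                    "WHERE Id = @id AND Version = @expectedVersion", conn))
                {
                    update.Parameters.AddWithValue("@newStock", stock - 1);
                    update.Parameters.AddWithValue("@newVersion", version + 1);
                    update.Parameters.AddWithValue("@id", widgetId);
                    update.Parameters.AddWithValue("@expectedVersion", version);
                    if (update.ExecuteNonQuery() == 1) return true;  // won the race
                }
                // 0 rows affected: someone else updated first - rinse and repeat
            }
        }
        return false;
    }
}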
Now of course, this is a highly simplistic approach, and honestly, not something you really have to manage yourself. Entity Framework, for example, will handle concurrency for you automatically as long as you have a rowversion column:
[Timestamp]
public byte[] RowVersion { get; set; }
See http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application for the full guide to setting it up.
ASP.NET certainly is not thread safe. The article you link to is fine as a start, but doesn't tell the whole story by a long way. In your case, you likely load the product list into memory at the first request for it, at application startup, or on some other trigger.
When a request wants to work with a product, you grab the appropriate member of this preloaded list. (Believe me, this is better than having every request load the product or product list from the database.) However, if you now have 40 simultaneous requests for the same product, they will all be accessing the same object, and nasty things can happen, like ending up with -39 stock.
You can address this in many ways, but they boil down to two:
Protect the data somehow
Do what Amazon does
Protect the data
There are numerous ways of doing this. One would be to use a critical section via the lock keyword in C#. For example, something like this in the Product class:
private object lockableThing; // created in the ctor
public bool ReduceStockLevelForSale(int qtySold)
{
    bool success = false;
    if (this.quantityOnHand >= qtySold)
    {
        lock (lockableThing)
        {
            if (this.quantityOnHand >= qtySold)
            {
                this.quantityOnHand -= qtySold;
                success = true;
            }
        }
    }
    return success;
}
The double check on the quantity on hand is deliberate and required. There are any number of ways of doing the equivalent. Books have been written about this sort of thing.
Do what Amazon does
As long as at some point in the Order Taking sequence, Amazon thinks it has enough on hand (or maybe even any) it will let you place the order. It doesn't reduce the stock level while the order is being confirmed. Once the order has been confirmed, it has a back-end process (i.e. NOT run by the Web Site) which checks order by order that the order can be fulfilled, and only reduces the On Hand level if it can. If it can't be, they put the order on hold and send you an email saying 'Sorry! We don't have enough of Product X!' and giving you some options.
Discussion
Amazon's approach is the best way, because if you decrement the stock from the web site, at what point do you do it? Probably not until the order is confirmed. And if the stock has gone by then, what do you do? Also, you are going to need that 'Sorry!' email functionality anyway: what happens when the last one (or two or three) items of that product can't be found, don't physically exist, or are broken? You send a 'Sorry!' email.
However, this does assume that you are in control of the full order to dispatch cycle which is not always the case. If you aren't in control of the full cycle, you need to adjust to what you are in control of, and then pick a method.

Querying the write model for duplicated aggregate root property

I'm implementing CQRS pattern with Event sourcing, I'm using NServiceBus, NEventStore and NES(Connects between NSB and NEventStore).
My application regularly checks a web service for any file to be downloaded and processed. When a file is found, a command (DownloadFile) is sent to the bus and received by FileCommandHandler, which creates a new aggregate root (File) and handles the message.
Now, inside the File aggregate root, I have to check that the content of the file doesn't match any other file's content (since the web service only guarantees that the file name is unique, and the content may be duplicated under a different name), by hashing it and comparing against the list of hashed contents.
The question is: where should I save the list of hash codes? Is it allowed to query the read model?
public class File : AggregateBase
{
    public File(DownloadFile cmd, IFileService fileDownloadService, IClaimSerializerService serializerService, IBus bus)
        : this()
    {
        // code to download the file content, deserialize it, and publish an event.
    }
}

public class FileCommandHandler : IHandleMessages<DownloadFile>, IHandleMessages<ExtractFile>
{
    public void Handle(DownloadFile command)
    {
        // for example, is it possible to do this? (honestly, I feel it is not, since the read model should always be considered stale!)
        var file = readModelContext.GetFileByHashCode(Hash(command.FileContent));
        if (file != null)
            throw new Exception("File content matched with another already downloaded file");

        // Since there is no way to query the event source for file content like:
        // eventSourceRepository.Find<File>(c => c.HashCode == Hash(command.FileContent));
    }
}
Seems like you're looking for deduplication.
Your command side is where you want things to be consistent. Queries will always leave you open to race conditions. So, instead of running a query, I'd reverse the logic and actually write the hash into a database table (any db with ACID guarantees). If this write is successful, process the file. If the write of the hash fails, skip processing.
There's no point putting this logic into a handler, because retrying the message in case of failure (i.e. storing the hash multiple times) will not make it succeed. You'd also end up with messages for duplicate files in the error queue.
A good place for the deduplication logic is likely inside your web service client. Some pseudo logic (a rough sketch in code follows the list):
1. Get the file
2. Open a transaction
3. Insert the hash into the database and catch failure (not any failure, only a failure to insert)
4. Bus.Send a message to process the file if the number of records inserted in step 3 is not zero
5. Commit the transaction
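A rough C# sketch of those steps, assuming a Hashes table with a unique constraint on its Hash column and an NServiceBus IBus instance; the table name, the ProcessFile message type, and the error-code check are illustrative assumptions, not from the original answer:

using System;
using System.Data.SqlClient;
using System.Security.Cryptography;

// Hypothetical message type used below - in the original question this might instead be the existing ExtractFile command.
public class ProcessFile
{
    public string FileName { get; set; }
}

public class FileDeduplicator
{
    private readonly string _connectionString;
    private readonly NServiceBus.IBus _bus;

    public FileDeduplicator(string connectionString, NServiceBus.IBus bus)
    {
        _connectionString = connectionString;
        _bus = bus;
    }

    public void Accept(string fileName, byte[] content)
    {
        string hash;
        using (var sha = SHA256.Create())
        {
            hash = Convert.ToBase64String(sha.ComputeHash(content));
        }

        using (var conn = new SqlConnection(_connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                int inserted;
                using (var cmd = new SqlCommand(
                    "INSERT INTO Hashes (Hash) VALUES (@hash)", conn, tx))
                {
                    cmd.Parameters.AddWithValue("@hash", hash);
                    try
                    {
                        inserted = cmd.ExecuteNonQuery();
                    }
                    catch (SqlException ex)
                    {
                        if (ex.Number != 2627) throw;   // 2627 = unique constraint violation
                        inserted = 0;                   // duplicate content - skip processing
                    }
                }

                if (inserted > 0)
                {
                    _bus.Send(new ProcessFile { FileName = fileName });
                }

                tx.Commit();
            }
        }
    }
}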
There is some example deduplication code in the NServiceBus gateway.
Edit:
Looking at their code, I actually think the session.Get<DeduplicationMessage> is unnecessary. session.Save(gatewayMessage); should be enough and is the consistency boundary.
Doing a query would make sense only if the rate of failure is high, meaning you have a lot of duplicate content files. If 99%+ of inserts succeed, the duplicates can indeed be treated as exceptions.
This depends on a lot of things ... throughput being one of them. But since you're approaching this problem in a "pull based" fashion anyway (you're polling a web service for work (downloading and analysing a file)), you could make this whole process serial without having to worry about collisions. Now, that might not give the desired rate at which you want to be handling "the work", but more importantly ... have you measured? Let's sidestep that for a minute and assume that serial isn't going to work. How many files are we talking about? A few hundred, a thousand, ... millions? Depending on that, the hashes might fit into memory and could be rebuilt if/when the process comes down. There might also be an opportunity to partition your problem along the axis of time or context: every file since the dawn of time, or just today's, or maybe this month's worth of files? Really, I think you should dig deeper into your problem space. Apart from that, this feels like an awkward problem to solve using event sourcing, but YMMV.
When you have a true uniqueness-constraint in your domain, you can make the uniqueness-tester a domain service, whose implementation is part of the infrastructure -- similar to a repository, whose interface is part of the domain and whose implementation is part of the infrastructure. For the implementation, you can then use an in-memory hash or a database that is updated/queried as needed.
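As a small illustration of that shape (the interface and class names below are hypothetical, not from the original answer): the domain only sees the contract, while the infrastructure decides whether it is backed by an in-memory set or a database table.

// Domain layer: the contract the File aggregate / command handler depends on.
public interface IFileContentUniquenessChecker
{
    // Returns true if the hash was not seen before and is now registered; false if it is a duplicate.
    bool TryRegister(string contentHash);
}

// Infrastructure layer: one possible implementation (an in-memory set; a database-backed one would look the same to the domain).
public class InMemoryFileContentUniquenessChecker : IFileContentUniquenessChecker
{
    private readonly System.Collections.Concurrent.ConcurrentDictionary<string, byte> _hashes =
        new System.Collections.Concurrent.ConcurrentDictionary<string, byte>();

    public bool TryRegister(string contentHash)
    {
        return _hashes.TryAdd(contentHash, 0);  // atomic: only the first caller wins
    }
}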

Creating Dynamic Locks at Runtime in ASP.NET

Are the following assumptions valid for this code? I put some background info under the code, but I don't think it's relevant.
Assumption 1: Since this is a single application, I'm making the assumption it will be handled by a single process. Thus, static variables are shared between threads, and declaring my collection of lock objects statically is valid.
Assumption 2: If I know the value is already in the dictionary, I don't need to lock on read. I could use a ConcurrentDictionary, but I believe this one will be safe since I'm not enumerating (or deleting), and the value will exist and not change when I call UnlockOnValue().
Assumption 3: I can lock on the Keys collection, since that reference won't change, even if the underlying data structure does.
private static Dictionary<String, Object> LockList =
    new Dictionary<string, object>();

private void LockOnValue(String queryStringValue)
{
    lock (LockList.Keys)
    {
        if (!LockList.Keys.Contains(queryStringValue))
        {
            LockList.Add(queryStringValue, new Object());
        }
        System.Threading.Monitor.Enter(LockList[queryStringValue]);
    }
}

private void UnlockOnValue(String queryStringValue)
{
    System.Threading.Monitor.Exit(LockList[queryStringValue]);
}
Then I would use this code like:
LockOnValue(Request.QueryString["foo"]);
// Check cache expiry
// if expired
//     Load new values and cache them.
// else
//     Load cached values
UnlockOnValue(Request.QueryString["foo"]);
Background: I'm creating an app in ASP.NET that downloads data based on a single user-defined variable in the query string. The number of values will be quite limited. I need to cache the results for each value for a specified period of time.
Approach: I decided to use local files to cache the data, which is not the best option, but I wanted to try it since this is non-critical and performance is not a big issue. I used 2 files per option, one with the cache expiry date, and one with the data.
Issue: I'm not sure what the best way to do locking is, and I'm not overly familiar with threading issues in .NET (one of the reasons I chose this approach). Based on what's available, and what I read, I thought the above should work, but I'm not sure and wanted a second opinion.
Your current solution looks pretty good. The two things I would change:
1: UnlockOnValue needs to go in a finally block. If an exception is thrown, it will never release its lock.
2: LockOnValue is somewhat inefficient, since it does a dictionary lookup twice. This isn't a big deal for a small dictionary, but for a larger one you will want to switch to TryGetValue.
Also, your assumption 3 holds - at least for now. But the Dictionary contract makes no guarantee that the Keys property always returns the same object. And since it's so easy to not rely on this, I'd recommend against it. Whenever I need an object to lock on, I just create an object for that sole purpose. Something like:
private static Object _lock = new Object();
lock only has a scope of a single process. If you want to span processes you'll have to use primitives like Mutex (named).
lock is the same as Monitor.Enter and Monitor.Exit (wrapped in a try/finally). If you also call Monitor.Enter and Monitor.Exit yourself, that's redundant.
You don't need to lock on read, but you do have to lock the "transaction" of checking whether the value exists and adding it. If you don't lock that series of instructions, something else could come in between your check for the key and your add, and add it first--thus resulting in an exception. The lock you're doing is sufficient for that (you don't need the additional calls to Enter and Exit--lock will do that for you).
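If you do want per-value locks without hand-rolling the dictionary management, a ConcurrentDictionary makes the get-or-create step atomic. A minimal sketch (the class and method names are illustrative, not from the question):

using System;
using System.Collections.Concurrent;

public static class KeyedLock
{
    private static readonly ConcurrentDictionary<string, object> _locks =
        new ConcurrentDictionary<string, object>();

    // GetOrAdd is atomic, so two requests for the same key always receive the same lock object.
    public static void RunLocked(string key, Action action)
    {
        var gate = _locks.GetOrAdd(key, _ => new object());
        lock (gate)   // pairing and exception safety are handled by the lock statement
        {
            action();
        }
    }
}

Usage would then look like: KeyedLock.RunLocked(Request.QueryString["foo"], () => { /* check expiry, load or refresh the cached files */ });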

Session variables vs local variables

Whenever I have to store anything in the session, I have picked up the habit of minimizing the number of times I have to access the session by doing something like this:
private List<SearchResult> searchResults;

private List<SearchResult> SearchResults
{
    get
    {
        return searchResults ?? (searchResults = Session["SearchResults"] as List<SearchResult>);
    }
    set
    {
        searchResults = value;
        Session["SearchResults"] = value;
    }
}
My reasoning being that if the object is used several times throughout a postback, the object has to be retrieved from the Session less often. However, I have absolutely no idea if this actually helps in terms of performance at all, or is in fact just a waste of time, or perhaps even a bad idea. Does anyone know how computationally expensive constantly pulling an object out of the session would be compared to the above approach? Or if there are any best practices surrounding this?
Depends on what kind of session-state storage you are using (InProc, StateServer, SQLServer, or a custom provider).
If you're using InProc storage, then the performance difference is probably minimal unless you're accessing the object very frequently. However, the local copy doesn't really hurt either way.
It surely depends on your storage mode, but it's a good approach in either case, since it saves you deserialization when the storage is not InProc... and even in the InProc case it saves the boxing/unboxing... so my vote is in favour of your approach.
I see nothing wrong with your approach. The only drawback is that when some other piece of your (or somebody else's) code changes the session value after your private field has been initialized, your wrapper property will still return the old value. In other words there is no guarantee your property is actually returning the session value except for the first time.
As for performance, I think in case of InProc there is little or no gain. Probably similar to any other dictionary vs variable storage. However it might make a difference when you use other session storage modes.
And if you really want to know you can profile your app and find out;) You can even try something as simple as 2 trace writes and some looped session reads/writes between them.
And here's a read on session storage internals:
http://www.codeproject.com/KB/session/ASPNETSessionInternals.aspx
It depends on the size of the data to be stored, the bandwidth (internet or LAN), and the scale of the application. If the data size is small, the bandwidth is good enough (LAN), and the scale is worldwide (like Whitehouse.gov), we should store it on the client side (as a hidden form field). In other situations (the data size is very large, the bandwidth is very low, the scale is small - only a group of 3-4 people will use the application), we can store it on the server side (session). There are a lot of other factors to consider in this decision.
Also, I would not recommend using only one field in the session object. Create something like a Dictionary (HashMap in Java) in the session, use it as storage, and have the user pass the key of this Dictionary to get the data. This is needed so users can open your web site in several tabs.
Example URL, accessing a particular search:
http://www.mysite.com/SearchResult.aspx?search_result=d38e8df908097d46d287f64e67ea6e1a
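A small sketch of that per-search-key idea, placed in the same page class as the wrapper property above so Session is in scope; the store name, helper names, and query-string parameter are illustrative, and only Session plus the existing SearchResult type are assumed from the question:

// Store results under a fresh key so each browser tab gets its own entry.
private string StoreSearchResults(List<SearchResult> results)
{
    var searchId = Guid.NewGuid().ToString("N");
    var store = Session["SearchResultStore"] as Dictionary<string, List<SearchResult>>
                ?? new Dictionary<string, List<SearchResult>>();
    store[searchId] = results;
    Session["SearchResultStore"] = store;
    return searchId;   // put this into the URL, e.g. ?search_result=<searchId>
}

// Look results back up using the key taken from the query string.
private List<SearchResult> GetSearchResults(string searchId)
{
    var store = Session["SearchResultStore"] as Dictionary<string, List<SearchResult>>;
    List<SearchResult> results = null;
    if (store != null) store.TryGetValue(searchId, out results);
    return results;
}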
