Single vs multiple MemoryCache instances - c#

MemoryCache provides a Default instance, and additional named caches can be created.
It seems like there might be advantages to isolating the caching of the results of different processes in different instances. For example, results of queries against an index could be cached in an "IndexQueryResult" cache, and results of database queries in a "DatabaseQueryResult" cache. That's rather contrived, but it illustrates the principle.
Does the memory pressure on one cache that results in evictions affect the other caches at all? Are there any differences in the way .Net manages multiple caches compared to how it manages one?
Am I wasting my time considering the idea of multiple caches, or is there real value in doing so?

I can't speak to the first few questions, and I'm interested to hear answers to those. However, I can say that we've had a good experience so far using multiple caches in our product. Here are the benefits I see:
Reduced chance of key collision: Rather than coming up with some kind of scheme to ensure that no two separate values end up with the same key, we can simply create a cache that's specific to a given repository type, and know that as long as that repository class uses keys unique to its objects, we won't have collisions.
Better precision with cache eviction: The repository type that "owns" a particular cache instance can subscribe to certain event types on a system-wide event bus, so that it knows when some parts of the cache need to be purged. If we're lucky, it can determine the keys of the entries to purge purely based on the arguments of the published event. However, this is often not the case, and we must either purge the entire cache or iterate through all the cached values to figure out which ones are affected by the published event. If we were using a single cache instance for all data types in our system, we would end up crawling through a lot of unrelated entries. By using separate caches, we can restrict our search to the values that this particular repository was responsible for populating.
Regarding the second point: we also built a UI to expose all the cache instances in the system, and allow us to purge any of them with the click of a button. This comes in handy when we need to make changes directly to the database, and need the system to pick up those changes without having to restart the server. Again, if we only used a single cache, we couldn't be nearly as precise: we'd have to purge all the cached values systemwide, instead of just the values associated with the data types that we tinkered with.
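For what it's worth, here is a minimal sketch of per-repository named caches using System.Runtime.Caching (the cache names come from the question; the keys and values are invented):

```csharp
using System;
using System.Runtime.Caching;

// One named cache per repository type; trimming or purging one
// does not touch entries in the other.
var indexQueryCache = new MemoryCache("IndexQueryResult");
var dbQueryCache = new MemoryCache("DatabaseQueryResult");

// Stand-in for real query results.
var queryResults = new[] { "user1", "user2" };

indexQueryCache.Set("query:active-users", queryResults,
    DateTimeOffset.UtcNow.AddMinutes(5));

// Purge just this repository's entries, e.g. after a direct DB edit,
// leaving the database-query cache intact.
indexQueryCache.Trim(100); // evicts (approximately) 100% of entries
```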

ASP.NET Core distributed caching

I am currently using MemoryCache _cache = new MemoryCache(new MemoryCacheOptions()); for caching some data from the database that does not change often, but it does change.
And on create/update/delete of that data I refresh the cache.
This works fine, but the problem is that in production we will have a few nodes, so when the method for creating a record is called, for instance, the cache will be refreshed only on that node, not on the other nodes, and they will have stale data.
My question is: can I somehow fix this using MemoryCache, or do I need to do something else, and if so, what are the possible solutions?
I think what you are looking for is Distributed Caching.
Using the IDistributedCache interface you can use either Redis or SQL Server, and it supplies basic Get/Set/Remove methods. Changes made on one node will be available to the other nodes.
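A minimal sketch of the Redis flavour in ASP.NET Core (the connection string, service, and key names are placeholders):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;

// In Program.cs (package: Microsoft.Extensions.Caching.StackExchangeRedis):
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = "localhost:6379"; // placeholder connection string
});

// Anywhere IDistributedCache is injected:
public class CatalogService
{
    private readonly IDistributedCache _cache;

    public CatalogService(IDistributedCache cache) => _cache = cache;

    public Task<string> GetCachedAsync(string key) =>
        _cache.GetStringAsync(key); // null on a miss; the same value on every node

    public Task SetCachedAsync(string key, string value) =>
        _cache.SetStringAsync(key, value, new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
        });
}
```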
Using Redis is a great way of sharing session-type data between servers in a load-balanced environment; SQL Server does not seem to be a great fit, given that you seem to be caching to avoid database calls.
It might also be worth considering whether you are actually complicating things by caching in the first place. When you have a single application you see the benefit, since keeping records in application memory saves a request over the network; but in a load-balanced scenario, you have to compare retrieving those records from a distributed cache vs retrieving them from the database.
If the data is just an in-memory copy of a relatively small database table, then there is probably not a lot to choose, performance-wise, between the two. If the data is based on a complicated, expensive query, then the cache is the way to go.
If you are making hundreds of requests a minute for the data, then any network request may be too much; but consider the consequences of the data being a little stale. For example, if you update a record and the new record is not available immediately on every server, does your application break? Or does the change just take effect in a more phased way? In that case you could keep your in-process memory cache and just use a shorter Time To Live.
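For instance, if a 30-second staleness window is acceptable (the exact TTL is whatever your tolerance allows), keeping the in-process cache could be as simple as:

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

// Keep the fast in-process cache, but bound how stale any node can be.
_cache.Set(key, value, new MemoryCacheEntryOptions
{
    AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(30)
});
```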
If you really need every change to propagate to every node straight away, then you could consider using a library like CacheManager in conjunction with Redis, which can combine an in-memory cache and synchronisation with a remote cache.
Somewhat dated question, but maybe still useful: I agree with what ste-fu said, well explained.
I'll only add that, on top of CacheManager, you may want to take a look at FusionCache ⚡🦥, which I recently released.
On top of supporting an optional distributed 2nd layer that is transparently managed for you, it also has some other nice features, like an optimization that prevents multiple concurrent factory executions for the same cache key (less load on the source database), a fail-safe mechanism, and advanced timeouts with background factory completion.
If you will give it a chance please let me know what you think.
/shameless-plug

Caching large objects for HA with C# ASP.NET / .NET 4.7

I'm trying to cache a large object (around 25MB) that needs to be available for the user for 15 minutes.
In the beginning, I was using MemoryCache (single server) but now that we are going the HA route, we need it to be available to all the servers.
We tried to replace it with Redis, but it takes around 2 minutes (on localhost) between serializing and deserializing the object and the round trip (Newtonsoft.Json serialization).
So, the question is: how do you share large objects that have a short lifespan between servers in an HA setup?
Thanks for reading :)
I've had good luck switching from JSON to Protobuf ser/de, using the Protobuf-net package. But, it sounds like even if that cut it down to the oft-repeated 6x faster execution time, a 20 second deserialization time probably still won't cut it in this case - since the whole goal is to cache it for a particular user for a "short" period of time.
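For comparison, the protobuf-net version of the ser/de path looks roughly like this (the payload type and its members are invented for illustration):

```csharp
using System.IO;
using ProtoBuf;

// Hypothetical payload; protobuf-net needs the contract attributes.
[ProtoContract]
public class ReportPayload
{
    [ProtoMember(1)] public string Title { get; set; }
    [ProtoMember(2)] public double[] Values { get; set; }
}

public static class ProtoCodec
{
    public static byte[] Serialize(ReportPayload payload)
    {
        using var ms = new MemoryStream();
        Serializer.Serialize(ms, payload); // compact binary, typically far smaller than JSON
        return ms.ToArray();               // these bytes are what you'd put in Redis
    }

    public static ReportPayload Deserialize(byte[] bytes)
    {
        using var ms = new MemoryStream(bytes);
        return Serializer.Deserialize<ReportPayload>(ms);
    }
}
```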
This sounds like a classic case of eager vs. lazy loading. Since you're already using Redis, have you considered caching each property of the object under a separate key? The more numerous the properties, and therefore the smaller each individual one, the more beneficial this strategy will be. Of course, I'm assuming a fairly orthogonal set of properties on the object; if many of them have dependencies on each other, then this will likely perform worse. But if the access patterns tend not to require the entire hydrated object, you may improve responsiveness a lot by fetching the demanded individual property instead of the entire object.
I'm assuming a lot about your object, but the simplest step would be to implement each property's get accessor to perform the Redis Get call. This has a lot of other downsides regarding dependency management and multi-threaded access, but might be a simple way to achieve a proof of concept.
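A bare-bones illustration of that idea with StackExchange.Redis (the type and key names are made up):

```csharp
using StackExchange.Redis;

// Each property lives under its own Redis key and is fetched on demand,
// so reading one field never deserializes the whole 25MB object.
public class CachedReport
{
    private readonly IDatabase _redis;
    private readonly string _id;

    public CachedReport(IDatabase redis, string id)
    {
        _redis = redis;
        _id = id;
    }

    public string Title   => _redis.StringGet($"report:{_id}:title");
    public string Summary => _redis.StringGet($"report:{_id}:summary");
}
```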
Keep in mind that this dramatically complicates the cache invalidation requirements. Even if you can store each property individually in Redis, if you then store that value in a variable on each machine after fetching, you quickly run into an unmanaged cache situation where you cannot guarantee synchronized data depending on which machine serves the next request.

Is it safe to thread concurrent database queries?

I'm trying to improve upon this program that I wrote for work. Initially I was rushed, and they don't care about performance or anything. So, I made a horrible decision to query an entire database (a SQLite database) and then store the results in lists for use in my functions. However, I'm now considering having each of my functions threaded, and having the functions query only the parts of the database that they need. There are ~25 functions. My question is: is this safe to do? Also, is it possible to have that many concurrent connections? I will only be PULLING information from the database, never inserting or updating.
The way I've had it described to me[*] is to have each concurrent thread open its own connection to the database, as each connection can only process one query or modification at a time. The group of threads with their connections can then perform concurrent reads easily. If you've got a significant problem with many concurrent writes causing excessive blocking or failure to acquire locks, you're getting to the point where you're exceeding what SQLite does for you (and should consider a server-based DB like PostgreSQL).
Note that you can also have a master thread open the connections for the worker threads if that's more convenient, but it's advised (for your sanity's sake if nothing else!) to only actually use each connection from one thread.
[* For a normal build of SQLite. It's possible to switch things off at build time, of course.]
SQLite has no write concurrency, but it supports arbitrarily many connections that read at the same time.
Just ensure that every thread has its own connection.
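In C#, the one-connection-per-thread pattern looks roughly like this with Microsoft.Data.Sqlite (the file name and query are placeholders):

```csharp
using System.Threading.Tasks;
using Microsoft.Data.Sqlite;

// Each worker opens its own read-only connection; SQLite allows
// arbitrarily many concurrent readers.
var tasks = new Task[4];
for (int i = 0; i < tasks.Length; i++)
{
    tasks[i] = Task.Run(() =>
    {
        using var conn = new SqliteConnection("Data Source=app.db;Mode=ReadOnly");
        conn.Open();
        using var cmd = conn.CreateCommand();
        cmd.CommandText = "SELECT COUNT(*) FROM orders"; // placeholder query
        var count = (long)cmd.ExecuteScalar();
    });
}
Task.WaitAll(tasks);
```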
25 simultaneous connections is not a smart idea. That's a huge number.
I usually create a multi-layered design for this problem. I send all requests to the database through a kind of ObjectFactory class that has an internal cache. The ObjectFactory will forward the request to a ConnectionPoolHandler and will store the results in its cache. This connection pool handler uses X simultaneous connections but dispatches them to several threads.
However, some remarks must be made before applying this design. You first have to ask yourself the following two questions:
Is your application the only application that has access to this database?
Is your application the only application that modifies data in this database?
If the first question is answered negatively, then you could encounter locking issues. If the second question is answered negatively, then it will be extremely difficult to apply caching. You may even prefer not to implement any caching at all.
Caching is especially interesting when you are often requesting objects based on a unique reference, such as the primary key. In that case you can store the most often used objects in a map. A popular collection for caching is an "LRUMap" ("Least Recently Used" map). The benefit of this collection is that it automatically keeps the most recently used objects at the top. At the same time it has a maximum size and automatically removes items from the map that are rarely used.
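C# has no LRUMap in the base library, but the same idea can be sketched with a Dictionary plus a LinkedList (illustrative, not production-hardened):

```csharp
using System.Collections.Generic;

public class LruCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> _map = new();
    private readonly LinkedList<(TKey Key, TValue Value)> _order = new();

    public LruCache(int capacity) => _capacity = capacity;

    public bool TryGet(TKey key, out TValue value)
    {
        if (_map.TryGetValue(key, out var node))
        {
            _order.Remove(node);   // move to the front: most recently used
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default;
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        if (_map.TryGetValue(key, out var existing))
        {
            _order.Remove(existing);
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            var lru = _order.Last; // the least recently used entry sits at the tail
            _order.RemoveLast();
            _map.Remove(lru.Value.Key);
        }
        var node = new LinkedListNode<(TKey Key, TValue Value)>((key, value));
        _order.AddFirst(node);
        _map[key] = node;
    }
}
```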
A second advantage of caching is that each object exists only once. For example:
An Employee is fetched from the database.
The ObjectFactory converts the result set to an actual object instance.
The ObjectFactory immediately stores it in the cache.
A bit later, a bunch of employees are fetched using an SQL "... where name like 'John%'" statement.
Before converting the result set to objects, the ObjectFactory first checks whether the IDs of these records are perhaps already stored in the cache.
Found a match! Aha, this object does not need to be recreated.
There are several advantages to having a certain object only once in memory.
Last but not least, in Java there is something called a "weak reference": a reference that can in fact be cleaned up by the garbage collector. C# has the same concept in WeakReference (and the generic WeakReference<T>). By using these, you don't even have to care about the maximum number of cached objects; your garbage collector will take care of it.
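A quick sketch of the C# version:

```csharp
using System;

var data = new byte[1024];
var weak = new WeakReference<byte[]>(data);

data = null; // drop the strong reference; the GC may now collect the array

if (weak.TryGetTarget(out var cached))
{
    // Still alive: reuse the cached array.
}
else
{
    // Collected: reload from the database.
}
```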

Does a cache need to be synchronized?

This seems like perhaps a naive question, but I got into a discussion with a co-worker where I argued that there is no real need for a cache to be thread-safe/synchronized, as I would assume that it does not matter who is putting in a value: the value for a given key should be "constant" (in that it is ultimately coming from the same source). If the values can change readily, then the cache itself does not seem to be all that useful (in that if you care that the value is "currently correct", you should go to the original source).
The main reason I see to make at least the GET synchronized is that if it is very expensive to miss in the cache and you don't want multiple threads each going out to get a value to put back in the cache. Even then, you'd need something that actually blocks all consumers during a read-fetch-put cycle.
Anyhow, my working assumption is that a hash is by its very nature thread-safe, because for any {key, value} combination the value is either null or some value for which it doesn't matter who got there "first" to write it.
Question is: Is this a reasonable assumption?
Update: The real scope of my question is around very simple id->value style caches (or {parameters}->{calculated value}), where no matter who writes to the cache, the value will be the same and we are just trying to save on "re-calculating"/going back to the database. The actual graph of the object isn't relevant and the cache is generally long-lived.
For most implementations of a hash, you'd need to synchronize. What if the hash table needs to be expanded/rehashed? What if two threads are trying to add something to the hash table where the keys are different, but the hashes collide? They could both be modifying the same slot in the hash table in different ways at the same time. Assuming you're using a hash table to implement your cache (which you imply in your question) I suggest reading a little about the details of how hash tables are implemented if you're not already familiar with this.
Writes aren't always atomic. You must either use atomic data types or provide some synchronization (RCU, locks etc.). No shared data is thread-safe per se. Or make this go away by sticking to lock-free algorithms (that is, where possible and feasible).
As long as the cost of acquiring and releasing a lock is less than the cost of recreating the object (from a file or database or whatever), all accesses to a cache should indeed be synchronized. If it's not, you don't really need a cache at all. :)
If you want to avoid data corruption, you must synchronize. This is especially true when the cache contains multiple tables that must be updated atomically. Imagine you have a database for a DMV (department of motor vehicles). You add a new person to the database; that person will have records for auto registrations, plus records for tickets received, plus records for home address and perhaps other contact information. If you don't update these tables atomically -- in the database and in the cache -- then any client pulling data out of the cache may get inconsistent data.
Yes, any one piece of data may be constant, but databases very commonly hold data that -- if not updated together and atomically -- can cause database clients to get incorrect or incomplete or inconsistent results.
If you are using Java 5 or above, you can use a ConcurrentHashMap. This supports multiple readers and writers in a thread-safe manner.
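The C# counterpart is ConcurrentDictionary. Combined with Lazy<T> it also addresses the read-fetch-put concern raised above, since the expensive factory runs at most once per key (a sketch, not a full cache implementation):

```csharp
using System;
using System.Collections.Concurrent;

public class SafeCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> _map = new();

    // Concurrent misses for the same key share a single Lazy<TValue>,
    // so the factory executes only once; later reads are lock-free.
    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory) =>
        _map.GetOrAdd(key, k => new Lazy<TValue>(() => factory(k))).Value;
}
```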

Efficiency of persistence methods for large ASP.NET cache store

Curious if anyone has opinions on which method would be better suited for ASP.NET caching: option one, have fewer items in the cache which are more complex, or option two, many items which are less complex.
For the sake of discussion, let's imagine my site has SalesPerson and Customer objects. These are pretty simple classes, but I don't want to be chatty with the database, so I want to lazy-load them into the cache and invalidate them out of the cache when I make a change. Simple enough.
Option 1
Create a Dictionary and cache the entire dictionary. When I need to load an instance of a SalesPerson from the cache, I get out the Dictionary and perform a normal key lookup against it.
Option 2
Prefix the key of each item and store it directly in the asp.net cache. For example every SalesPerson instance in the cache would use a composite of the prefix plus the key for that object so it may look like sp_[guid] and is stored in the asp.net cache and also in the cache are the Customer objects with a key like cust_[guid].
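A minimal sketch of what Option 2's key scheme might look like over HttpRuntime.Cache (the helper names and expiration values are invented):

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class CacheStore
{
    // Composite keys: per-type prefix plus the object's Guid.
    public static string SalesPersonKey(Guid id) => "sp_" + id;
    public static string CustomerKey(Guid id)    => "cust_" + id;

    public static void Put(string key, object value) =>
        HttpRuntime.Cache.Insert(key, value, null,
            DateTime.UtcNow.AddMinutes(20), Cache.NoSlidingExpiration);

    public static T Get<T>(string key) where T : class =>
        HttpRuntime.Cache.Get(key) as T;
}
```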
One of my fears with option two is that the number of entries will grow very large; between SalesPerson, Customer, and a dozen or so other categories I might have 25K items in the cache, and highly repetitive lookups for something like a string resource that I am using in several places might pay a penalty while the code looks through the cache's key collection to find it amongst the other 25K.
I am sure at some point there is a diminishing return here on storing too many items in the cache but I am curious as to opinions on these matters.
You are better off creating many, smaller items in the cache than fewer, larger items. Here is the reasoning:
1) If your data is small, then the number of items in the cache will be relatively small and it won't make any difference. Fetching single entities from the cache is easier than fetching a dictionary and then fetching an item from that dictionary, too.
2) Once your data grows large, the cache may be used to manage the data in an intelligent fashion. The HttpRuntime.Cache object makes use of a Least Recently Used (LRU) algorithm to determine which items in the cache to expire. If you have only a small number of highly used items in the cache, this algorithm will be useless. However, if you have many smaller items in the cache, but 90% of them are not in use at any given moment (very common usage heuristic), then the LRU algorithm can ensure that those items that are seeing active use remain in the cache while evicting less-used items to ensure sufficient room remains for the used ones.
As your application grows, the importance of being able to manage what is in the cache will be most important. Also, I've yet to see any performance degradation from having millions of keys in the cache -- hashtables are extremely fast and if you find issues there it's likely easily solved by altering your naming conventions for your cache keys to optimize them for use as hashtable keys.
The ASP.NET Cache uses its own dictionary, so using its dictionary to locate your dictionary to do lookups to retrieve your objects seems less than optimal. Dictionaries use hash tables, which are about the most efficient lookup you can do. Using your own dictionaries would just add more overhead, I think. I don't know about diminishing returns in regards to hash tables, but I think it would be in terms of storage size, not lookup time.
I would concern yourself with whatever makes your job easier. If having the Cache more organized will make your app easier to understand, debug, extend and maintain then I would do it. If it makes those things more complex then I would not do it.
And as nullvoid mentioned, this is all assuming you've already explored the larger implications of caching, which involve gauging the performance gains vs. the performance hit. You're talking about storing lots and lots of objects, and this implies lots of cache traffic. I would only store something in the cache that you can measure a performance gain from doing so.
We have built an application that uses caching for storing all resources. The application is multi-language, so for each label in the application we have at least three translations. We load a (Label, Culture) combination when first needed and then expire it from the cache only if it was changed by an admin in the database. This scenario worked perfectly well even when the cache contained 100,000 items. We only took care to configure the cache and the expiry policies such that we really benefit from the cache. We use no expiration, so the items are cached until the worker process is reset or until the item is intentionally expired. We also took care to define a domain for the values of the keys in such a way as to uniquely identify a label in a specific culture with the least amount of characters.
I'm going to assume that you've considered all the implications of data changing from multiple users and how that will affect the cached data in terms of handling conflicting data. Caching is really only meant to be done on relatively static data.
From an efficiency perspective, I would assume that if you're using .NET serialization properly, you're going to benefit from storing the data in the cache in the form of larger, typed, serialized collections rather than individual base types.
From a maintenance perspective this would also be a better approach, as you can create a strongly typed object to represent the data and use serialization to cast it between the cache and your salesperson/customer object.
