Curious if anyone has opinions on which approach is better suited for ASP.NET caching: option one, have fewer items in the cache that are more complex, or option two, many items that are less complex.
For the sake of discussion, let's imagine my site has SalesPerson and Customer objects. These are pretty simple classes, but I don't want to be chatty with the database, so I want to lazy-load them into the cache and invalidate them from the cache when I make a change. Simple enough.
Option 1
Create a Dictionary and cache the entire dictionary. When I need to load an instance of a SalesPerson from the cache, I get the Dictionary out of the cache and perform a normal key lookup against it.
Option 2
Prefix the key of each item and store each item directly in the ASP.NET cache. For example, every SalesPerson instance would be stored under a composite of a prefix plus that object's key, so it might look like sp_[guid]; the Customer objects would likewise live in the cache under keys like cust_[guid].
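Roughly, this is what I have in mind for each option (just a sketch; the loader methods and salesPersonId are placeholders):

```csharp
// Option 1: one big dictionary cached under a single key.
var people = (Dictionary<Guid, SalesPerson>)HttpContext.Current.Cache["SalesPeople"];
if (people == null)
{
    people = LoadAllSalesPeopleFromDb();               // hypothetical loader
    HttpContext.Current.Cache.Insert("SalesPeople", people);
}
SalesPerson fromDictionary = people[salesPersonId];

// Option 2: each entity cached individually under a prefixed key.
string key = "sp_" + salesPersonId;
var fromCache = (SalesPerson)HttpContext.Current.Cache[key];
if (fromCache == null)
{
    fromCache = LoadSalesPersonFromDb(salesPersonId);   // hypothetical loader
    HttpContext.Current.Cache.Insert(key, fromCache);
}
```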
One of my fears with option two is that the number of entries will grow very large. Between SalesPerson, Customer, and a dozen or so other categories I might have 25K items in the cache, and highly repetitive lookups for something like a string resource used in several places might pay a penalty while the code searches the cache's key collection to find it amongst the other 25K.
I am sure there is a point of diminishing returns on storing too many items in the cache, but I am curious to hear opinions on the matter.
You are better off creating many smaller items in the cache than fewer, larger items. Here is the reasoning:
1) If your data is small, then the number of items in the cache will be relatively small and it won't make any difference either way. Besides, fetching a single entity from the cache is easier than fetching a dictionary and then fetching an item from that dictionary.
2) Once your data grows large, the cache may be used to manage the data in an intelligent fashion. The HttpRuntime.Cache object makes use of a Least Recently Used (LRU) algorithm to determine which items in the cache to expire. If you have only a small number of highly used items in the cache, this algorithm will be useless. However, if you have many smaller items in the cache, but 90% of them are not in use at any given moment (very common usage heuristic), then the LRU algorithm can ensure that those items that are seeing active use remain in the cache while evicting less-used items to ensure sufficient room remains for the used ones.
As your application grows, being able to manage what is in the cache becomes more and more important. Also, I've yet to see any performance degradation from having millions of keys in the cache -- hashtables are extremely fast, and if you do find issues there, it's likely easily solved by altering the naming conventions for your cache keys to optimize them for use as hashtable keys.
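To illustrate the many-small-items approach, a get-or-load helper along these lines might do (just a sketch; the key prefix, the 20-minute sliding expiration and the loader delegate are illustrative, not prescriptive):

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class EntityCache
{
    // Get-or-load a single entity under a prefixed key, e.g. "sp_<guid>".
    public static T Get<T>(string prefix, Guid id, Func<Guid, T> load) where T : class
    {
        string key = prefix + id;
        var item = HttpRuntime.Cache[key] as T;
        if (item == null)
        {
            item = load(id);                        // hit the database only on a cache miss
            HttpRuntime.Cache.Insert(
                key, item, null,
                Cache.NoAbsoluteExpiration,
                TimeSpan.FromMinutes(20));          // sliding expiration lets unused items fall out
        }
        return item;
    }

    // Invalidate one entity after a change.
    public static void Remove(string prefix, Guid id)
    {
        HttpRuntime.Cache.Remove(prefix + id);
    }
}
```

Usage would be something like var sp = EntityCache.Get("sp_", id, LoadSalesPersonFromDb); with the loader called only on a miss, and EntityCache.Remove called whenever the entity changes.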
The ASP.NET Cache uses its own dictionary, so using its dictionary to locate your dictionary in order to do lookups against that to retrieve your objects seems less than optimal. Dictionaries use hash tables, which are about the most efficient lookup you can do, and adding your own dictionaries on top would just add more overhead, I think. I don't know about diminishing returns with regard to hash tables, but I would expect it to be in terms of storage size, not lookup time.
I would concern myself with whatever makes your job easier. If having the Cache more organized will make your app easier to understand, debug, extend and maintain, then I would do it. If it makes those things more complex, then I would not.
And as nullvoid mentioned, this is all assuming you've already explored the larger implications of caching, which involve gauging the performance gains vs. the performance hit. You're talking about storing lots and lots of objects, which implies lots of cache traffic. I would only store something in the cache if you can measure a performance gain from doing so.
We have built an application that uses caching for storing all resources. The application is multi-language, so for each label in the application we have at least three translations. We load a (Label, Culture) combination when it is first needed and then expire it from the cache only when it is changed by an admin in the database. This scenario worked perfectly well even when the cache contained 100,000 items. We only took care to configure the cache and the expiry policies so that we really benefit from the cache. We use no expiration, so items are cached until the worker process is recycled or until an item is intentionally expired. We also took care to define the domain of key values in such a way that a label in a specific culture is uniquely identified with the fewest characters possible.
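As a rough sketch of that key scheme (the "lbl:" format and LoadLabelFromDb are only illustrative stand-ins for what we actually use):

```csharp
using System.Web;
using System.Web.Caching;

public static class Labels
{
    // The key uniquely identifies a (label, culture) pair with few characters, e.g. "lbl:1043:fr".
    public static string Get(int labelId, string culture)
    {
        string key = "lbl:" + labelId + ":" + culture;
        var text = HttpRuntime.Cache[key] as string;
        if (text == null)
        {
            text = LoadLabelFromDb(labelId, culture);   // hypothetical database call
            HttpRuntime.Cache.Insert(key, text, null,
                Cache.NoAbsoluteExpiration,
                Cache.NoSlidingExpiration);             // stays cached until removed explicitly
        }
        return text;
    }

    // When an admin edits a translation, expire just that one entry.
    public static void Expire(int labelId, string culture)
    {
        HttpRuntime.Cache.Remove("lbl:" + labelId + ":" + culture);
    }
}
```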
I'm going to assume that you've considered all the implications of data changing from multiple users and how that will affect the cached data in terms of handling conflicting data. Caching is really only meant to be done on relatively static data.
From an efficiency perspective, I would assume that if you're using .NET serialization properly you're going to benefit from storing the data in the cache as larger typed collections rather than as individual base types.
From a maintenance perspective this would also be a better approach, as you can create a strongly typed object to represent the data and use serialization to convert it between the cache and your salesperson/customer objects.
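For what it's worth, with the in-process ASP.NET cache the object reference is stored directly, so a typed collection can be cached and read back without an explicit serialization step; something like this sketch (Customer, its Id property and the loader are placeholders):

```csharp
// Sketch: cache a whole typed collection under one key.
var customers = HttpRuntime.Cache["Customers"] as List<Customer>;
if (customers == null)
{
    customers = LoadAllCustomersFromDb();            // hypothetical loader
    HttpRuntime.Cache.Insert("Customers", customers);
}
Customer c = customers.Find(x => x.Id == customerId);
```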
I am working on a winforms application where I have to load data from Web API calls. A few million rows of data will be returned and have to be stored in a Dictionary. The logic goes like this: the user clicks on an item and its data is loaded. If the user clicks on another item, another new dictionary is created. Over time, several such heavyweight Dictionary objects get created, and the user might not use the older ones again. Is this a case for using WeakReference? Note that recreating any Dictionary object takes 10 to 20 seconds. If I keep all the objects in memory, the application's performance slowly degrades after some time.
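For reference, this is the kind of thing I am considering (a sketch only; Row and LoadDataForItem are stand-ins for my actual types and Web API call):

```csharp
// Hold each big dictionary via a WeakReference so the GC may reclaim it under
// memory pressure, and rebuild it (10-20 s) only if it has been collected.
private readonly Dictionary<int, WeakReference<Dictionary<string, Row>>> _cache
    = new Dictionary<int, WeakReference<Dictionary<string, Row>>>();

public Dictionary<string, Row> GetData(int itemId)
{
    Dictionary<string, Row> data;
    if (_cache.TryGetValue(itemId, out var weak) && weak.TryGetTarget(out data))
        return data;                          // still alive, reuse it

    data = LoadDataForItem(itemId);           // expensive Web API call (stand-in name)
    _cache[itemId] = new WeakReference<Dictionary<string, Row>>(data);
    return data;
}
```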
The answer here is to use a more advanced technique.
Use a memory-mapped file to store the dictionaries on disk; then you don't have to worry about holding them all in memory at once, as they will be swapped in and out by the OS on demand.
You will want to write a Dictionary designed specifically to operate in the memory-mapped file region, plus a heap to store the things pointed to by the key-value pairs in the dictionary. Since you aren't deleting anything, this is actually pretty straightforward.
Otherwise you should take Fildor 4's suggestion and Just Use A Database, as it will basically do everything I just mentioned for you and wrap it up in a nice syntax.
Please excuse my noob question as I am still a junior coder, but what's the difference between LRU caching using a Dictionary and a linked list, and MemoryCache in C#? And how would one implement an LRU policy on, say, MemoryCache?
Thanks in advance.
LRU is an algorithm for deciding which entry to expire when adding a new item to your cache: when the cache is full, it expires the least recently used item.
MemoryCache is a class available in .NET 4 and later that implements caching in heap memory. Caching can be categorized in different ways. Based on the medium, you can cache on disk or in memory; based on the location of that memory, you can categorize it as in-memory (inside the application's heap) or out-of-process (memory outside the heap, for example on another server). MemoryCache in C# is in-memory, and you have to be careful because it can consume all the memory of your application, so it is better not to rely on it if you have more than one node.
Another thing to take into consideration is that when you cache an object out of process, the object must be serializable, whereas in-memory caching can hold any object without serializing it.
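A minimal MemoryCache usage sketch (System.Runtime.Caching; the key, the 10-minute expiration and the Customer/LoadCustomer names are just examples):

```csharp
using System;
using System.Runtime.Caching;

var cache = MemoryCache.Default;

// Add (or overwrite) an entry that expires 10 minutes after it was written.
cache.Set("customer_42", LoadCustomer(42),
          new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(10) });

// Read it back; null means it was never added or has already been evicted.
var customer = (Customer)cache.Get("customer_42");
```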
A least-recently-used (LRU) cache evicts the least recently used key-value when the cache is full and needs to add a new value, whereas a MemoryCache evicts the oldest key-values, or those past their 'use-by date' if they happen to have one.
Say the first key-value you added is vital and you happen to read it all the time: in an LRU cache it would be kept, but in a MemoryCache it would eventually disappear and need to be replaced. Sometimes, though, having older key-values disappear is what you're after, so that up-to-date values get pulled through from your backend (e.g. a database).
Consider also whether adding an existing key-value should count as a 'use' (so recently updated items tend to stay around) or whether a 'use' only happens when you read a key-value, so you just favour the things your readers like. As always, I would consider concurrency if you have more than one task or thread using it.
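To make the Dictionary-plus-linked-list idea concrete, a minimal, non-thread-safe LRU cache sketch might look like this (wrap it in a lock or similar if several threads will use it):

```csharp
using System.Collections.Generic;

// Minimal LRU cache: the Dictionary gives O(1) lookup, the LinkedList keeps
// entries ordered from most to least recently used.
public class LruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public LruCache(int capacity) { _capacity = capacity; }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_map.TryGetValue(key, out node))
        {
            _order.Remove(node);          // move to the front: now the most recently used
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> existing;
        if (_map.TryGetValue(key, out existing))
        {
            _order.Remove(existing);      // replacing an existing key
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            var lru = _order.Last;        // evict the least recently used entry
            _order.RemoveLast();
            _map.Remove(lru.Value.Key);
        }
        var node = new LinkedListNode<KeyValuePair<TKey, TValue>>(
            new KeyValuePair<TKey, TValue>(key, value));
        _order.AddFirst(node);
        _map[key] = node;
    }
}
```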
MemoryCache provides a Default cache, and additional named caches can be created.
It seems like there might be advantages to isolating the caching of the results of different processes in different instances. For example, the results of queries against an index could be cached in an "IndexQueryResult" cache and the results of database queries in a "DatabaseQueryResult" cache. That's rather contrived, but it illustrates the principle.
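Concretely, I mean something like this (System.Runtime.Caching; the cache names, keys and values are just examples):

```csharp
using System;
using System.Runtime.Caching;

// Each cache gets its own name and, optionally, its own memory limits via configuration.
var indexCache    = new MemoryCache("IndexQueryResult");
var databaseCache = new MemoryCache("DatabaseQueryResult");

indexCache.Set("query:abc", indexResult, DateTimeOffset.Now.AddMinutes(5));
databaseCache.Set("query:abc", dbResult, DateTimeOffset.Now.AddMinutes(5));

// The same key can live in both caches without colliding.
```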
Does the memory pressure on one cache that results in evictions affect the other caches at all? Are there any differences in the way .Net manages multiple caches compared to how it manages one?
Am I wasting my time considering the idea of multiple caches, or is there real value in doing so?
I can't speak to the first few questions, and I'm interested to hear answers to those. However, I can say that we've had a good experience so far using multiple caches in our product. Here are the benefits I see:
Reduced chance of key collision: Rather than coming up with some kind of scheme to ensure that no two separate values end up with the same key, we can simply create a cache that's specific to a given repository type, and know that as long as that repository class uses keys unique to its objects, we won't have collisions.
Better precision with cache eviction: The repository type that "owns" a particular cache instance can subscribe to certain event types on a system-wide event bus, so that it knows when some parts of the cache need to be purged. If we're lucky, it can determine the keys of the entries to purge purely based on the arguments of the published event. However, this is often not the case, and we must either purge the entire cache or iterate through all the cached values to figure out which ones are affected by the published event. If we were using a single cache instance for all data types in our system, we would end up crawling through a lot of unrelated entries. By using separate caches, we can restrict our search to the values that this particular repository was responsible for populating.
Regarding the second point: we also built a UI to expose all the cache instances in the system, and allow us to purge any of them with the click of a button. This comes in handy when we need to make changes directly to the database, and need the system to pick up those changes without having to restart the server. Again, if we only used a single cache, we couldn't be nearly as precise: we'd have to purge all the cached values systemwide, instead of just the values associated with the data types that we tinkered with.
My website uses linq-to-sql to load about 50k rows of data from a database. This data is static and never changes. It works similarly to a spam filter, and all 50k rows of patterns need to be loaded.
What's the best way to program this for optimal performance?
Loading the entire thing into a single static readonly data-structure (it being immutable means once constructed it can be safely used from many threads) would give the greatest overall performance per lookup.
However, it would lead to a long start-up time, which may not be acceptable. In that case you may consider loading each item as accessed, but that brings concurrency issues (since you are mutating a data-structure used by multiple threads).
In between is the option of loading all indices on start-up, and then adding the rest of the information on a per-access basis, with finer-grained locks to reduce lock contention.
Or you can ignore the lot and just load from the database as needed. This does have some advantages in terms of performance just because memory is not used for rarely-used information. It'll be a whole lot easier if you ever suddenly find that you do have to allow the data to change.
No single option comes out as the only reasonable way to go in general; it'll depend on the specifics of the application, the data, and the usage patterns.
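As a sketch of the first option, loading everything once into an immutable, shared structure (PatternsDataContext and the Patterns table are stand-ins for your actual linq-to-sql model):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PatternStore
{
    // Built exactly once, on first access, in a thread-safe way; read-only afterwards.
    private static readonly Lazy<HashSet<string>> _patterns =
        new Lazy<HashSet<string>>(LoadPatterns, isThreadSafe: true);

    public static bool Contains(string candidate)
    {
        return _patterns.Value.Contains(candidate);
    }

    private static HashSet<string> LoadPatterns()
    {
        // Hypothetical linq-to-sql query pulling the ~50k static pattern rows once.
        using (var db = new PatternsDataContext())
        {
            return new HashSet<string>(db.Patterns.Select(p => p.Text));
        }
    }
}
```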
This seems like perhaps a naive question, but I got into a discussion with a co-worker where I argued that there is no real need for a cache to be thread-safe/synchronized, as I would assume it does not matter who is putting in a value: the value for a given key should be "constant" (in that it is ultimately coming from the same source). If the values can change readily, then the cache itself does not seem to be all that useful (in that if you care that the value is "currently correct" you should go to the original source).
The main reason I see to make at least the GET synchronized is that if it is very expensive to miss in the cache and you don't want multiple threads each going out to get a value to put back in the cache. Even then, you'd need something that actually blocks all consumers during a read-fetch-put cycle.
Anyhow, my working assumption is that a hash is by its very nature thread-safe, because for any {key, value} combination, the value is either null or something for which it doesn't matter who got there "first" to write it.
Question is: Is this a reasonable assumption?
Update: The real scope of my question is around very simple id -> value style caches (or {parameters} -> {calculated value}) where, no matter who writes to the cache, the value will be the same and we are just trying to save ourselves from re-calculating / going back to the database. The actual object graph isn't relevant, and the cache is generally long-lived.
For most implementations of a hash, you'd need to synchronize. What if the hash table needs to be expanded/rehashed? What if two threads are trying to add something to the hash table where the keys are different, but the hashes collide? They could both be modifying the same slot in the hash table in different ways at the same time. Assuming you're using a hash table to implement your cache (which you imply in your question) I suggest reading a little about the details of how hash tables are implemented if you're not already familiar with this.
Writes aren't always atomic. You must either use atomic data types or provide some synchronization (RCU, locks etc.). No shared data is thread-safe per se. Or make this go away by sticking to lock-free algorithms (that is, where possible and feasible).
As long as the cost of acquiring and releasing a lock is less than the cost of recreating the object (from a file or database or whatever), all accesses to a cache should indeed be synchronized. If it's not, you don't really need a cache at all. :)
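In C#, for instance, a common way to get that "compute it only once" behaviour without one global lock is a ConcurrentDictionary of Lazy values (just a sketch; in Java 8+, ConcurrentHashMap.computeIfAbsent plays a similar role):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Each key's value is computed at most once, even if several threads miss at the
// same time, because they all receive the same Lazy<TValue> and only block on
// that single computation rather than on the whole cache.
public class ComputeOnceCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> _map =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> compute)
    {
        return _map.GetOrAdd(
            key,
            k => new Lazy<TValue>(() => compute(k),
                                  LazyThreadSafetyMode.ExecutionAndPublication))
            .Value;
    }
}
```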
If you want to avoid data corruption, you must synchronize. This is especially true when the cache contains multiple tables that must be updated atomically. Imagine you have a database for a DMV (department of motor vehicles). You add a new person to the database; that person will have records for auto registrations, plus records for tickets received, records for home address, and perhaps other contact information. If you don't update these tables atomically -- in the database and in the cache -- then any client pulling data out of the cache may get inconsistent data.
Yes, any one piece of data may be constant, but databases very commonly hold data that -- if not updated together and atomically -- can cause database clients to get incorrect or incomplete or inconsistent results.
If you are using Java 5 or above you can use a ConcurrentHashMap. This supports multiple readers and writers in a threadsafe manner.