I am using Cache in a web service method like this:
var pblDataList = (List<blabla>)HttpContext.Current.Cache.Get("pblDataList");
if (pblDataList == null)
{
    var PBLData = dc.ExecuteQuery<blabla>(@"SELECT blabla");
    pblDataList = PBLData.ToList();
    HttpContext.Current.Cache.Add("pblDataList", pblDataList, null,
        DateTime.Now.Add(new TimeSpan(0, 0, 15)),
        Cache.NoSlidingExpiration, CacheItemPriority.Normal, null);
}
But I wonder, is this code thread-safe? The web service method is called by multiple requesters, and more than one requester may attempt to retrieve data and add it to the Cache at the same time while the cache is empty.
The query takes 5 to 8 seconds. Would introducing a lock statement around this code prevent any possible conflicts? (I know that multiple queries can run simultaneously, but I want to be sure that only one query is running at a time.)
The cache object is thread-safe, but HttpContext.Current will not be available from background threads. This may or may not apply to you here; it's not obvious from your code snippet whether you are actually using background threads, but in case you are now, or decide to be at some point in the future, you should keep this in mind.
If there's any chance that you'll need to access the cache from a background thread, then use HttpRuntime.Cache instead.
In addition, although individual operations on the cache are thread-safe, sequential lookup/store operations are obviously not atomic. Whether or not you need them to be atomic depends on your particular application. If it could be a serious problem for the same query to run multiple times, i.e. if it would produce more load than your database is able to handle, or if it would be a problem for a request to return data that is immediately overwritten in the cache, then you would likely want to place a lock around the entire block of code.
However, in most cases you would really want to profile first and see whether or not this is actually a problem. Most web applications/services don't concern themselves with this aspect of caching because they are stateless and it doesn't matter if the cache gets overwritten.
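If you do decide you need the lookup and store to be atomic, the whole check-then-fetch block goes inside one lock. Here's a minimal sketch of the idea; a plain `Dictionary` stands in for `HttpRuntime.Cache` and `LoadFromDatabase()` stands in for the real query, so both names are placeholders rather than ASP.NET APIs:

```csharp
using System;
using System.Collections.Generic;

// Sketch: lock around the entire lookup/store sequence so the
// expensive query can only ever run once per cache miss.
public static class QueryCache
{
    private static readonly object _sync = new object();
    private static readonly Dictionary<string, List<string>> _cache
        = new Dictionary<string, List<string>>();

    public static int LoadCount; // counts how often the "query" actually ran

    public static List<string> GetPblData()
    {
        lock (_sync)
        {
            // Check-then-fetch is atomic only because both steps
            // happen under the same lock.
            if (_cache.TryGetValue("pblDataList", out var cached))
                return cached;

            var fresh = LoadFromDatabase();
            _cache["pblDataList"] = fresh;
            return fresh;
        }
    }

    private static List<string> LoadFromDatabase()
    {
        LoadCount++; // stand-in for the 5-8 second database query
        return new List<string> { "row1", "row2" };
    }
}
```

In the real method you would keep the same `Cache.Add` call as in the question; the lock only serializes this one code path, so other parts of the application are unaffected.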
You are correct. The retrieving and adding operations are not being treated as an atomic transaction. If you need to prevent the query from running multiple times, you'll need to use a lock.
(Normally this wouldn't be much of a problem, but in the case of a long running query it can be useful to relieve strain on the database.)
I believe the Add should be thread-safe - i.e. it won't error if Add gets called twice with the same key, but obviously the query might execute twice.
Another question, however, is whether the data is thread-safe. There is no guarantee that each List<blabla> is isolated - it depends on the cache provider. The in-memory cache provider stores the objects directly, so there is a risk of collisions if any of the threads edit the data (add/remove/swap items in the list, or change properties of one of the items). However, with a serializing provider you should be fine. Of course, this then demands that blabla is serializable...
Related
I'm trying to reduce the possibility of a race condition if invalidation of my redis cache occurs at the same time I'm retrieving from the cache. Note that invalidation and retrieval happen on two different systems, so I don't know whether this is happening at the same time.
System 1:
InValidateCache() {
    _cache.remove(key);
}
System 2:
GetCacheKey() {
    string key = _cache.get();
}
Here, key could return the dirty string which has been invalidated in System 1 (since invalidation of cache in System 1 could happen after retrieval of cache in System 2).
How do I make sure that doesn't happen? Is there a retry or another approach I could take to reduce the possibility?
The question lacks the context or use case for which you need such a solution; however, based on the details provided, I will try to answer it.
You could use a mutex-based approach: acquire a distributed lock before accessing the cache store, which will prevent the race condition. An overview of a read/write operation looks like this:
Steps:
Acquire the lock
Perform the operation (read | write)
Release the lock
This will be similar to using a Database lock if not exactly the same. You can use the same cache store to acquire the lock. Take a reference from this Ruby gem.
However, I feel it will be an unnecessary layer between the cache store and the application unless you want to attach some business logic to the operation, e.g. two different systems trying to update the same cache key at the same time, where the update takes into consideration the previously read value of the key.
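The acquire/operate/release steps above can be sketched as follows. A `ConcurrentDictionary` stands in for the shared lock store here; with Redis you would instead use an atomic `SET key value NX PX <ttl>` so that acquisition and expiry happen server-side. `LockStore` and the token values are illustrative names, not a real client API:

```csharp
using System;
using System.Collections.Concurrent;

// Sketch of a token-based lock store: only one holder at a time,
// and only the holder (matching token) may release.
public class LockStore
{
    private readonly ConcurrentDictionary<string, string> _locks
        = new ConcurrentDictionary<string, string>();

    // Acquire: succeeds only if no one else currently holds the lock.
    public bool TryAcquire(string key, string token)
        => _locks.TryAdd(key, token);

    // Release: check the token, then remove. (This check-then-remove is
    // not fully race-free in-process; Redis implementations do it
    // atomically with a small Lua script.)
    public bool Release(string key, string token)
    {
        return _locks.TryGetValue(key, out var owner)
            && owner == token
            && _locks.TryRemove(key, out _);
    }
}
```

System 1 would acquire the lock before invalidating, and System 2 before reading, so the two operations can no longer interleave on the same key.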
In a project of windows services (C# .Net Platform), I need a suggestion.
In the project I have a class named Cache, in which I keep some data that I need frequently. There is a thread that updates the cache every 30 minutes, whereas multiple other threads read the cache data.
In the cache class, there are getter and setter functions which are used by user threads and cache updater thread respectively. No one uses data objects like tables directly, because they are private members.
From the above context, do you think I should use locking functionality in the cache class?
The effects of not using locks when writing to a shared memory location (like cache) really depend on the application. If the code was used in banking software the results could be catastrophic.
As a rule of thumb - when multiple threads access the same location, even if only one thread writes and all the others read, you should use locks (for the write operation). What can happen is that one thread starts reading data and gets swapped out by the updater thread, so it could end up using a mixture of old and new data. Whether that really has an impact depends on the application and how sensitive it is.
Key point: if you don't lock on the reads, there's a chance your read won't see the changes. A lock forces your read code to get values from main memory rather than pulling data from a CPU cache or register. To avoid actually locking, you could use Thread.MemoryBarrier(), which does the same job without the overhead of actually locking anything.
Minor Points: Using lock would prevent a read from getting half old data and half new data. If you are reading more than one field, I recommend it. If you are really clever, you could keep all the data in an immutable object and return that object to anyone calling the getter and so avoid the need for a lock. (When new data comes in, you create a new immutable object, then replace the old with the new in one go. Use a lock for the actual write, or, if you're still feeling really clever, make the field referencing the object volatile.)
Also: when your getter is called, remember it's running on many other threads. There's a tendency to think that once something is running the Cache class's code it's all on the same thread, and it's not.
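The "immutable object plus volatile field" idea from the previous paragraph can be sketched like this. The updater builds a brand-new read-only object and swaps the reference in a single assignment, so readers never see half-old, half-new data and need no lock at all. `CacheData` and `SnapshotCache` are illustrative names, not from the original post:

```csharp
using System;
using System.Collections.Generic;

// Immutable snapshot: once constructed, nothing in it can change.
public sealed class CacheData
{
    public IReadOnlyList<string> Rows { get; }
    public DateTime LoadedAt { get; }

    public CacheData(IReadOnlyList<string> rows, DateTime loadedAt)
    {
        Rows = rows;
        LoadedAt = loadedAt;
    }
}

public static class SnapshotCache
{
    // volatile ensures reader threads see the newly published reference.
    private static volatile CacheData _current =
        new CacheData(new List<string>(), DateTime.MinValue);

    // Called by many reader threads: a single reference read, always consistent.
    public static CacheData Get() => _current;

    // Called by the single updater thread every 30 minutes:
    // build the new snapshot, then publish it in one assignment.
    public static void Refresh(List<string> freshRows)
    {
        _current = new CacheData(freshRows, DateTime.UtcNow);
    }
}
```

A reader that grabbed the old snapshot keeps working with it safely while the updater publishes a new one; the two never interfere.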
I'm trying to improve upon this program that I wrote for work. Initially I was rushed, and they don't care about performance or anything. So, I made a horrible decision to query an entire database(a SQLite database), and then store the results in lists for use in my functions. However, I'm now considering having each of my functions threaded, and having the functions query only the parts of the database that it needs. There are ~25 functions. My question is, is this safe to do? Also, is it possible to have that many concurrent connections? I will only be PULLING information from the database, never inserting or updating.
The way I've had it described to me[*] is to have each concurrent thread open its own connection to the database, as each connection can only process one query or modification at a time. The group of threads with their connections can then perform concurrent reads easily. If you've got a significant problem with many concurrent writes causing excessive blocking or failure to acquire locks, you're getting to the point where you're exceeding what SQLite does for you (and should consider a server-based DB like PostgreSQL).
Note that you can also have a master thread open the connections for the worker threads if that's more convenient, but it's advised (for your sanity's sake if nothing else!) to only actually use each connection from one thread.
[* For a normal build of SQLite. It's possible to switch things off at build time, of course.]
SQLite has no write concurrency, but it supports arbitrarily many connections that read at the same time.
Just ensure that every thread has its own connection.
25 simultaneous connections is not a smart idea; that's a huge number.
I usually create a multi-layered design for this problem. I send all requests to the database through a kind of ObjectFactory class that has an internal cache. The ObjectFactory will forward the request to a ConnectionPoolHandler and will store the results in its cache. This connection pool handler uses X simultaneous connections but dispatches them to several threads.
However, some remarks must be made before applying this design. You first have to ask yourself the following 2 questions:
Is your application the only application that has access to this database?
Is your application the only application that modifies data in this database?
If the first question is answered negatively, you could encounter locking issues. If the second is answered negatively, it will be extremely difficult to apply caching; you may even prefer not to implement any caching at all.
Caching is especially interesting when you are often requesting objects based on a unique reference, such as the primary key. In that case you can store the most often used objects in a map. A popular collection for caching is an "LRUMap" ("Least-Recently-Used" map). The benefit of this collection is that it automatically keeps the most recently used objects at the top. At the same time it has a maximum size and automatically removes items from the map that are rarely used.
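A minimal sketch of that LRUMap idea in C#: a `Dictionary` for O(1) lookup plus a `LinkedList` that orders entries from most- to least-recently used, evicting from the tail when full. (In Java this is roughly a `LinkedHashMap` in access order; this version is illustrative, not a production cache.)

```csharp
using System;
using System.Collections.Generic;

public class LruMap<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map
        = new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order
        = new LinkedList<KeyValuePair<TKey, TValue>>();

    public LruMap(int capacity) => _capacity = capacity;

    public bool TryGet(TKey key, out TValue value)
    {
        if (_map.TryGetValue(key, out var node))
        {
            _order.Remove(node);      // move to front: most recently used
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default;
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        if (_map.TryGetValue(key, out var existing))
        {
            _order.Remove(existing);  // replace: drop the old node
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            var lru = _order.Last;    // evict the least-recently-used entry
            _order.RemoveLast();
            _map.Remove(lru.Value.Key);
        }
        var node = new LinkedListNode<KeyValuePair<TKey, TValue>>(
            new KeyValuePair<TKey, TValue>(key, value));
        _order.AddFirst(node);
        _map[key] = node;
    }
}
```

Note this sketch is not thread-safe on its own; in the multi-threaded ObjectFactory design above you would guard it with a lock.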
A second advantage of caching is that each object exists only once. For example:
An Employee is fetched from the database.
The ObjectFactory converts the resultset to an actual object instance.
The ObjectFactory immediately stores it in cache.
A bit later, a bunch of employees are fetched using an SQL "... where name like 'John%'" statement.
Before converting the resultset to objects, the ObjectFactory first checks if the IDs of these records are perhaps already stored in cache.
Found a match ! Aha, this object does not need to be recreated.
There are several advantages to having a certain object only once in memory.
Last but not least, in Java there is something called "weak references". These are references that can be cleaned up by the garbage collector. C# has an equivalent: System.WeakReference. By implementing this, you don't even have to care about the maximum number of cached objects; the garbage collector will take care of it.
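For reference, C#'s counterpart looks like this. `WeakCacheDemo` is an illustrative name; the point is that `WeakReference<T>` lets the garbage collector reclaim the target once no strong references to it remain:

```csharp
using System;
using System.Collections.Generic;

public static class WeakCacheDemo
{
    public static bool CanStillRead()
    {
        var employee = new List<string> { "John" };   // strong reference
        var weak = new WeakReference<List<string>>(employee);

        // While 'employee' is strongly referenced, the target is reachable.
        bool alive = weak.TryGetTarget(out var target) && target.Count == 1;
        GC.KeepAlive(employee);  // keep the strong reference live until here
        return alive;
    }
}
```

Once the last strong reference goes away, a later `TryGetTarget` may return false because the GC has reclaimed the object - exactly the behavior you want for a size-unbounded cache.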
Before I start, I couldn't find any other resources to answer my question, closest is:
Calling a stored procedure simultaniously from multiple threads in asp.net and sql server 2005
but it fails to answer my specific issue/concern.
Basically, I have a massive .net web app that handles millions of requests a day.
Assume:
All of the sprocs concerned are simple get sprocs (e.g., SELECT [SOMETHING] FROM [SOMEWHERE] INNER JOIN [SOMETHING ELSE] etc....)
The data never changes (it does change from time to time; for the sake of my scenario, assume it doesn't)
The cache is initially empty for whatever reason.
The method in question:
I check for the existence of the object in the application cache. If it exists, I simply return it. If the object is not in cache, a sproc call is made to the database to look up this data. Once the sproc returns, this data is added to cache and then returned.
Under heavy load I have a bit of a performance issue that I'd like to clear up.
Here's my scenario:
User A comes into this method.
Data is not in cache, sproc gets called.
User B comes into this method(while sproc is still running).
Data is not in cache, sproc gets called.
Rinse and repeat over and over.
Under heavy load, these can generate quite a lot of concurrent and redundant active spids. I'm trying to figure out the best way around this. Obviously I could drop in an sp_getAppLock but the requests would still end up 1) dropping into the sproc and 2) have to fire the exact same query. I could lock on an object that is specific to that exact query and have that wrapped around the cache check. But if I do that, I'm potentially opening the door for some massive thread contention and deadlocking.
I have to assume that someone has dealt with this very scenario before and I'm hopeful there is an appropriate solution. Right now the best solution I can come up with is application locking, but I'd really like to know if anyone has any better options. Perhaps a combination of things, say sql app locks and messaging(traditional or non traditional) where after the lock succeeds, any that were just released try to pull down the result-set(from where?) as opposed to re-executing the entire rest of the sproc.
EDIT:
So follow this.... If I lock or "wait" on either the caching or the sproc call, then under heavy load, when an element is not cached, the method (or sproc) that generates the to-be-cached object could end up taking longer than expected. While that is spinning away, threads are going to have to wait. And the only ways to wait (at least that I know of) are to lock or to spin.
Isn't it then possible to exhaust the thread pool, or lock up all available requests and force new ones to be queued? This is my fear, and the thing that drove me to look into moving the layer away from the application and into the database. The last time we attempted to lock around the caching, we suffered severe CPU spikes on our web box because the threads sat in a lock state for so long. Though I believe at the time we did not use Monitor.Enter/Monitor.Exit (or just lock(){}). Either way, does anyone have any details or experience in this area? I know it's typically bad form to lock around long-running processes for this very reason. I would rather suffer loading duplicate content into the cache than prevent user requests from being served because I'm all out of threads or all active requests are locked.
Or, maybe it's just late and I'm over thinking this. I had started my day with an almost brilliant, "ah-ha" moment. But now I just keep second guessing myself.
Your cache is most likely protected by a lock, so you are already serializing the threads.
Your suggested solution is the best: have a lock around the query. Once the cache is populated the performance difference will be negligible, and you'll avoid multiple (and expensive) database queries.
In the past I had this problem, when the cache was flushed and slow queries took my DB down.
Here is a solution for this heavy problem using locking; ignore the Hebrew explanation and look at the code:
http://blogs.microsoft.co.il/blogs/moshel/archive/2009/07/11/cache.aspx
You may want to look into cache optimization if you haven't done so already.
If you are running through a cachemanager anyway, can it not be made smart enough to know that the proc has already been called and it should wait for it to complete?
private static readonly object _sync = new object();

GetData() {
    if (cached) return cache;       // fast path: no lock once populated
    lock (_sync) {
        if (cached) return cache;   // double-check inside the lock
        cache = CallProc();         // only the first thread runs the proc;
        cached = true;              // the rest wait on the lock, then hit the cache
    }
    return cache;
}
When a user visits an .aspx page, I need to start some background calculations in a new thread. The results of the calculations need to be stored in the user's Session, so that on a callback, the results can be retrieved. Additionally, on the callback, I need to be able to see what the status of the background calculation is. (E.g. I need to check if the calculation is finished and completed successfully, or if it is still running) How can I accomplish this?
Questions
How would I check on the status of the thread? Multiple users could have background calculations running at the same time, so I'm unsure how the process of knowing which thread belongs to which user would work (though in my scenario, the only thread that matters is the thread originally started by user A - and user A does a callback to retrieve/check on the status of that thread).
Am I correct in my assumption that passing an HttpSessionState "Session" variable for the user to the new thread, will work as I expect (e.g. I can then add stuff to their Session later).
Thanks. Also I have to say, I might be confused about something but it seems like the SO login system is different now, so I don't have access to my old account.
Edit
I'm now thinking about using the approach described in this article which basically uses a class and a Singleton to manage a list of threads. Instead of storing my data in the database (and incurring the performance penalty associated with retrieving the data, as well as the extra table, maintenance, etc in the database), I'll probably store the data in my class as well.
Edit 2
The approach mentioned in my first edit worked well. Additionally I had timers to ensure the threads, and their associated data, were both cleaned up after the corresponding timers called their cleanup methods. The Objects containing my data and the threads were stored in the Singleton class. For some applications it might be appropriate to use the database for storage but it seemed like overkill for mine, since my data is tied to a specific instance of a page, and is useless outside of that page context.
I would not expect session-state to continue working in this scenario; the worker may have no idea who the user is, and even if it does (or, more likely, you capture this data and pass it into the worker), there is no mechanism to store anything: updating session state is a step near the end of the request pipeline, but a background thread isn't in the pipeline at all.
I suspect you might need to store this data separately using some unique property of the user (their id or cn), or invent a GUID otherwise. On a single machine it may suffice to store this in a synchronised dictionary (or similar), but on a farm/cluster you may need to push the data down a layer to your database or state server. And fetch manually.
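On a single machine, the "synchronised dictionary keyed by a GUID" idea might be sketched like this. `JobTracker` and `JobStatus` are illustrative names; the page stores the returned id (e.g. in Session or a hidden field) and polls with it on each callback:

```csharp
using System;
using System.Collections.Concurrent;

public enum JobStatus { Running, Completed, Failed }

// Tracks background-calculation status outside of Session, so the
// worker thread never needs to touch the request pipeline.
public static class JobTracker
{
    private static readonly ConcurrentDictionary<Guid, JobStatus> _jobs
        = new ConcurrentDictionary<Guid, JobStatus>();

    // Called when the page kicks off the background work.
    public static Guid Start()
    {
        var id = Guid.NewGuid();
        _jobs[id] = JobStatus.Running;
        return id;
    }

    // Called by the worker thread when it finishes.
    public static void Complete(Guid id, bool success)
        => _jobs[id] = success ? JobStatus.Completed : JobStatus.Failed;

    // Called from the callback to poll progress.
    public static bool TryGetStatus(Guid id, out JobStatus status)
        => _jobs.TryGetValue(id, out status);
}
```

On a farm or cluster this in-memory dictionary is not enough, as the answer notes: the status would have to live in the database or a state server instead, with the same GUID as the key.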