I'm new in C# and trying to understand how to work with Lazy.
I need to handle concurrent request by waiting the result of an already running operation. Requests for data may come in simultaneously with same/different credentials.
For each unique set of credentials there can be at most one GetDataInternal call in progress, with the result from that one call returned to all queued waiters when it is ready
private readonly ConcurrentDictionary<Credential, Lazy<Data>> Cache
= new ConcurrentDictionary<Credential, Lazy<Data>>();
public Data GetData(Credential credential)
{
// This instance will be thrown away if a cached
// value with our "credential" key already exists.
Lazy<Data> newLazy = new Lazy<Data>(
() => GetDataInternal(credential),
LazyThreadSafetyMode.ExecutionAndPublication
);
Lazy<Data> lazy = Cache.GetOrAdd(credential, newLazy);
bool added = ReferenceEquals(newLazy, lazy); // If true, we won the race.
Data data;
try
{
// Wait for the GetDataInternal call to complete.
data = lazy.Value;
}
finally
{
// Only the thread which created the cache value
// is allowed to remove it, to prevent races.
if (added) {
Cache.TryRemove(credential, out lazy);
}
}
return data;
}
Is that right way to use Lazy or my code is not safe?
Update:
Is it good idea to start using MemoryCache instead of ConcurrentDictionary? If yes, how to create a key value, because it's a string inside MemoryCache.Default.AddOrGetExisting()
This is correct. This is a standard pattern (except for the removal) and it's a really good cache because it prevents cache stampeding.
I'm not sure you want to remove from the cache when the computation is done because the computation will be redone over and over that way. If you don't need the removal you can simplify the code by basically deleting the second half.
Note, that Lazy has a problem in the case of an exception: The exception is stored and the factory will never be re-executed. The problem persists forever (until a human restarts the app). In my mind this makes Lazy completely unsuitable for production use in most cases.
This means that a transient error such as a network issue can render the app unavailable permanently.
This answer is directed to the updated part of the original question. See #usr answer regarding thread-safety with Lazy<T> and the potential pitfalls.
I would like to know how to avoid using ConcurrentDictionary<TKey, TValue> and start
using MemoryCache? How to implement
MemoryCache.Default.AddOrGetExisting()?
If you're looking for a cache which has a mechanism for auto expiry, then MemoryCache is a good choice if you don't want to implement the mechanics yourself.
In order to utilize MemoryCache which forces a string representation for a key, you'll need to create a unique string representation of a credential, perhaps a given user id or a unique username?
If you can, you can create an override of ToString which represents your unique identifier or simply use the said property, and utilize MemoryCache like this:
public class Credential
{
public Credential(int userId)
{
UserId = userId;
}
public int UserId { get; private set; }
}
And now your method will look like this:
private const EvictionIntervalMinutes = 10;
public Data GetData(Credential credential)
{
Lazy<Data> newLazy = new Lazy<Data>(
() => GetDataInternal(credential), LazyThreadSafetyMode.ExecutionAndPublication);
CacheItemPolicy evictionPolicy = new CacheItemPolicy
{
AbsoluteExpiration = DateTimeOffset.UtcNow.AddMinutes(EvictionIntervalMinutes)
};
var result = MemoryCache.Default.AddOrGetExisting(
new CacheItem(credential.UserId.ToString(), newLazy), evictionPolicy);
return result != null ? ((Lazy<Data>)result.Value).Value : newLazy.Value;
}
MemoryCache provides you with a thread-safe implementation, this means that two threads accessing AddOrGetExisting will only cause a single cache item to be added or retrieved. Further, Lazy<T> with ExecutionAndPublication guarantess only a single unique invocation of the factory method.
Related
I have a controller which returns a large json object. If this object does not exist, it will generate and return it afterwards. The generation takes about 5 seconds, and if the client sent the request multiple times, the object gets generated with x-times the children. So my question is: Is there a way to block the second request, until the first one finished, independent who sent the request?
Normally I would do it with a Singleton, but because I am having scoped services, singleton does not work here
Warning: this is very oppinionated and maybe not suitable for Stack Overflow, but here it is anyway
Although I'll provide no code... when things take a while to generate, you don't usually spend that time directly in controller code, but do something like "start a background task to generate the result, and provide a "task id", which can be queried on another different call).
So, my preferred course of action for this would be having two different controller actions:
Generate, which creates the background job, assigns it some id, and returns the id
GetResult, to which you pass the task id, and returns either different error codes for "job id doesn't exist", "job id isn't finished", or a 200 with the result.
This way, your clients will need to call both, however, in Generate, you can check if the job is already being created and return an existing job id.
This of course moves the need to "retry and check" to your client: in exchange, you don't leave the connection to the server opened during those 5 seconds (which could potentially be multiplied by a number of clients) and return fast.
Otherwise, if you don't care about having your clients wait for a response during those 5 seconds, you could do a simple:
if(resultDoesntExist) {
resultDoesntExist = false; // You can use locks for the boolean setters or Interlocked instead of just setting a member
resultIsBeingGenerated = true;
generateResult(); // <-- this is what takes 5 seconds
resultIsBeingGenerated = false;
}
while(resultIsBeingGenerated) { await Task.Delay(10); } // <-- other clients will wait here
var result = getResult(); // <-- this should be fast once the result is already created
return result;
note: those booleans and the actual loop could be on the controller, or on the service, or wherever you see fit: just be wary of making them thread-safe in however method you see appropriate
So you basically make other clients wait till the first one generates the result, with "almost" no CPU load on the server... however with a connection open and a thread from the threadpool used, so I just DO NOT recommend this :-)
PS: #Leaky solution above is also good, but it also shifts the responsability to retry to the client, and if you are going to do that, I'd probably go directly with a "background job id", instead of having the first (the one that generates the result) one take 5 seconds. IMO, if it can be avoided, no API action should ever take 5 seconds to return :-)
Do you have an example for Interlocked.CompareExchange?
Sure. I'm definitely not the most knowledgeable person when it comes to multi-threading stuff, but this is quite simple (as you might know, Interlocked has no support for bool, so it's customary to represent it with an integral type):
public class QueryStatus
{
private static int _flag;
// Returns false if the query has already started.
public bool TrySetStarted()
=> Interlocked.CompareExchange(ref _flag, 1, 0) == 0;
public void SetFinished()
=> Interlocked.Exchange(ref _flag, 0);
}
I think it's the safest if you use it like this, with a 'Try' method, which tries to set the value and tells you if it was already set, in an atomic way.
Besides simply adding this (I mean just the field and the methods) to your existing component, you can also use it as a separate component, injected from the IOC container as scoped. Or even injected as a singleton, and then you don't have to use a static field.
Storing state like this should be good for as long as the application is running, but if the hosted application is recycled due to inactivity, it's obviously lost. Though, that won't happen while a request is still processing, and definitely won't happen in 5 seconds.
(And if you wanted to synchronize between app service instances, you could 'quickly' save a flag to the database, in a transaction with proper isolation level set. Or use e.g. Azure Redis Cache.)
Example solution
As Kit noted, rightly so, I didn't provide a full solution above.
So, a crude implementation could go like this:
public class SomeQueryService : ISomeQueryService
{
private static int _hasStartedFlag;
private static bool TrySetStarted()
=> Interlocked.CompareExchange(ref _hasStartedFlag, 1, 0) == 0;
private static void SetFinished()
=> Interlocked.Exchange(ref _hasStartedFlag, 0);
public async Task<(bool couldExecute, object result)> TryExecute()
{
if (!TrySetStarted())
return (couldExecute: false, result: null);
// Safely execute long query.
SetFinished();
return (couldExecute: true, result: result);
}
}
// In the controller, obviously
[HttpGet()]
public async Task<IActionResult> DoLongQuery([FromServices] ISomeQueryService someQueryService)
{
var (couldExecute, result) = await someQueryService.TryExecute();
if (!couldExecute)
{
return new ObjectResult(new ProblemDetails
{
Status = StatusCodes.Status503ServiceUnavailable,
Title = "Another request has already started. Try again later.",
Type = "https://tools.ietf.org/html/rfc7231#section-6.6.4"
})
{ StatusCode = StatusCodes.Status503ServiceUnavailable };
}
return Ok(result);
}
Of course possibly you'd want to extract the 'blocking' logic from the controller action into somewhere else, for example an action filter. In that case the flag should also go into a separate component that could be shared between the query service and the filter.
General use action filter
I felt bad about my inelegant solution above, and I realized that this problem can be generalized into basically a connection number limiter on an endpoint.
I wrote this small action filter that can be applied to any endpoint (multiple endpoints), and it accepts the number of allowed connections:
[AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
public class ConcurrencyLimiterAttribute : ActionFilterAttribute
{
private readonly int _allowedConnections;
private static readonly ConcurrentDictionary<string, int> _connections = new ConcurrentDictionary<string, int>();
public ConcurrencyLimiterAttribute(int allowedConnections = 1)
=> _allowedConnections = allowedConnections;
public override async Task OnActionExecutionAsync(ActionExecutingContext context, ActionExecutionDelegate next)
{
var key = context.HttpContext.Request.Path;
if (_connections.AddOrUpdate(key, 1, (k, v) => ++v) > _allowedConnections)
{
Close(withError: true);
return;
}
try
{
await next();
}
finally
{
Close();
}
void Close(bool withError = false)
{
if (withError)
{
context.Result = new ObjectResult(new ProblemDetails
{
Status = StatusCodes.Status503ServiceUnavailable,
Title = $"Maximum {_allowedConnections} simultaneous connections are allowed. Try again later.",
Type = "https://tools.ietf.org/html/rfc7231#section-6.6.4"
})
{ StatusCode = StatusCodes.Status503ServiceUnavailable };
}
_connections.AddOrUpdate(key, 0, (k, v) => --v);
}
}
}
Been having some problems with a web based .Net(C#) application. I'm using the LazyCache library to cache frequent JSON responses (some in & around 80+KB) for users belonging to the same company across user sessions.
One of the things we need to do is to keep track of the cache keys for a particular company so when any user in the company makes mutating changes to items being cached we need to clear the cache for those items for that particular company's users to force the cache to be repopulated immediately upon the receiving the next request.
We choose LazyCache library as we wanted to do this in memory without needing to use an external cache source such as Redis etc as we don't have heavy usage.
One of the problems we have using this approach is we need to keep track of all the cache keys belonging to a particular customer anytime we cache. So when any mutating change is made by company user's to the relevant resource we need to expire all the cache keys belonging to that company.
To achieve this we have a global cache which all web controllers have access to.
private readonly IAppCache _cache = new CachingService();
protected IAppCache GetCache()
{
return _cache;
}
A simplified example (forgive any typos!) of our controllers which use this cache would be something like below
[HttpGet]
[Route("{customerId}/accounts/users")]
public async Task<Users> GetUsers([Required]string customerId)
{
var usersBusinessLogic = await _provider.GetUsersBusinessLogic(customerId)
var newCacheKey= "GetUsers." + customerId;
CacheUtil.StoreCacheKey(customerId,newCacheKey)
return await GetCache().GetOrAddAsync(newCacheKey, () => usersBusinessLogic.GetUsers(), DateTimeOffset.Now.AddMinutes(10));
}
We use a util class with static methods and a static concurrent dictionary to store the cache keys - each company (GUID) can have many cache keys.
private static readonly ConcurrentDictionary<Guid, ConcurrentHashSet<string>> cacheKeys = new ConcurrentDictionary<Guid, ConcurrentHashSet<string>>();
public static void StoreCacheKey(Guid customerId, string newCacheKey)
{
cacheKeys.AddOrUpdate(customerId, new ConcurrentHashSet<string>() { newCacheKey }, (key, existingCacheKeys) =>
{
existingCacheKeys.Add(newCacheKey);
return existingCacheKeys;
});
}
Within that same util class when we need to remove all cache keys for a particular company we have a method similar to below (which is caused when mutating changes are made in other controllers)
public static void ClearCustomerCache(IAppCache cache, Guid customerId)
{
var customerCacheKeys = new ConcurrentHashSet<string>();
if (!cacheKeys.TryGetValue(customerId,out customerCacheKeys))
{
return new ConcurrentHashSet<string>();
}
foreach (var cacheKey in customerCacheKeys)
{
cache.Remove(cacheKey);
}
cacheKeys.TryRemove(customerId, out _);
}
We have recently been getting performance problems that our web requests response time slow significantly over time - we don't see significant change in terms of the number of requests per second.
Looking at the garbage collection metrics we seem to notice a large Gen 2 heap size and a large object size which seem to going upwards - we don't see memory been reclaimed.
We are still in the middle of debugging this but I'm wondering could using the approach described above lead to the problems we are seeing. We want thread safety but could there be an issue using the concurrent dictionary we have above that even after we remove items that memory is not being freed leading to excessive Gen 2 collection.
Also we are using workstation garbage collection mode, imagine switching to server mode GC will help us (our IIS server has 8 processors + 16 GBs ram) but not sure switching will fix all the problems.
You may want to take advantage of the ExpirationTokens property of the MemoryCacheEntryOptions class. You can also use it from the ICacheEntry argument passed in the delegate of the LazyCache.Providers.MemoryCacheProvider.GetOrCreateAsync method. For example:
Task<T> GetOrAddAsync<T>(string key, Func<Task<T>> factory,
int durationMilliseconds = Timeout.Infinite, string customerId = null)
{
return GetMemoryCacheProvider().GetOrCreateAsync<T>(key, (options) =>
{
if (durationMilliseconds != Timeout.Infinite)
{
options.SetSlidingExpiration(TimeSpan.FromMilliseconds(durationMilliseconds));
}
if (customerId != null)
{
options.ExpirationTokens.Add(GetCustomerExpirationToken(customerId));
}
return factory();
});
}
Now the GetCustomerExpirationToken should return an object implementing the IChangeToken interface. Things are becoming a bit complex, but bear with me for a minute. The .NET platform doesn't provide a built-in IChangeToken implementation suitable for this case, since it is mainly focused on file system watchers. Implementing one is not difficult though:
class ChangeToken : IChangeToken, IDisposable
{
private volatile bool _hasChanged;
private readonly ConcurrentQueue<(Action<object>, object)>
registeredCallbacks = new ConcurrentQueue<(Action<object>, object)>();
public void SignalChanged()
{
_hasChanged = true;
while (registeredCallbacks.TryDequeue(out var entry))
{
var (callback, state) = entry;
callback?.Invoke(state);
}
}
bool IChangeToken.HasChanged => _hasChanged;
bool IChangeToken.ActiveChangeCallbacks => true;
IDisposable IChangeToken.RegisterChangeCallback(Action<object> callback,
object state)
{
registeredCallbacks.Enqueue((callback, state));
return this; // return null doesn't work
}
void IDisposable.Dispose() { } // It is called by the framework after each callback
}
This is a general implementation of the IChangeToken interface, that is activated manually with the SignalChanged method. The signal will be propagated to the underlying MemoryCache object, which will subsequently invalidate all entries associated with this token.
Now what is left to do is to associate these tokens with a customer, and store them somewhere. I think that a ConcurrentDictionary should be quite adequate:
private static readonly ConcurrentDictionary<string, ChangeToken>
CustomerChangeTokens = new ConcurrentDictionary<string, ChangeToken>();
private static ChangeToken GetCustomerExpirationToken(string customerId)
{
return CustomerChangeTokens.GetOrAdd(customerId, _ => new ChangeToken());
}
Finally the method that is needed to signal that all entries of a specific customer should be invalidated:
public static void SignalCustomerChanged(string customerId)
{
if (CustomerChangeTokens.TryRemove(customerId, out var changeToken))
{
changeToken.SignalChanged();
}
}
Large objects (> 85k) belong in gen 2 Large Object Heap (LOH), and they are pinned in memory.
GC scans LOH and marks dead objects
Adjacent dead objects are combined into free memory
The LOH is not compacted
Further allocations only try to fill in the holes left by dead objects.
No compaction, but only reallocation may lead to memory fragmentation.
Long running server processes can be done in by this - it is not uncommon.
You are probably seeing fragmentation occur over time.
Server GC just happens to be multi-threaded - I wouldn't expect it to solve fragmentation.
You could try breaking up your large objects - this might not be feasible for your application.
You can try setting LargeObjectHeapCompaction after a cache clear - assuming it's infrequent.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
Ultimately, I'd suggest profiling the heap to find out what works.
I have a slow and expensive method that return some data for me:
public Data GetData(){...}
I don't want to wait until this method will execute. Rather than I want to return a cached data immediately.
I have a class CachedData that contains one property Data cachedData.
So I want to create another method public CachedData GetCachedData() that will initiate a new task(call GetData inside of it) and immediately return cached data and after task will finish we will update the cache.
I need to have thread safe GetCachedData() because I will have multiple request that will call this method.
I will have a light ping "is there anything change?" each minute and if it will return true (cachedData != currentData) then I will call GetCachedData().
I'm new in C#. Please, help me to implement this method.
I'm using .net framework 4.5.2
The basic idea is clear:
You have a Data property which is wrapper around an expensive function call.
In order to have some response immediately the property holds a cached value and performs updating in the background.
No need for an event when the updater is done because you poll, for now.
That seems like a straight-forward design. At some point you may want to use events, but that can be added later.
Depending on the circumstances it may be necessary to make access to the property thread-safe. I think that if the Data cache is a simple reference and no other data is updated together with it, a lock is not necessary, but you may want to declare the reference volatile so that the reading thread does not rely on a stale cached (ha!) version. This post seems to have good links which discuss the issues.
If you will not call GetCachedData at the same time, you may not use lock. If data is null (for sure first run) we will wait long method to finish its work.
public class SlowClass
{
private static object _lock;
private static Data _cachedData;
public SlowClass()
{
_lock = new object();
}
public void GetCachedData()
{
var task = new Task(DoStuffLongRun);
task.Start();
if (_cachedData == null)
task.Wait();
}
public Data GetData()
{
if (_cachedData == null)
GetCachedData();
return _cachedData;
}
private void DoStuffLongRun()
{
lock (_lock)
{
Console.WriteLine("Locked Entered");
Thread.Sleep(5000);//Do Long Stuff
_cachedData = new Data();
}
}
}
I have tested on console application.
static void Main(string[] args)
{
var mySlow = new SlowClass();
var mySlow2 = new SlowClass();
mySlow.GetCachedData();
for (int i = 0; i < 5; i++)
{
Console.WriteLine(i);
mySlow.GetData();
mySlow2.GetData();
}
mySlow.GetCachedData();
Console.Read();
}
Maybe you can use the MemoryCache class,
as explained here in MSDN
does MemoryCache has functionality to cache fixed number of items?
e.g. We are only interested in cache 2000 items from database. While keep adding items to the cache, if the specified number of items are exceeded, the oldest one can be removed.
If not, do we have to use another thread to do the house keeping regularly?
It doesn't have anything built in that will limit the number of objects. Instead, it checks how much memory is being used, and compares it to the CacheMemoryLimit. If the CacheMemoryLimit is exceeded, it will drop older items. You can also set items to automatically expire after a certain amount of time via the CacheItemPolicy.
These approaches both make more sense if you're really using it as a Memory Cache. In other words, if you're worried about the tradeoff between a memory limit and the cost of fetching data, these are great ways to determine when to evict items from the cache. So ask yourself:
Am I really trying to use this as a MemoryCache? Why do I even care if only 2000 items are loaded from the database?
If you are worried about the memory overhead, or if you are worried about the items getting out of date, there are other (better) ways to manage the cache than specifying a number of objects. If you've got some custom reason to keep a specific number of objects in a data structure, consider using a different class.
Another option would be to create a new MemoryCache provider which performs the object limit management for you. This would override some of the MemoryCache methods in such as add and remove, and automatically roll-off items once the arbitrary limit (e.g. 2000 objects) has been reached.
One such implementation may look like the following:
public class ObjectLimitMemoryCache : MemoryCache
{
private const int ObjectLimit = 2000;
private const string IndexKey = "ObjectLimitIndexKey";
public ObjectLimitMemoryCache(string name, NameValueCollection config)
: base (name, config)
{
}
new public static ObjectLimitMemoryCache Default { get { return new ObjectLimitMemoryCache(Guid.NewGuid().ToString(), new NameValueCollection());}}
public override bool Add(string key, Object value, DateTimeOffset absoluteExpiration, string region = null)
{
try
{
var indexedKeys = (List<string>)(base.Get(IndexKey) ?? new List<string>());
if (base.Add(key, value, absoluteExpiration))
{
string existingKey;
if (string.IsNullOrEmpty(existingKey = indexedKeys.FirstOrDefault(x=>x == key)))
{
indexedKeys.Add(key);
}
if (base.GetCount() > ObjectLimit)
{
base.Remove(indexedKeys.First());
indexedKeys.RemoveAt(0);
}
base.Add(IndexKey, indexedKeys, new DateTimeOffset(DateTime.Now.AddHours(2)));
return true;
}
return false;
}
catch
{
//Log something and other fancy stuff
throw;
}
}
}
This is untested code and meant solely to illustrate an example implementation of MemoryCache. Good luck!
I have a question about improving the efficiency of my program. I have a Dictionary<string, Thingey> defined to hold named Thingeys. This is a web application that will create multiple named Thingey’s over time. Thingey’s are somewhat expensive to create (not prohibitively so) but I’d like to avoid it whenever possible. My logic for getting the right Thingey for the request looks a lot like this:
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
// create a new thingey on 1st reference
Thingey newThingey = new Thingey(request);
lock (this.Thingeys)
{
if (!this.Thingeys.ContainsKey(thingeyName))
{
this.Thingeys.Add(thingeyName, newThingey);
}
// else - oops someone else beat us to it
// newThingey will eventually get GCed
}
}
return this. Thingeys[thingeyName];
}
In this application, Thingeys live forever once created. We don’t know how to create them or which ones will be needed until the app starts and requests begin coming in. The question I have is in the above code is there are occasional instances where newThingey is created because we get multiple simultaneous requests for it before it’s been created. We end up creating 2 of them but only adding one to our collection.
Is there a better way to get Thingeys created and added that doesn’t involve check/create/lock/check/add with the rare extraneous thingey that we created but end up never using? (And this code works and has been running for some time. This is just the nagging bit that has always bothered me.)
I'm trying to avoid locking the dictionary for the duration of creating a Thingey.
This is the standard double check locking problem. The way it is implemented here is unsafe and can cause various problems - potentially up to the point of a crash in the first check if the internal state of the dictionary is screwed up bad enough.
It is unsafe because you are checking it without synchronization and if your luck is bad enough you can hit it while some other thread is in the middle of updating internal state of the dictionary
A simple solution is to place the first check under a lock as well. A problem with this is that this becomes a global lock and in web environment under heavy load it can become a serious bottleneck.
If we are talking about .NET environment, there are ways to work around this issue by piggybacking on the ASP.NET synchronization mechanism.
Here is how I did it in NDjango rendering engine: I keep one global dictionary and one dictionary per rendering thread. When a request comes I check the local dictionary first - this check does not have to be synchronized and if the thingy is there I just take it
If it is not I synchronize on the global dictionary check if it is there and if it is add it to my thread dictionary and release the lock. If it is not in the global dictionary I add it there first while still under lock.
Well, from my point of view simpler code is better, so I'd only use one lock:
private readonly object thingeysLock = new object();
private readonly Dictionary<string, Thingey> thingeys;
public Thingey GetThingey(Request request)
{
string key = request.ThingeyName;
lock (thingeysLock)
{
Thingey ret;
if (!thingeys.TryGetValue(key, out ret))
{
ret = new Thingey(request);
thingeys[key] = ret;
}
return ret;
}
}
Locks are really cheap when they're not contended. The downside is that this means that occasionally you will block everyone for the whole duration of the time you're creating a new Thingey. Clearly to avoid creating redundant thingeys you'd have to at least block while multiple threads create the Thingey for the same key. Reducing it so that they only block in that situation is somewhat harder.
I would suggest you use the above code but profile it to see whether it's fast enough. If you really need "only block when another thread is already creating the same thingey" then let us know and we'll see what we can do...
EDIT: You've commented on Adam's answer that you "don't want to lock while a new Thingey is being created" - you do realise that there's no getting away from that if there's contention for the same key, right? If thread 1 starts creating a Thingey, then thread 2 asks for the same key, your alternatives for thread 2 are either waiting or creating another instance.
EDIT: Okay, this is generally interesting, so here's a first pass at the "only block other threads asking for the same item".
private readonly object dictionaryLock = new object();
private readonly object creationLocksLock = new object();
private readonly Dictionary<string, Thingey> thingeys;
private readonly Dictionary<string, object> creationLocks;
public Thingey GetThingey(Request request)
{
string key = request.ThingeyName;
Thingey ret;
bool entryExists;
lock (dictionaryLock)
{
entryExists = thingeys.TryGetValue(key, out ret);
// Atomically mark the dictionary to say we're creating this item,
// and also set an entry for others to lock on
if (!entryExists)
{
thingeys[key] = null;
lock (creationLocksLock)
{
creationLocks[key] = new object();
}
}
}
// If we found something, great!
if (ret != null)
{
return ret;
}
// Otherwise, see if we're going to create it or whether we need to wait.
if (entryExists)
{
object creationLock;
lock (creationLocksLock)
{
creationLocks.TryGetValue(key, out creationLock);
}
// If creationLock is null, it means the creating thread has finished
// creating it and removed the creation lock, so we don't need to wait.
if (creationLock != null)
{
lock (creationLock)
{
Monitor.Wait(creationLock);
}
}
// We *know* it's in the dictionary now - so just return it.
lock (dictionaryLock)
{
return thingeys[key];
}
}
else // We said we'd create it
{
Thingey thingey = new Thingey(request);
// Put it in the dictionary
lock (dictionaryLock)
{
thingeys[key] = thingey;
}
// Tell anyone waiting that they can look now
lock (creationLocksLock)
{
Monitor.PulseAll(creationLocks[key]);
creationLocks.Remove(key);
}
return thingey;
}
}
Phew!
That's completely untested, and in particular it isn't in any way, shape or form robust in the face of exceptions in the creating thread... but I think it's the generally right idea :)
If you're looking to avoid blocking unrelated threads, then additional work is needed (and should only be necessary if you've profiled and found that performance is unacceptable with the simpler code). I would recommend using a lightweight wrapper class that asynchronously creates a Thingey and using that in your dictionary.
Dictionary<string, ThingeyWrapper> thingeys = new Dictionary<string, ThingeyWrapper>();
private class ThingeyWrapper
{
public Thingey Thing { get; private set; }
private object creationLock;
private Request request;
public ThingeyWrapper(Request request)
{
creationFlag = new object();
this.request = request;
}
public void WaitForCreation()
{
object flag = creationFlag;
if(flag != null)
{
lock(flag)
{
if(request != null) Thing = new Thingey(request);
creationFlag = null;
request = null;
}
}
}
}
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
ThingeyWrapper output;
lock (this.Thingeys)
{
if(!this.Thingeys.TryGetValue(thingeyName, out output))
{
output = new ThingeyWrapper(request);
this.Thingeys.Add(thingeyName, output);
}
}
output.WaitForCreation();
return output.Thing;
}
While you are still locking on all calls, the creation process is much more lightweight.
Edit
This issue has stuck with me more than I expected it to, so I whipped together a somewhat more robust solution that follows this general pattern. You can find it here.
IMHO, if this piece of code is called from many thread simultaneous, it is recommended to check it twice.
(But: I'm not sure that you can safely call ContainsKey while some other thread is call Add. So it might not be possible to avoid the lock at all.)
If you just want to avoid the Thingy is created but not used, just create it within the locking block:
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
lock (this.Thingeys)
{
// only one can create the same Thingy
Thingey newThingey = new Thingey(request);
if (!this.Thingeys.ContainsKey(thingeyName))
{
this.Thingeys.Add(thingeyName, newThingey);
}
}
}
return this. Thingeys[thingeyName];
}
You have to ask yourself the question whether the specific ContainsKey operation and the getter are themselfes threadsafe (and will stay that way in newer versions), because those may and willbe invokes while another thread has the dictionary locked and is performing the Add.
Typically, .NET locks are fairly efficient if used correctly, and I believe that in this situation you're better of doing this:
bool exists;
lock (thingeys) {
exists = thingeys.TryGetValue(thingeyName, out thingey);
}
if (!exists) {
thingey = new Thingey();
}
lock (thingeys) {
if (!thingeys.ContainsKey(thingeyName)) {
thingeys.Add(thingeyName, thingey);
}
}
return thingey;
Well I hope not being to naive at giving this answer. but what I would do, as Thingyes are expensive to create, would be to add the key with a null value. That is something like this
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
lock (this.Thingeys)
{
this.Thingeys.Add(thingeyName, null);
if (!this.Thingeys.ContainsKey(thingeyName))
{
// create a new thingey on 1st reference
Thingey newThingey = new Thingey(request);
Thingeys[thingeyName] = newThingey;
}
// else - oops someone else beat us to it
// but it doesn't mather anymore since we only created one Thingey
}
}
return this.Thingeys[thingeyName];
}
I modified your code in a rush so no testing was done.
Anyway, I hope my idea is not so naive. :D
You might be able to buy a little bit of speed efficiency at the expense of memory. If you create an immutable array that lists all of the created Thingys and reference the array with a static variable, then you could check the existance of a Thingy outside of any lock, since immutable arrays are always thread safe. Then when adding a new Thingy, you can create a new array with the additional Thingy and replace it (in the static variable) in one (atomic) set operation. Some new Thingys may be missed, because of race conditions, but the program shouldn't fail. It just means that on rare occasions extra duplicate Thingys will be made.
This will not replace the need for duplicate checking when creating a new Thingy, and it will use a lot of memory resources, but it will not require that the lock be taken or held while creating a Thingy.
I'm thinking of something along these lines, sorta:
private Dictionary<string, Thingey> Thingeys;
// An immutable list of (most of) the thingeys that have been created.
private string[] existingThingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
// Reference the same list throughout the method, just in case another
// thread replaces the global reference between operations.
string[] localThingyList = existingThingeys;
// Check to see if we already made this Thingey. (This might miss some,
// but it doesn't matter.
// This operation on an immutable array is thread-safe.
if (localThingyList.Contains(thingeyName))
{
// But referencing the dictionary is not thread-safe.
lock (this.Thingeys)
{
if (this.Thingeys.ContainsKey(thingeyName))
return this.Thingeys[thingeyName];
}
}
Thingey newThingey = new Thingey(request);
Thiney ret;
// We haven't locked anything at this point, but we have created a new
// Thingey that we probably needed.
lock (this.Thingeys)
{
// If it turns out that the Thingey was already there, then
// return the old one.
if (!Thingeys.TryGetValue(thingeyName, out ret))
{
// Otherwise, add the new one.
Thingeys.Add(thingeyName, newThingey);
ret = newThingey;
}
}
// Update our existingThingeys array atomically.
string[] newThingyList = new string[localThingyList.Length + 1];
Array.Copy(localThingyList, newThingey, localThingyList.Length);
newThingey[localThingyList.Length] = thingeyName;
existingThingeys = newThingyList; // Voila!
return ret;
}