C# LazyCache concurrent dictionary garbage collection

Been having some problems with a web-based .NET (C#) application. I'm using the LazyCache library to cache frequent JSON responses (some around 80+ KB) for users belonging to the same company, across user sessions.
One of the things we need to do is keep track of the cache keys for a particular company, so that when any user in the company makes mutating changes to cached items we can clear the cache for those items for that company's users, forcing the cache to be repopulated upon the next request.
We chose the LazyCache library because we wanted to do this in memory, without needing an external cache such as Redis, since we don't have heavy usage.
One of the problems with this approach is that we need to keep track of all the cache keys belonging to a particular customer whenever we cache anything. So when a mutating change is made by a company user to the relevant resource, we need to expire all the cache keys belonging to that company.
To achieve this we have a global cache which all web controllers have access to.
private readonly IAppCache _cache = new CachingService();

protected IAppCache GetCache()
{
    return _cache;
}
A simplified example (forgive any typos!) of our controllers which use this cache would be something like below
[HttpGet]
[Route("{customerId}/accounts/users")]
public async Task<Users> GetUsers([Required] string customerId)
{
    var usersBusinessLogic = await _provider.GetUsersBusinessLogic(customerId);
    var newCacheKey = "GetUsers." + customerId;
    CacheUtil.StoreCacheKey(customerId, newCacheKey);
    return await GetCache().GetOrAddAsync(newCacheKey, () => usersBusinessLogic.GetUsers(), DateTimeOffset.Now.AddMinutes(10));
}
We use a util class with static methods and a static concurrent dictionary to store the cache keys - each company (GUID) can have many cache keys.
private static readonly ConcurrentDictionary<Guid, ConcurrentHashSet<string>> cacheKeys =
    new ConcurrentDictionary<Guid, ConcurrentHashSet<string>>();

public static void StoreCacheKey(Guid customerId, string newCacheKey)
{
    cacheKeys.AddOrUpdate(customerId, new ConcurrentHashSet<string>() { newCacheKey }, (key, existingCacheKeys) =>
    {
        existingCacheKeys.Add(newCacheKey);
        return existingCacheKeys;
    });
}
Within that same util class, when we need to remove all cache keys for a particular company we have a method similar to the one below (which is called when mutating changes are made in other controllers):
public static void ClearCustomerCache(IAppCache cache, Guid customerId)
{
    if (!cacheKeys.TryGetValue(customerId, out var customerCacheKeys))
    {
        return;
    }
    foreach (var cacheKey in customerCacheKeys)
    {
        cache.Remove(cacheKey);
    }
    cacheKeys.TryRemove(customerId, out _);
}
We have recently been seeing performance problems: our web request response times slow down significantly over time, while we don't see a significant change in the number of requests per second.
Looking at the garbage collection metrics, we notice a large Gen 2 heap size and a large object heap size, both of which seem to keep going upwards - we don't see memory being reclaimed.
We are still in the middle of debugging this, but I'm wondering: could the approach described above lead to the problems we are seeing? We want thread safety, but could there be an issue with the concurrent dictionary above whereby, even after we remove items, memory is not being freed, leading to excessive Gen 2 collections?
Also, we are using workstation garbage collection mode; we imagine switching to server GC mode will help us (our IIS server has 8 processors and 16 GB of RAM), but we're not sure that switching will fix all the problems.

You may want to take advantage of the ExpirationTokens property of the MemoryCacheEntryOptions class. You can also use it from the ICacheEntry argument passed to the delegate of the LazyCache.Providers.MemoryCacheProvider.GetOrCreateAsync method. For example:
Task<T> GetOrAddAsync<T>(string key, Func<Task<T>> factory,
    int durationMilliseconds = Timeout.Infinite, string customerId = null)
{
    return GetMemoryCacheProvider().GetOrCreateAsync<T>(key, (options) =>
    {
        if (durationMilliseconds != Timeout.Infinite)
        {
            options.SetSlidingExpiration(TimeSpan.FromMilliseconds(durationMilliseconds));
        }
        if (customerId != null)
        {
            options.ExpirationTokens.Add(GetCustomerExpirationToken(customerId));
        }
        return factory();
    });
}
Now the GetCustomerExpirationToken should return an object implementing the IChangeToken interface. Things are becoming a bit complex, but bear with me for a minute. The .NET platform doesn't provide a built-in IChangeToken implementation suitable for this case, since it is mainly focused on file system watchers. Implementing one is not difficult though:
class ChangeToken : IChangeToken, IDisposable
{
    private volatile bool _hasChanged;
    private readonly ConcurrentQueue<(Action<object>, object)> registeredCallbacks =
        new ConcurrentQueue<(Action<object>, object)>();

    public void SignalChanged()
    {
        _hasChanged = true;
        while (registeredCallbacks.TryDequeue(out var entry))
        {
            var (callback, state) = entry;
            callback?.Invoke(state);
        }
    }

    bool IChangeToken.HasChanged => _hasChanged;
    bool IChangeToken.ActiveChangeCallbacks => true;

    IDisposable IChangeToken.RegisterChangeCallback(Action<object> callback, object state)
    {
        registeredCallbacks.Enqueue((callback, state));
        return this; // returning null doesn't work
    }

    void IDisposable.Dispose() { } // It is called by the framework after each callback
}
This is a general implementation of the IChangeToken interface, that is activated manually with the SignalChanged method. The signal will be propagated to the underlying MemoryCache object, which will subsequently invalidate all entries associated with this token.
Now what is left to do is to associate these tokens with a customer, and store them somewhere. I think that a ConcurrentDictionary should be quite adequate:
private static readonly ConcurrentDictionary<string, ChangeToken> CustomerChangeTokens =
    new ConcurrentDictionary<string, ChangeToken>();

private static ChangeToken GetCustomerExpirationToken(string customerId)
{
    return CustomerChangeTokens.GetOrAdd(customerId, _ => new ChangeToken());
}
Finally the method that is needed to signal that all entries of a specific customer should be invalidated:
public static void SignalCustomerChanged(string customerId)
{
    if (CustomerChangeTokens.TryRemove(customerId, out var changeToken))
    {
        changeToken.SignalChanged();
    }
}
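Putting it together, here is a sketch of how the question's controller could use this, assuming the GetOrAddAsync wrapper above is reachable from the controller; the member names from the question (_provider, Users) are reused as-is:
[HttpGet]
[Route("{customerId}/accounts/users")]
public async Task<Users> GetUsers([Required] string customerId)
{
    var usersBusinessLogic = await _provider.GetUsersBusinessLogic(customerId);

    // The entry carries the customer's expiration token, so no separate
    // dictionary of cache keys is needed.
    return await GetOrAddAsync("GetUsers." + customerId,
        () => usersBusinessLogic.GetUsers(),
        durationMilliseconds: (int)TimeSpan.FromMinutes(10).TotalMilliseconds,
        customerId: customerId);
}

// Later, when a mutating change is made for that customer:
// SignalCustomerChanged(customerId); // evicts every entry associated with that token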

Large objects (> 85,000 bytes) are allocated on the Large Object Heap (LOH), which is collected as part of Gen 2, and by default they are never moved - effectively pinned in place.
The GC scans the LOH and marks dead objects.
Adjacent dead objects are combined into free memory.
The LOH is not compacted.
Further allocations only try to fill in the holes left by dead objects.
With no compaction, only reallocation into those holes, this can lead to memory fragmentation.
Long-running server processes can be done in by this - it is not uncommon.
You are probably seeing fragmentation occur over time.
Server GC just happens to be multi-threaded - I wouldn't expect it to solve fragmentation.
You could try breaking up your large objects - this might not be feasible for your application.
You can try setting LargeObjectHeapCompactionMode after a cache clear - assuming the clears are infrequent:
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
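For example, here is a rough sketch of how that could be wired into the question's ClearCustomerCache (assuming cache clears really are infrequent; requires using System.Runtime):
public static void ClearCustomerCache(IAppCache cache, Guid customerId)
{
    if (cacheKeys.TryRemove(customerId, out var customerCacheKeys))
    {
        foreach (var cacheKey in customerCacheKeys)
        {
            cache.Remove(cacheKey);
        }
    }

    // One-off flag: the next blocking Gen 2 collection also compacts the LOH,
    // after which the setting resets itself to Default.
    GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
    GC.Collect();
}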
Ultimately, I'd suggest profiling the heap to find out what works.

Related

Parallel.ForEach: Best way to save off a collection when its record count gets high?

So I'm running a Parallel.ForEach that basically generates a bunch of data which is ultimately going to be saved to a database. However, since the collection of data can get quite large, I need to be able to occasionally save/clear the collection so as to not run into an OutOfMemoryException.
I'm new to using Parallel.ForEach, concurrent collections, and locks, so I'm a little fuzzy on what exactly needs to be done to make sure everything works correctly (i.e. we don't get any records added to the collection between the Save and Clear operations).
Currently I'm saying, if the record count is above a certain threshold, save the data in the current collection, within a lock block.
ConcurrentStack<OutRecord> OutRecs = new ConcurrentStack<OutRecord>();
object StackLock = new object();

Parallel.ForEach(inputrecords, input =>
{
    lock (StackLock)
    {
        if (OutRecs.Count >= 50000)
        {
            Save(OutRecs);
            OutRecs.Clear();
        }
    }
    OutRecs.Push(CreateOutputRecord(input));
});

if (OutRecs.Count > 0) Save(OutRecs);
I'm not 100% certain whether or not this works the way I think it does. Does the lock stop other instances of the loop from writing to the output collection? If not, is there a better way to do this?
Your lock will work correctly, but it will not be very efficient because all your worker threads will be forced to pause for the entire duration of each save operation. Also, locks tend to be (relatively) expensive, so performing a lock in each iteration of each thread is a bit wasteful.
One of your comments mentioned giving each worker thread its own data storage: yes, you can do this. Here's an example that you could tailor to your needs:
Parallel.ForEach(
    // collection of objects to iterate over
    inputrecords,
    // delegate to initialize thread-local data
    () => new List<OutRecord>(),
    // body of loop
    (inputrecord, loopstate, localstorage) =>
    {
        localstorage.Add(CreateOutputRecord(inputrecord));
        if (localstorage.Count > 1000)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
        return localstorage;
    },
    // finally block gets executed after each thread exits
    localstorage =>
    {
        if (localstorage.Count > 0)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
    });
One approach is to define an abstraction that represents the destination for your data. It could be something like this:
public interface IRecordWriter<T> // perhaps come up with a better name
{
    void WriteRecord(T record);
    void Flush();
}
Your class that processes the records in parallel doesn't need to worry about how those records are handled or what happens when there's too many of them. The implementation of IRecordWriter handles all those details, making your other class easier to test.
An implementation of IRecordWriter could look something like this:
public abstract class BufferedRecordWriter<T> : IRecordWriter<T>
{
    private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
    private readonly int _maxCapacity;
    private bool _flushing;

    protected BufferedRecordWriter(int maxCapacity = 100)
    {
        _maxCapacity = maxCapacity;
    }

    public void WriteRecord(T record)
    {
        _buffer.Enqueue(record);
        if (_buffer.Count >= _maxCapacity && !_flushing)
            Flush();
    }

    public void Flush()
    {
        _flushing = true;
        try
        {
            var recordsToWrite = new List<T>();
            while (_buffer.TryDequeue(out T dequeued))
            {
                recordsToWrite.Add(dequeued);
            }
            if (recordsToWrite.Any())
                WriteRecords(recordsToWrite);
        }
        finally
        {
            _flushing = false;
        }
    }

    protected abstract void WriteRecords(IEnumerable<T> records);
}
When the buffer reaches the maximum size, all the records in it are sent to WriteRecords. Because _buffer is a ConcurrentQueue it can keep reading records even as they are added.
That abstract WriteRecords method could be anything specific to how you write your records. Instead of this being an abstract class, the actual output to a database or file could be yet another dependency that gets injected into this one. You can make decisions like that, refactor, and change your mind because the very first class isn't affected by those changes. All it knows about is the IRecordWriter interface, which doesn't change.
You might notice that I haven't made absolutely certain that Flush won't execute concurrently on different threads. I could put more locking around this, but it really doesn't matter. This will avoid most concurrent executions, but it's okay if concurrent executions both read from the ConcurrentQueue.
This is just a rough outline, but it shows how all of the steps become simpler and easier to test if we separate them. One class converts inputs to outputs. Another class buffers the outputs and writes them. That second class can even be split into two - one as a buffer, and another as the "final" writer that sends them to a database or file or some other destination.
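As a rough illustration (not part of the tested code above), a minimal concrete writer and its use inside the loop could look like this, reusing OutRecord and CreateOutputRecord from the question; the console write is a stand-in for the real database call:
public class ConsoleRecordWriter : BufferedRecordWriter<OutRecord>
{
    public ConsoleRecordWriter(int maxCapacity = 1000) : base(maxCapacity) { }

    protected override void WriteRecords(IEnumerable<OutRecord> records)
    {
        // Placeholder persistence: swap in SqlBulkCopy, Dapper, EF, etc.
        foreach (var record in records)
            Console.WriteLine(record);
    }
}

// The parallel body no longer knows anything about batching or saving.
IRecordWriter<OutRecord> writer = new ConsoleRecordWriter();
Parallel.ForEach(inputrecords, input => writer.WriteRecord(CreateOutputRecord(input)));
writer.Flush(); // push whatever is still buffered at the end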

How to use Lazy to handle concurrent request?

I'm new to C# and trying to understand how to work with Lazy.
I need to handle concurrent requests by waiting for the result of an already-running operation. Requests for data may come in simultaneously with the same or different credentials.
For each unique set of credentials there can be at most one GetDataInternal call in progress, with the result from that one call returned to all queued waiters when it is ready.
private readonly ConcurrentDictionary<Credential, Lazy<Data>> Cache =
    new ConcurrentDictionary<Credential, Lazy<Data>>();

public Data GetData(Credential credential)
{
    // This instance will be thrown away if a cached
    // value with our "credential" key already exists.
    Lazy<Data> newLazy = new Lazy<Data>(
        () => GetDataInternal(credential),
        LazyThreadSafetyMode.ExecutionAndPublication
    );

    Lazy<Data> lazy = Cache.GetOrAdd(credential, newLazy);
    bool added = ReferenceEquals(newLazy, lazy); // If true, we won the race.

    Data data;
    try
    {
        // Wait for the GetDataInternal call to complete.
        data = lazy.Value;
    }
    finally
    {
        // Only the thread which created the cache value
        // is allowed to remove it, to prevent races.
        if (added)
        {
            Cache.TryRemove(credential, out lazy);
        }
    }
    return data;
}
Is that the right way to use Lazy, or is my code not safe?
Update:
Is it a good idea to start using MemoryCache instead of ConcurrentDictionary? If yes, how do I create the key value, since the key has to be a string inside MemoryCache.Default.AddOrGetExisting()?
This is correct. This is a standard pattern (except for the removal) and it's a really good cache because it prevents cache stampeding.
I'm not sure you want to remove from the cache when the computation is done because the computation will be redone over and over that way. If you don't need the removal you can simplify the code by basically deleting the second half.
Note that Lazy has a problem in the case of an exception: the exception is stored and the factory will never be re-executed. The problem persists forever (until a human restarts the app). In my mind this makes Lazy completely unsuitable for production use in most cases.
This means that a transient error such as a network issue can render the app unavailable permanently.
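If you do keep the removal, one possible mitigation (a sketch, not a drop-in fix) is to evict only the exact Lazy instance that faulted, so the next caller re-runs the factory; ConcurrentDictionary's explicit ICollection<KeyValuePair<...>>.Remove overload removes an entry only when both the key and the value still match (requires using System.Collections.Generic):
public Data GetData(Credential credential)
{
    var lazy = Cache.GetOrAdd(credential, c => new Lazy<Data>(
        () => GetDataInternal(c), LazyThreadSafetyMode.ExecutionAndPublication));
    try
    {
        return lazy.Value;
    }
    catch
    {
        // Evict only the faulted instance; if another thread has already
        // replaced the entry with a fresh Lazy, it is left untouched.
        ((ICollection<KeyValuePair<Credential, Lazy<Data>>>)Cache)
            .Remove(new KeyValuePair<Credential, Lazy<Data>>(credential, lazy));
        throw;
    }
}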
This answer is directed to the updated part of the original question. See #usr answer regarding thread-safety with Lazy<T> and the potential pitfalls.
I would like to know how to avoid using ConcurrentDictionary<TKey, TValue> and start
using MemoryCache? How to implement
MemoryCache.Default.AddOrGetExisting()?
If you're looking for a cache which has a mechanism for auto expiry, then MemoryCache is a good choice if you don't want to implement the mechanics yourself.
In order to utilize MemoryCache which forces a string representation for a key, you'll need to create a unique string representation of a credential, perhaps a given user id or a unique username?
You can create an override of ToString which returns your unique identifier, or simply use that property directly, and utilize MemoryCache like this:
public class Credential
{
    public Credential(int userId)
    {
        UserId = userId;
    }

    public int UserId { get; private set; }
}
And now your method will look like this:
private const int EvictionIntervalMinutes = 10;

public Data GetData(Credential credential)
{
    Lazy<Data> newLazy = new Lazy<Data>(
        () => GetDataInternal(credential), LazyThreadSafetyMode.ExecutionAndPublication);

    CacheItemPolicy evictionPolicy = new CacheItemPolicy
    {
        AbsoluteExpiration = DateTimeOffset.UtcNow.AddMinutes(EvictionIntervalMinutes)
    };

    var result = MemoryCache.Default.AddOrGetExisting(
        new CacheItem(credential.UserId.ToString(), newLazy), evictionPolicy);

    return result != null ? ((Lazy<Data>)result.Value).Value : newLazy.Value;
}
MemoryCache provides you with a thread-safe implementation; this means that two threads accessing AddOrGetExisting will only cause a single cache item to be added or retrieved. Further, Lazy<T> with ExecutionAndPublication guarantees only a single unique invocation of the factory method.

How to free memory on Dictionary in static class?

I have a problem with freeing memory in C#. I have a static class containing a static dictionary, which is filled with references to objects. A single object occupies a large amount of memory. From time to time I release memory by setting obsolete object references to null and removing the items from the dictionary. Unfortunately, in this case the memory usage does not go down; only after memory usage reaches the system's maximum is there a sudden release of unused resources, and the amount of memory used then decreases as expected.
Below is the diagram of classes:
public class cObj
{
    public DateTime CreatedOn;
    public object ObjectData;
}

public static class cData
{
    public static ConcurrentDictionary<Guid, cObj> ObjectDict = new ConcurrentDictionary<Guid, cObj>();

    public static void FreeData()
    {
        foreach (var o in ObjectDict)
        {
            if (o.Value.CreatedOn <= DateTime.Now.AddSeconds(-30))
            {
                cObj Data;
                if (ObjectDict.TryGetValue(o.Key, out Data))
                {
                    Data.ObjectData = null;
                    ObjectDict.TryRemove(o.Key, out Data);
                }
            }
        }
    }
}
In this case, the memory is not released. If, however, after this operation I call
GC.Collect();
the expected release of unused objects follows.
How can I solve the problem so that I don't have to use GC.Collect()?
You shouldn't have to call GC.Collect() in most cases (see "To GC.Collect or not?").
I've had similar scenarios where I've just created a dictionary that's limited to n entries. I did this myself on top of ConcurrentDictionary, but you could use BlockingCollection.
One possible advantage is that if 1 million entries get added at the same time, all except n will be available for garbage collection immediately rather than 30 seconds later.
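A rough sketch of that idea (entirely illustrative; the eviction policy is simply "drop the oldest key we remember", and a slight overshoot under heavy concurrency is accepted):
using System.Collections.Concurrent;

public class BoundedCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, TValue> _items = new ConcurrentDictionary<TKey, TValue>();
    private readonly ConcurrentQueue<TKey> _order = new ConcurrentQueue<TKey>();
    private readonly int _maxEntries;

    public BoundedCache(int maxEntries)
    {
        _maxEntries = maxEntries;
    }

    public void Set(TKey key, TValue value)
    {
        if (_items.TryAdd(key, value))
            _order.Enqueue(key); // remember insertion order for new keys only
        else
            _items[key] = value;

        // Best-effort trim back down to the limit.
        while (_items.Count > _maxEntries && _order.TryDequeue(out var oldest))
            _items.TryRemove(oldest, out _);
    }

    public bool TryGet(TKey key, out TValue value) => _items.TryGetValue(key, out value);
}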

MemoryCache with certain number of items

Does MemoryCache have functionality to cache a fixed number of items?
e.g. We are only interested in caching 2000 items from the database. As items keep being added to the cache, once the specified number of items is exceeded, the oldest one should be removed.
If not, do we have to use another thread to do the housekeeping regularly?
It doesn't have anything built in that will limit the number of objects. Instead, it checks how much memory is being used, and compares it to the CacheMemoryLimit. If the CacheMemoryLimit is exceeded, it will drop older items. You can also set items to automatically expire after a certain amount of time via the CacheItemPolicy.
These approaches both make more sense if you're really using it as a Memory Cache. In other words, if you're worried about the tradeoff between a memory limit and the cost of fetching data, these are great ways to determine when to evict items from the cache. So ask yourself:
Am I really trying to use this as a MemoryCache? Why do I even care if only 2000 items are loaded from the database?
If you are worried about the memory overhead, or if you are worried about the items getting out of date, there are other (better) ways to manage the cache than specifying a number of objects. If you've got some custom reason to keep a specific number of objects in a data structure, consider using a different class.
Another option would be to create a new MemoryCache provider which performs the object-limit management for you. This would override some of the MemoryCache methods, such as Add and Remove, and automatically roll off items once the arbitrary limit (e.g. 2000 objects) has been reached.
One such implementation may look like the following:
public class ObjectLimitMemoryCache : MemoryCache
{
    private const int ObjectLimit = 2000;
    private const string IndexKey = "ObjectLimitIndexKey";

    public ObjectLimitMemoryCache(string name, NameValueCollection config)
        : base(name, config)
    {
    }

    new public static ObjectLimitMemoryCache Default
    {
        get { return new ObjectLimitMemoryCache(Guid.NewGuid().ToString(), new NameValueCollection()); }
    }

    public override bool Add(string key, Object value, DateTimeOffset absoluteExpiration, string region = null)
    {
        try
        {
            var indexedKeys = (List<string>)(base.Get(IndexKey) ?? new List<string>());
            if (base.Add(key, value, absoluteExpiration))
            {
                string existingKey;
                if (string.IsNullOrEmpty(existingKey = indexedKeys.FirstOrDefault(x => x == key)))
                {
                    indexedKeys.Add(key);
                }
                if (base.GetCount() > ObjectLimit)
                {
                    base.Remove(indexedKeys.First());
                    indexedKeys.RemoveAt(0);
                }
                base.Add(IndexKey, indexedKeys, new DateTimeOffset(DateTime.Now.AddHours(2)));
                return true;
            }
            return false;
        }
        catch
        {
            //Log something and other fancy stuff
            throw;
        }
    }
}
This is untested code and meant solely to illustrate an example implementation of MemoryCache. Good luck!
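Usage would look roughly like this (equally untested; the loop just demonstrates that older entries roll off once the limit is passed):
var cache = ObjectLimitMemoryCache.Default;
for (int i = 0; i < 2500; i++)
{
    cache.Add("item:" + i, i, DateTimeOffset.Now.AddMinutes(30));
}
// Roughly the most recent 2000 entries (plus the internal index entry) remain.
Console.WriteLine(cache.GetCount());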

How to get notified if a object is garbage collected?

Short version:
For a cache class I need to get notified if an object is garbage collected (to remove the corresponding entries from my cache). What is the best way to do so? Sending an event from the destructor?
Long version:
I am writing a cacher/memoizer for functions that take one huge parameter-tree object and many small value type parameters, e.g.,
double myFunc(HugeParTree parTree, int dynPar1, double dynPar2)
I want to cache these functions in the following way:
Cache results for the tuples (parTree.GUID, dynPar1, dynPar2, ...)
Whenever parTree changes, which seldom happens, all corresponding cache entries are deleted (via the Observer pattern). (parTree.Equals() is just too expensive; it compares 100+ value types.)
Code looks like this right now (for one value parameter):
public class CachedFunction1ObsPar1Par<TRet, TObsPar1, TPar1>
    where TObsPar1 : IObservable, IProvideGUID
{
    public delegate TRet ValueCalculator(TObsPar1 obsPar1, TPar1 par1);

    public CachedFunction1ObsPar1Par(ValueCalculator calc)
    {
        _calc = calc;
    }

    #region members
    private ValueCalculator _calc;
    private Dictionary<Guid, Dictionary<TPar1, TRet>> _cache =
        new Dictionary<Guid, Dictionary<TPar1, TRet>>();
    #endregion

    public TRet value(TObsPar1 obsPar1, TPar1 par1)
    {
        TRet result;
        bool cacheHit = checkCache(obsPar1, par1, out result);
        if (cacheHit)
        {
            Debug.Assert(result.Equals(_calc(obsPar1, par1)));
            return result;
        }
        else
        {
            result = _calc(obsPar1, par1);
            _cache[obsPar1.GUID].Add(par1, result);
            return result;
        }
    }

    private bool checkCache(TObsPar1 obsPar1, TPar1 par1, out TRet result)
    {
        if (!_cache.ContainsKey(obsPar1.GUID))
        {
            _cache.Add(obsPar1.GUID, new Dictionary<TPar1, TRet>());
            obsPar1._changed += this.invalidateCache;
        }
        Dictionary<TPar1, TRet> guidCache = _cache[obsPar1.GUID];
        bool success = guidCache.TryGetValue(par1, out result);
        return success;
    }

    private void invalidateCache(object sender)
    {
        TObsPar1 obsPar = (TObsPar1)sender;
        _cache.Remove(obsPar.GUID);
        obsPar._changed -= this.invalidateCache;
    }
}
I haven't tested this yet, as I still have the problem that cache entries never get removed after the corresponding parTree is no longer used. I'd love a synchronous solution without repeated "scans" for very old cache entries.
For a cache class I need to get notified if an object is garbage
collected (to remove the corresponding entries from my cache). What is the
best way to do so? Sending an event from the destructor?
If your cache holds normal (strong) references the items will never be collected.
If your cache holds WeakReferences you do not have to remove anything.
You could define an interface 'ICacheable' that must be implemented by the objects in the cache. In a method of the interface RemoveFromCache() you could search the cache for its child objects and remove them.
When you remove an item from the cache, test it for the interface and call RemoveFromCache().
This is similar to IDisposable.
Garbage collection is not something to count on because you never know when it will run.
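A minimal sketch of the WeakReference idea (assuming the cached values are reference types and it's acceptable to recompute a value after the GC has collected it):
using System;
using System.Collections.Concurrent;

public class WeakCache<TKey, TValue> where TValue : class
{
    private readonly ConcurrentDictionary<TKey, WeakReference<TValue>> _entries =
        new ConcurrentDictionary<TKey, WeakReference<TValue>>();

    public void Set(TKey key, TValue value) =>
        _entries[key] = new WeakReference<TValue>(value);

    public bool TryGet(TKey key, out TValue value)
    {
        value = null;
        if (_entries.TryGetValue(key, out var weak) && weak.TryGetTarget(out value))
            return true;

        // The target was collected (or never cached); drop the stale entry lazily.
        _entries.TryRemove(key, out _);
        return false;
    }
}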
Henk already mentioned the flaw in your requirement. But, just to answer your question: to know when an object is being garbage collected, you can write a destructor (finalizer) for that object.
~YourClass() { /* notification / cleanup code */ }
As per MSDN:
This method is automatically called after an object becomes
inaccessible
Though it's never recommended to rely on GC or destructor.
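For completeness, a finalizer that raises a notification could look like the sketch below (the static Finalized event is purely hypothetical); but, as noted above, when - or whether - it runs is entirely up to the GC, so nothing should depend on its timing:
public class HugeParTree
{
    public Guid GUID { get; } = Guid.NewGuid();

    // Hypothetical hook a cache could subscribe to.
    public static event Action<Guid> Finalized;

    ~HugeParTree()
    {
        // Runs on the finalizer thread at some unspecified point after the
        // object becomes unreachable; keep it short and exception-free.
        Finalized?.Invoke(GUID);
    }
}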
