Is there any LRU implementation of IDictionary? - c#

I would like to implement a simple in-memory LRU cache system and I was thinking about a solution based on an IDictionary implementation which could handle a hashed LRU mechanism.
Coming from Java, I have experience with LinkedHashMap, which works fine for what I need, but I can't find a similar solution anywhere for .NET.
Has anyone developed one, or does anyone have experience with something like this?

This is a very simple and fast implementation we developed for a web site we own.
We tried to improve the code as much as possible, while keeping it thread safe.
I think the code is very simple and clear, but if you need some explanation or a guide related to how to use it, don't hesitate to ask.
using System.Collections.Generic;
using System.Runtime.CompilerServices;
namespace LRUCache
{
    public class LRUCache<K,V>
    {
        private int capacity;
        // Maps each key to its node in lruList so lookups and moves are O(1).
        private Dictionary<K, LinkedListNode<LRUCacheItem<K, V>>> cacheMap = new Dictionary<K, LinkedListNode<LRUCacheItem<K, V>>>();
        // Ordered from least recently used (First) to most recently used (Last).
        private LinkedList<LRUCacheItem<K, V>> lruList = new LinkedList<LRUCacheItem<K, V>>();
        public LRUCache(int capacity)
        {
            this.capacity = capacity;
        }
        [MethodImpl(MethodImplOptions.Synchronized)]
        public V get(K key)
        {
            LinkedListNode<LRUCacheItem<K, V>> node;
            if (cacheMap.TryGetValue(key, out node))
            {
                V value = node.Value.value;
                // Move the node to the end to mark it as most recently used.
                lruList.Remove(node);
                lruList.AddLast(node);
                return value;
            }
            return default(V);
        }
        [MethodImpl(MethodImplOptions.Synchronized)]
        public void add(K key, V val)
        {
            if (cacheMap.TryGetValue(key, out var existingNode))
            {
                lruList.Remove(existingNode);
            }
            else if (cacheMap.Count >= capacity)
            {
                RemoveFirst();
            }
            LRUCacheItem<K, V> cacheItem = new LRUCacheItem<K, V>(key, val);
            LinkedListNode<LRUCacheItem<K, V>> node = new LinkedListNode<LRUCacheItem<K, V>>(cacheItem);
            lruList.AddLast(node);
            // Use the indexer rather than cacheMap.Add(key, node): Add throws if the key already exists.
            cacheMap[key] = node;
        }
        private void RemoveFirst()
        {
            // Remove the least recently used node from the list
            LinkedListNode<LRUCacheItem<K,V>> node = lruList.First;
            lruList.RemoveFirst();
            // Remove from cache
            cacheMap.Remove(node.Value.key);
        }
    }
    class LRUCacheItem<K,V>
    {
        public LRUCacheItem(K k, V v)
        {
            key = k;
            value = v;
        }
        public K key;
        public V value;
    }
}

There is nothing in the base class libraries that does this.
On the free side, maybe something like C5's HashedLinkedList would work.
If you're willing to pay, maybe check out this C# toolkit. It contains an implementation.

I've recently released a class called LurchTable to address the need for a C# variant of the LinkedHashMap. A brief discussion of the LurchTable can be found here.
Basic features:
Linked Concurrent Dictionary by Insertion, Modification, or Access
Dictionary/ConcurrentDictionary interface support
Peek/TryDequeue/Dequeue access to 'oldest' entry
Allows hard-limit on items enforced at insertion
Exposes events for add, update, and remove
Source Code: http://csharptest.net/browse/src/Library/Collections/LurchTable.cs
GitHub: https://github.com/csharptest/CSharpTest.Net.Collections
HTML Help: http://help.csharptest.net/
PM> Install-Package CSharpTest.Net.Collections

The LRUCache answer with sample code above uses MethodImplOptions.Synchronized, which is equivalent to putting lock(this) around each method call. Whilst correct, this global lock will significantly reduce throughput under concurrent load.
To solve this I implemented a thread safe pseudo LRU designed for concurrent workloads. Performance is very close to ConcurrentDictionary, ~10x faster than MemoryCache and hit rate is better than a conventional LRU. Full analysis provided in the github link below.
Usage looks like this:
int capacity = 666;
var lru = new ConcurrentLru<int, SomeItem>(capacity);
var value = lru.GetOrAdd(1, (k) => new SomeItem(k));
GitHub: https://github.com/bitfaster/BitFaster.Caching
Install-Package BitFaster.Caching

Found your answer while googling, and also found this:
http://code.google.com/p/csharp-lru-cache/
csharp-lru-cache: LRU cache collection class library
This is a collection class that functions as a least-recently-used cache. It implements ICollection<T>, but also exposes three other members:
Capacity, the maximum number of items the cache can contain. Once the collection is at capacity, adding a new item to the cache will cause the least recently used item to be discarded. If the Capacity is set to 0 at construction, the cache will not automatically discard items.
Oldest, the oldest (i.e. least recently used) item in the collection.
DiscardingOldestItem, an event raised when the cache is about to discard its oldest item.
This is an extremely simple implementation. While its Add and Remove methods are thread-safe, it shouldn't be used in heavy multithreading environments because the entire collection is locked during those methods.

This takes Martin's code with Mr T's suggestions and makes it Stylecop friendly. Oh, it also allows for disposal of values as they cycle out of the cache.
namespace LruCache
{
using System;
using System.Collections.Generic;
/// <summary>
/// A least-recently-used cache stored like a dictionary.
/// </summary>
/// <typeparam name="TKey">
/// The type of the key to the cached item
/// </typeparam>
/// <typeparam name="TValue">
/// The type of the cached item.
/// </typeparam>
/// <remarks>
/// Derived from https://stackoverflow.com/a/3719378/240845
/// </remarks>
public class LruCache<TKey, TValue>
{
private readonly Dictionary<TKey, LinkedListNode<LruCacheItem>> cacheMap =
new Dictionary<TKey, LinkedListNode<LruCacheItem>>();
private readonly LinkedList<LruCacheItem> lruList =
new LinkedList<LruCacheItem>();
private readonly Action<TValue> dispose;
/// <summary>
/// Initializes a new instance of the <see cref="LruCache{TKey, TValue}"/>
/// class.
/// </summary>
/// <param name="capacity">
/// Maximum number of elements to cache.
/// </param>
/// <param name="dispose">
/// When elements cycle out of the cache, disposes them. May be null.
/// </param>
public LruCache(int capacity, Action<TValue> dispose = null)
{
this.Capacity = capacity;
this.dispose = dispose;
}
/// <summary>
/// Gets the capacity of the cache.
/// </summary>
public int Capacity { get; }
/// <summary>Gets the value associated with the specified key.</summary>
/// <param name="key">
/// The key of the value to get.
/// </param>
/// <param name="value">
/// When this method returns, contains the value associated with the specified
/// key, if the key is found; otherwise, the default value for the type of the
/// <paramref name="value" /> parameter. This parameter is passed
/// uninitialized.
/// </param>
/// <returns>
/// true if the <see cref="T:System.Collections.Generic.Dictionary`2" />
/// contains an element with the specified key; otherwise, false.
/// </returns>
public bool TryGetValue(TKey key, out TValue value)
{
lock (this.cacheMap)
{
LinkedListNode<LruCacheItem> node;
if (this.cacheMap.TryGetValue(key, out node))
{
value = node.Value.Value;
this.lruList.Remove(node);
this.lruList.AddLast(node);
return true;
}
value = default(TValue);
return false;
}
}
/// <summary>
/// Looks for a value for the matching <paramref name="key"/>. If not found,
/// calls <paramref name="valueGenerator"/> to retrieve the value and add it to
/// the cache.
/// </summary>
/// <param name="key">
/// The key of the value to look up.
/// </param>
/// <param name="valueGenerator">
/// Generates a value if one isn't found.
/// </param>
/// <returns>
/// The requested value.
/// </returns>
public TValue Get(TKey key, Func<TValue> valueGenerator)
{
lock (this.cacheMap)
{
LinkedListNode<LruCacheItem> node;
TValue value;
if (this.cacheMap.TryGetValue(key, out node))
{
value = node.Value.Value;
this.lruList.Remove(node);
this.lruList.AddLast(node);
}
else
{
value = valueGenerator();
if (this.cacheMap.Count >= this.Capacity)
{
this.RemoveFirst();
}
LruCacheItem cacheItem = new LruCacheItem(key, value);
node = new LinkedListNode<LruCacheItem>(cacheItem);
this.lruList.AddLast(node);
this.cacheMap.Add(key, node);
}
return value;
}
}
/// <summary>
/// Adds the specified key and value to the dictionary.
/// </summary>
/// <param name="key">
/// The key of the element to add.
/// </param>
/// <param name="value">
/// The value of the element to add. The value can be null for reference types.
/// </param>
public void Add(TKey key, TValue value)
{
lock (this.cacheMap)
{
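if (this.cacheMap.ContainsKey(key))
{
// Fail before evicting anything: cacheMap.Add below would throw for a
// duplicate key anyway, but only after an unrelated item had been evicted.
throw new ArgumentException("An item with the same key has already been added.", nameof(key));
}
if (this.cacheMap.Count >= this.Capacity)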
{
this.RemoveFirst();
}
LruCacheItem cacheItem = new LruCacheItem(key, value);
LinkedListNode<LruCacheItem> node =
new LinkedListNode<LruCacheItem>(cacheItem);
this.lruList.AddLast(node);
this.cacheMap.Add(key, node);
}
}
private void RemoveFirst()
{
// Remove from LRUPriority
LinkedListNode<LruCacheItem> node = this.lruList.First;
this.lruList.RemoveFirst();
// Remove from cache
this.cacheMap.Remove(node.Value.Key);
// dispose
this.dispose?.Invoke(node.Value.Value);
}
private class LruCacheItem
{
public LruCacheItem(TKey k, TValue v)
{
this.Key = k;
this.Value = v;
}
public TKey Key { get; }
public TValue Value { get; }
}
}
}

The Caching Application Block of EntLib has an LRU scavenging option out of the box and can be in memory. It might be a bit heavyweight for what you want, though.

I don't believe so. I've certainly seen hand-rolled ones implemented several times in various unrelated projects, which more or less confirms this: if there was one, surely at least one of the projects would have used it.
It's pretty simple to implement, and usually gets done by creating a class which contains both a Dictionary and a List.
The keys go in the list (in order) and the items go in the dictionary.
When you Add a new item to the collection, the function checks the length of the list, pulls out the last key if the list is too long, and then evicts that key and its value from the dictionary to match. Not much more to it, really.
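The Dictionary-plus-list idea described in this answer might be sketched as follows. This is only a minimal, non-thread-safe illustration: the name SimpleLruCache and its TryGet/Set members are hypothetical, it uses a LinkedList of keys rather than a plain List so that moving a key is O(1), and it treats the front of the list as the oldest entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: keys live in a linked list (oldest first),
// values live in a dictionary next to their list node.
public class SimpleLruCache<TKey, TValue>
{
    private readonly int capacity;
    private readonly LinkedList<TKey> order = new LinkedList<TKey>();
    private readonly Dictionary<TKey, KeyValuePair<LinkedListNode<TKey>, TValue>> items =
        new Dictionary<TKey, KeyValuePair<LinkedListNode<TKey>, TValue>>();

    public SimpleLruCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
    }

    public int Count { get { return items.Count; } }

    public bool TryGet(TKey key, out TValue value)
    {
        KeyValuePair<LinkedListNode<TKey>, TValue> entry;
        if (!items.TryGetValue(key, out entry))
        {
            value = default(TValue);
            return false;
        }
        // Move the key to the end of the list: it is now the most recently used.
        order.Remove(entry.Key);
        order.AddLast(entry.Key);
        value = entry.Value;
        return true;
    }

    public void Set(TKey key, TValue value)
    {
        KeyValuePair<LinkedListNode<TKey>, TValue> entry;
        if (items.TryGetValue(key, out entry))
        {
            // Existing key: drop its old position, it is re-added at the end below.
            order.Remove(entry.Key);
        }
        else if (items.Count >= capacity)
        {
            // Full: evict the least recently used key and its value.
            TKey oldest = order.First.Value;
            order.RemoveFirst();
            items.Remove(oldest);
        }
        LinkedListNode<TKey> node = order.AddLast(key);
        items[key] = new KeyValuePair<LinkedListNode<TKey>, TValue>(node, value);
    }
}
```

With a capacity of 2, setting "a" and "b", reading "a", and then setting "c" evicts "b", because "a" was touched more recently.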

I like Lawrence's implementation. Hashtable + LinkedList is a good solution.
Regarding threading, I would not lock this with [MethodImpl(MethodImplOptions.Synchronized)], but rather use ReaderWriterLockSlim or a spin lock (since contention is usually short) instead.
In the Get function I would first check whether the item is already the 1st (most recently used) item, rather than always removing and re-adding it. This lets you keep that case within a reader lock that does not block other readers.
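A rough sketch of that suggestion, assuming the Dictionary + LinkedList layout from the earlier answer with the most recently used node kept at the end of the list. The class name RwLockLruCache and its members are hypothetical. A lookup that finds the node already in the most-recent position returns under the shared read lock; otherwise it retries under the write lock, re-checking the key because the entry may have been evicted in between.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch of an LRU cache guarded by ReaderWriterLockSlim.
public class RwLockLruCache<TKey, TValue>
{
    private readonly int capacity;
    private readonly ReaderWriterLockSlim gate = new ReaderWriterLockSlim();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> lruList =
        new LinkedList<KeyValuePair<TKey, TValue>>();
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();

    public RwLockLruCache(int capacity)
    {
        if (capacity < 1) throw new System.ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
    }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;

        // Fast path: readers do not block each other.
        gate.EnterReadLock();
        try
        {
            if (!map.TryGetValue(key, out node))
            {
                value = default(TValue);
                return false;
            }
            if (node == lruList.Last)
            {
                // Already the most recently used item: nothing to reorder.
                value = node.Value.Value;
                return true;
            }
        }
        finally
        {
            gate.ExitReadLock();
        }

        // Slow path: reordering mutates the list, so it needs the write lock.
        gate.EnterWriteLock();
        try
        {
            // Re-check: the entry may have been evicted between the two locks.
            if (!map.TryGetValue(key, out node))
            {
                value = default(TValue);
                return false;
            }
            lruList.Remove(node);
            lruList.AddLast(node);
            value = node.Value.Value;
            return true;
        }
        finally
        {
            gate.ExitWriteLock();
        }
    }

    public void Set(TKey key, TValue value)
    {
        gate.EnterWriteLock();
        try
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> existing;
            if (map.TryGetValue(key, out existing))
            {
                lruList.Remove(existing);
            }
            else if (map.Count >= capacity)
            {
                // Evict the least recently used entry (front of the list).
                map.Remove(lruList.First.Value.Key);
                lruList.RemoveFirst();
            }
            map[key] = lruList.AddLast(new KeyValuePair<TKey, TValue>(key, value));
        }
        finally
        {
            gate.ExitWriteLock();
        }
    }
}
```

Whether this beats a plain lock depends on the workload: the read-only path only helps when the same hot key is read repeatedly, so it is worth measuring before adopting.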

I just accidentally found LruCache.cs in aws-sdk-net: https://github.com/aws/aws-sdk-net/blob/master/sdk/src/Core/Amazon.Runtime/Internal/Util/LruCache.cs

If it's an asp.net app you can use the cache class[1] but you'll be competing for space with other cached stuff, which may be what you want or may not be.
[1] http://msdn.microsoft.com/en-us/library/system.web.caching.cache.aspx

Related

Expose a Collection But Exclude Parts

In a parent class, I have a collection. In a child class, I want to expose a part of the parent class collection. I want changes from either location to affect the other.
My real life situation is I am creating a part of an application that will record a database design. I have a ConstraintList collection inside of a Database class. The ConstraintList contains a Constraint class for each constraint in the database. I also have a TablesList collection in the Database class, that contains Table classes. In the Table class I have a ForeignKeyConstraintList where I want to expose the constraints from the parent (Database class) ConstraintList that are foreign key constraints for this Table class.
+-Database Class
  |
  +--ConstraintList <-----------
  |                            |
  +--TableList             Same List
     |                         |
     +-Table Class             |
       |                       |
       +-ForeignKeyConstraintList
I have tried using an existing List class from the primary collection and using Linq to filter it to another List collection. However this doesn't work because this makes two List classes. If an entry is removed from the one List it still exists in the other List.
I thought about having the ForeignKeyConstraintList property of the Table class pull directly from the ConstraintList property of the Database class each time it is called but the act of filtering it causes it to create a new List class and thus any entries removed from ForeignKeyConstraintList would not be removed from the ConstraintList.
Another option I have come up with so far is creating a new class that implements the same interfaces as List but doesn't subclass from it, then using a private field to store a reference to the primary List class, and then writing custom Add and Remove methods that sync any changes back to the ConstraintList. I would also need to create a custom implementation of IEnumerator and IEnumerable to skip items that don't meet the filter criteria.
In a parent class, I have a collection. In a child class, I want to expose a part of the parent class collection. I want changes from either location to affect the other.
I decided to write a custom List type class to accomplish this. I will post the code below. I haven't tested yet but I figured this would be a good start for anyone else who wants to do the same thing.
Hmm, it seems the class is too large to fit in here. I will just post the key parts and skip the public methods, which just implement the various interfaces.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
namespace CodeWriter.Collections.Generic
{
/// <summary>
/// This represents a strongly typed list of objects that can be accessed by index. Provides methods to search, sort and manipulate the list.
/// This class serves as a wrapper for a <see cref="List{T}"/>. The internal class can be reached by the <see cref="SourceList"/> property.
/// The elements that this class exposes from the <see cref="SourceList"/> can be controlled by changing the <see cref="Filter"/> property.
/// </summary>
/// <typeparam name="T">The type of elements in the list.</typeparam>
/// <remarks>
/// This class was created to support situations where the functionality of two or more <see cref="List{T}"/> collections are needed where one is the Master Collection
/// and the others are Partial Collections. The Master Collection is a <see cref="List{T}"/> and exposes all elements in the collection. The Partial Collections
/// are <see cref="FilteredList{T}"/> classes (this class) and only expose the elements chosen by the <see cref="Filter"/> property of this class. When elements are modified,
/// in either type of collection, the changes show up in the other collections because in the backend they are the same list. When elements are added to or deleted from the Partial Collections,
/// they are also added to or deleted from the Master Collection. When elements are deleted from the Master Collection, they will not be available in the Partial Collection but it
/// may not be apparent because the <see cref="Filter"/> property may not be exposing them.
/// </remarks>
public class FilteredList<T> : IList<T>, ICollection<T>, IEnumerable<T>, IEnumerable, IList, ICollection, IReadOnlyList<T>, IReadOnlyCollection<T>
{
#region Public Constructor
public FilteredList(List<T> SourceList)
{
if (SourceList == null)
{
throw new ArgumentNullException("SourceList");
}
_SourceList = SourceList;
}
public FilteredList()
{
_SourceList = new List<T>();
}
public FilteredList(IEnumerable<T> Collection)
{
if (Collection == null)
{
throw new ArgumentNullException("Collection");
}
_SourceList = new List<T>(Collection);
}
#endregion
#region Protected Members
protected List<T> _SourceList;
protected Func<T, bool> _Filter;
#endregion
#region Public Properties
#region Source List Properties
/// <summary>
/// Gets or sets the base class that this class is a wrapper around.
/// </summary>
public List<T> SourceList
{
get
{
return _SourceList;
}
set
{
_SourceList = value;
}
}
/// <summary>
/// Gets or sets the value used to filter the <see cref="SourceList"/>.
/// </summary>
public Func<T, bool> Filter
{
get
{
return _Filter;
}
set
{
_Filter = value;
}
}
#endregion
#region Normal List<T> Implementation
/// <summary>
/// Provides access to the collection in the same manner as an <see cref="Array"/>.
/// </summary>
/// <param name="Index">The Index of the element you want to retrieve. Valid values are from zero to one less than the value in the <see cref="Count"/> property.</param>
/// <returns>The element at the position provided with the indexer.</returns>
public T this[int Index]
{
get
{
List<T> Selected = _SourceList.Where(_Filter).ToList();
return Selected[Index];
}
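set
{
// Write through to _SourceList; assigning into the filtered copy would be lost.
int Matched = -1;
for (int i = 0; i < _SourceList.Count; i++)
{
if (_Filter(_SourceList[i]) && ++Matched == Index)
{
_SourceList[i] = value;
return;
}
}
throw new ArgumentOutOfRangeException("Index");
}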
}
/// <summary>
/// Provides access to the collection in the same manner as an <see cref="Array"/>.
/// </summary>
/// <param name="Index">The Index of the element you want to retrieve. Valid values are from zero to one less than the value in the <see cref="Count"/> property.</param>
/// <returns>The element at the position provided with the indexer.</returns>
/// <remarks>This is required for IList implementation.</remarks>
object IList.this[int Index]
{
get
{
return this[Index];
}
set
{
if ((value is T) == false)
{
throw new ArgumentException("Value passed is not a valid type.");
}
this[Index] = (T)value;
}
}
/// <summary>
/// Gets or sets the total number of elements the internal data structure can hold without resizing.
/// </summary>
public int Capacity
{
get
{
return _SourceList.Capacity;
}
set
{
// We cannot let them shrink capacity because this class is a wrapper for the List<T> in the _SourceList property.
// They don't get to see all the entries in that list because it is filtered. Therefore it is not safe for them to shrink capacity.
// We check if they are shrinking the capacity.
if (value >= _SourceList.Capacity)
{
_SourceList.Capacity = value;
}
}
}
/// <summary>
/// Gets the number of elements contained in the <see cref="FilteredList{T}"/>.
/// </summary>
public int Count
{
get
{
// Count matching elements directly instead of materializing a filtered copy.
return _SourceList.Count(_Filter);
}
}
/// <summary>
/// Gets a value indicating whether the <see cref="FilteredList{T}"/> has a fixed size.
/// </summary>
public bool IsFixedSize
{
get
{
return false;
}
}
/// <summary>
/// Gets a value indicating whether the <see cref="FilteredList{T}"/> is read-only.
/// </summary>
public bool IsReadOnly
{
get
{
return false;
}
}
/// <summary>
/// Gets a value indicating whether access to the <see cref="FilteredList{T}"/> is synchronized (thread safe).
/// </summary>
public bool IsSynchronized
{
get
{
return false;
}
}
/// <summary>
/// Gets an object that can be used to synchronize access to the <see cref="FilteredList{T}"/>.
/// </summary>
public object SyncRoot
{
get
{
return _SourceList;
}
}
#endregion
#endregion
}
}
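The public methods skipped above are not shown in the post, so the following is only a guess at their general shape, not the author's code. It is a much smaller, hypothetical FilteredView<T> that shows the three ideas from the question: enumeration skips items that fail the filter, and Add and Remove write straight through to the shared source list.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, reduced sketch of a filtered view over a shared List<T>.
public class FilteredView<T> : IEnumerable<T>
{
    private readonly List<T> source;
    private readonly Func<T, bool> filter;

    public FilteredView(List<T> source, Func<T, bool> filter)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (filter == null) throw new ArgumentNullException(nameof(filter));
        this.source = source;
        this.filter = filter;
    }

    // Number of source items that pass the filter.
    public int Count
    {
        get
        {
            int n = 0;
            foreach (T item in source)
            {
                if (filter(item)) n++;
            }
            return n;
        }
    }

    // Adds to the shared source list; rejects items the view could never show.
    public void Add(T item)
    {
        if (!filter(item))
        {
            throw new ArgumentException("Item does not match the filter.", nameof(item));
        }
        source.Add(item);
    }

    // Removes from the shared source list, but only items visible through the filter.
    public bool Remove(T item)
    {
        return filter(item) && source.Remove(item);
    }

    // Enumerates the source list, skipping items that fail the filter.
    public IEnumerator<T> GetEnumerator()
    {
        foreach (T item in source)
        {
            if (filter(item)) yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

For the database example, the Table class could expose something like new FilteredView<Constraint>(database.ConstraintList, c => c.IsForeignKeyFor(this)), where Constraint and IsForeignKeyFor stand in for whatever the real model provides.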

How can I improve performance of an AddRange method on a custom BindingList?

I have a custom BindingList that I want to create a custom AddRange method for.
public class MyBindingList<I> : BindingList<I>
{
...
public void AddRange(IEnumerable<I> vals)
{
foreach (I v in vals)
Add(v);
}
}
My problem with this is performance is terrible with large collections. The case I am debugging now is trying to add roughly 30,000 records, and taking an unacceptable amount of time.
After looking into this issue online, it seems like the problem is that the use of Add is resizing the array with each addition. This answer I think summarizes it as :
If you are using Add, it is resizing the inner array gradually as needed (doubling)
What can I do in my custom AddRange implementation to specify the size the BindingList needs to resize to be based on the item count, rather than letting it constantly re-allocate the array with each item added?
CSharpie explained in his answer that the bad performance is due to the ListChanged-event firing after each Add, and showed a way to implement AddRange for your custom BindingList.
An alternative would be to implement the AddRange functionality as an extension method for BindingList<T>. Based on CSharpie's implementation:
/// <summary>
/// Extension methods for <see cref="System.ComponentModel.BindingList{T}"/>.
/// </summary>
public static class BindingListExtensions
{
/// <summary>
/// Adds the elements of the specified collection to the end of the <see cref="System.ComponentModel.BindingList{T}"/>,
/// while only firing the <see cref="System.ComponentModel.BindingList{T}.ListChanged"/>-event once.
/// </summary>
/// <typeparam name="T">
/// The type T of the values of the <see cref="System.ComponentModel.BindingList{T}"/>.
/// </typeparam>
/// <param name="bindingList">
/// The <see cref="System.ComponentModel.BindingList{T}"/> to which the values shall be added.
/// </param>
/// <param name="collection">
/// The collection whose elements should be added to the end of the <see cref="System.ComponentModel.BindingList{T}"/>.
/// The collection itself cannot be null, but it can contain elements that are null,
/// if type T is a reference type.
/// </param>
/// <exception cref="ArgumentNullException">collection is null.</exception>
public static void AddRange<T>(this System.ComponentModel.BindingList<T> bindingList, IEnumerable<T> collection)
{
// The given collection may not be null.
if (collection == null)
throw new ArgumentNullException(nameof(collection));
// Remember the current setting for RaiseListChangedEvents
// (if it was already deactivated, we shouldn't activate it after adding!).
var oldRaiseEventsValue = bindingList.RaiseListChangedEvents;
// Try adding all of the elements to the binding list.
try
{
bindingList.RaiseListChangedEvents = false;
foreach (var value in collection)
bindingList.Add(value);
}
// Restore the old setting for RaiseListChangedEvents (even if there was an exception),
// and fire the ListChanged-event once (if RaiseListChangedEvents is activated).
finally
{
bindingList.RaiseListChangedEvents = oldRaiseEventsValue;
if (bindingList.RaiseListChangedEvents)
bindingList.ResetBindings();
}
}
}
This way, depending on your needs, you might not even need to write your own BindingList-subclass.
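To see why suspending events matters, a small sketch like the following counts how many ListChanged events fire while items are added. The class and method names (ListChangedDemo, CountEvents) are made up for this illustration; only BindingList<T>, RaiseListChangedEvents, ListChanged and ResetBindings come from the framework.

```csharp
using System.ComponentModel;

// Hypothetical helper that counts ListChanged events during a bulk add.
public static class ListChangedDemo
{
    // Returns how many ListChanged events fire while adding `count` items.
    public static int CountEvents(int count, bool suspendEvents)
    {
        var list = new BindingList<int>();
        int fired = 0;
        list.ListChanged += (sender, e) => fired++;

        bool previous = list.RaiseListChangedEvents;
        try
        {
            if (suspendEvents)
            {
                list.RaiseListChangedEvents = false;
            }
            for (int i = 0; i < count; i++)
            {
                list.Add(i);
            }
        }
        finally
        {
            // Restore the old setting and notify listeners once.
            list.RaiseListChangedEvents = previous;
            if (suspendEvents && previous)
            {
                list.ResetBindings();
            }
        }
        return fired;
    }
}
```

Adding 30,000 items without suspension raises 30,000 events, each of which any bound control has to process; with suspension, listeners see a single Reset notification.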
You can pass in a List in the constructor and make use of List<T>.Capacity.
But I bet the most significant speedup will come from suspending events when adding a range. So I included both things in my example code.
It probably needs some fine-tuning to handle some worst cases and whatnot.
public class MyBindingList<I> : BindingList<I>
{
private readonly List<I> _baseList;
public MyBindingList() : this(new List<I>())
{
}
public MyBindingList(List<I> baseList) : base(baseList)
{
if(baseList == null)
throw new ArgumentNullException();
_baseList = baseList;
}
public void AddRange(IEnumerable<I> vals)
{
ICollection<I> collection = vals as ICollection<I>;
if (collection != null)
{
int requiredCapacity = Count + collection.Count;
if (requiredCapacity > _baseList.Capacity)
_baseList.Capacity = requiredCapacity;
}
bool restore = RaiseListChangedEvents;
try
{
RaiseListChangedEvents = false;
foreach (I v in vals)
Add(v); // We can't call _baseList.Add, otherwise events won't get hooked.
}
finally
{
RaiseListChangedEvents = restore;
if (RaiseListChangedEvents)
ResetBindings();
}
}
}
You cannot use _baseList.AddRange, since BindingList<T> won't hook the PropertyChanged event then. You can bypass this only by using Reflection to call the private method HookPropertyChanged for each item after AddRange. This, however, only makes sense if vals (your method parameter) is a collection. Otherwise you risk enumerating the enumerable twice.
That's the closest you can get to "optimal" without writing your own BindingList.
That shouldn't be too difficult, as you could copy the source code from BindingList and alter the parts to your needs.

Looking for a data structure (list) that tracks "usage frequency" [duplicate]

I would like to implement a simple in-memory LRU cache system and I was thinking about a solution based on an IDictionary implementation which could handle an hashed LRU mechanism.
Coming from java, I have experiences with LinkedHashMap, which works fine for what I need: I can't find anywhere a similar solution for .NET.
Has anyone developed it or has anyone had experiences like this?
This a very simple and fast implementation we developed for a web site we own.
We tried to improve the code as much as possible, while keeping it thread safe.
I think the code is very simple and clear, but if you need some explanation or a guide related to how to use it, don't hesitate to ask.
namespace LRUCache
{
public class LRUCache<K,V>
{
private int capacity;
private Dictionary<K, LinkedListNode<LRUCacheItem<K, V>>> cacheMap = new Dictionary<K, LinkedListNode<LRUCacheItem<K, V>>>();
private LinkedList<LRUCacheItem<K, V>> lruList = new LinkedList<LRUCacheItem<K, V>>();
public LRUCache(int capacity)
{
this.capacity = capacity;
}
[MethodImpl(MethodImplOptions.Synchronized)]
public V get(K key)
{
LinkedListNode<LRUCacheItem<K, V>> node;
if (cacheMap.TryGetValue(key, out node))
{
V value = node.Value.value;
lruList.Remove(node);
lruList.AddLast(node);
return value;
}
return default(V);
}
[MethodImpl(MethodImplOptions.Synchronized)]
public void add(K key, V val)
{
if (cacheMap.TryGetValue(key, out var existingNode))
{
lruList.Remove(existingNode);
}
else if (cacheMap.Count >= capacity)
{
RemoveFirst();
}
LRUCacheItem<K, V> cacheItem = new LRUCacheItem<K, V>(key, val);
LinkedListNode<LRUCacheItem<K, V>> node = new LinkedListNode<LRUCacheItem<K, V>>(cacheItem);
lruList.AddLast(node);
// cacheMap.Add(key, node); - here's bug if try to add already existing value
cacheMap[key] = node;
}
private void RemoveFirst()
{
// Remove from LRUPriority
LinkedListNode<LRUCacheItem<K,V>> node = lruList.First;
lruList.RemoveFirst();
// Remove from cache
cacheMap.Remove(node.Value.key);
}
}
class LRUCacheItem<K,V>
{
public LRUCacheItem(K k, V v)
{
key = k;
value = v;
}
public K key;
public V value;
}
}
There is nothing in the base class libraries that does this.
On the free side, maybe something like C5's HashedLinkedList would work.
If you're willing to pay, maybe check out this C# toolkit. It contains an implementation.
I've recently released a class called LurchTable to address the need for a C# variant of the LinkedHashMap. A brief discussion of the LurchTable can be found here.
Basic features:
Linked Concurrent Dictionary by Insertion, Modification, or Access
Dictionary/ConcurrentDictionary interface support
Peek/TryDequeue/Dequeue access to 'oldest' entry
Allows hard-limit on items enforced at insertion
Exposes events for add, update, and remove
Source Code: http://csharptest.net/browse/src/Library/Collections/LurchTable.cs
GitHub: https://github.com/csharptest/CSharpTest.Net.Collections
HTML Help: http://help.csharptest.net/
PM> Install-Package CSharpTest.Net.Collections
The LRUCache answer with sample code above uses MethodImplOptions.Synchronized, which is equivalent to putting lock(this) around each method call. Whilst correct, this global lock will significantly reduce throughput under concurrent load.
To solve this I implemented a thread safe pseudo LRU designed for concurrent workloads. Performance is very close to ConcurrentDictionary, ~10x faster than MemoryCache and hit rate is better than a conventional LRU. Full analysis provided in the github link below.
Usage looks like this:
int capacity = 666;
var lru = new ConcurrentLru<int, SomeItem>(capacity);
var value = lru.GetOrAdd(1, (k) => new SomeItem(k));
GitHub: https://github.com/bitfaster/BitFaster.Caching
Install-Package BitFaster.Caching
Found you answer while googling, also found this:
http://code.google.com/p/csharp-lru-cache/
csharp-lru-cache: LRU cache collection class library
This is a collection class that
functions as a least-recently-used
cache. It implements ICollection<T>,
but also exposes three other members:
Capacity, the maximum number of items
the cache can contain. Once the
collection is at capacity, adding a
new item to the cache will cause the
least recently used item to be
discarded. If the Capacity is set to 0
at construction, the cache will not
automatically discard items.
Oldest,
the oldest (i.e. least recently used)
item in the collection.
DiscardingOldestItem, an event raised
when the cache is about to discard its
oldest item. This is an extremely
simple implementation. While its Add
and Remove methods are thread-safe, it
shouldn't be used in heavy
multithreading environments because
the entire collection is locked during
those methods.
This takes Martin's code with Mr T's suggestions and makes it Stylecop friendly. Oh, it also allows for disposal of values as they cycle out of the cache.
namespace LruCache
{
using System;
using System.Collections.Generic;
/// <summary>
/// A least-recently-used cache stored like a dictionary.
/// </summary>
/// <typeparam name="TKey">
/// The type of the key to the cached item
/// </typeparam>
/// <typeparam name="TValue">
/// The type of the cached item.
/// </typeparam>
/// <remarks>
/// Derived from https://stackoverflow.com/a/3719378/240845
/// </remarks>
public class LruCache<TKey, TValue>
{
private readonly Dictionary<TKey, LinkedListNode<LruCacheItem>> cacheMap =
new Dictionary<TKey, LinkedListNode<LruCacheItem>>();
private readonly LinkedList<LruCacheItem> lruList =
new LinkedList<LruCacheItem>();
private readonly Action<TValue> dispose;
/// <summary>
/// Initializes a new instance of the <see cref="LruCache{TKey, TValue}"/>
/// class.
/// </summary>
/// <param name="capacity">
/// Maximum number of elements to cache.
/// </param>
/// <param name="dispose">
/// When elements cycle out of the cache, disposes them. May be null.
/// </param>
public LruCache(int capacity, Action<TValue> dispose = null)
{
this.Capacity = capacity;
this.dispose = dispose;
}
/// <summary>
/// Gets the capacity of the cache.
/// </summary>
public int Capacity { get; }
/// <summary>Gets the value associated with the specified key.</summary>
/// <param name="key">
/// The key of the value to get.
/// </param>
/// <param name="value">
/// When this method returns, contains the value associated with the specified
/// key, if the key is found; otherwise, the default value for the type of the
/// <paramref name="value" /> parameter. This parameter is passed
/// uninitialized.
/// </param>
/// <returns>
/// true if the <see cref="T:System.Collections.Generic.Dictionary`2" />
/// contains an element with the specified key; otherwise, false.
/// </returns>
public bool TryGetValue(TKey key, out TValue value)
{
lock (this.cacheMap)
{
LinkedListNode<LruCacheItem> node;
if (this.cacheMap.TryGetValue(key, out node))
{
value = node.Value.Value;
this.lruList.Remove(node);
this.lruList.AddLast(node);
return true;
}
value = default(TValue);
return false;
}
}
/// <summary>
/// Looks for a value for the matching <paramref name="key"/>. If not found,
/// calls <paramref name="valueGenerator"/> to retrieve the value and add it to
/// the cache.
/// </summary>
/// <param name="key">
/// The key of the value to look up.
/// </param>
/// <param name="valueGenerator">
/// Generates a value if one isn't found.
/// </param>
/// <returns>
/// The requested value.
/// </returns>
public TValue Get(TKey key, Func<TValue> valueGenerator)
{
lock (this.cacheMap)
{
LinkedListNode<LruCacheItem> node;
TValue value;
if (this.cacheMap.TryGetValue(key, out node))
{
value = node.Value.Value;
this.lruList.Remove(node);
this.lruList.AddLast(node);
}
else
{
value = valueGenerator();
if (this.cacheMap.Count >= this.Capacity)
{
this.RemoveFirst();
}
LruCacheItem cacheItem = new LruCacheItem(key, value);
node = new LinkedListNode<LruCacheItem>(cacheItem);
this.lruList.AddLast(node);
this.cacheMap.Add(key, node);
}
return value;
}
}
/// <summary>
/// Adds the specified key and value to the dictionary.
/// </summary>
/// <param name="key">
/// The key of the element to add.
/// </param>
/// <param name="value">
/// The value of the element to add. The value can be null for reference types.
/// </param>
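public void Add(TKey key, TValue value)
{
lock (this.cacheMap)
{
// Reject duplicate keys before touching any state. Otherwise cacheMap.Add
// would throw only after an eviction and after the node was linked into
// lruList, leaving the list and the dictionary out of sync.
if (this.cacheMap.ContainsKey(key))
{
throw new ArgumentException("An element with the same key already exists.", nameof(key));
}
if (this.cacheMap.Count >= this.Capacity)
{
this.RemoveFirst();
}
LruCacheItem cacheItem = new LruCacheItem(key, value);
LinkedListNode<LruCacheItem> node =
new LinkedListNode<LruCacheItem>(cacheItem);
this.lruList.AddLast(node);
this.cacheMap.Add(key, node);
}
}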
private void RemoveFirst()
{
// Remove from LRUPriority
LinkedListNode<LruCacheItem> node = this.lruList.First;
this.lruList.RemoveFirst();
// Remove from cache
this.cacheMap.Remove(node.Value.Key);
// dispose
this.dispose?.Invoke(node.Value.Value);
}
private class LruCacheItem
{
public LruCacheItem(TKey k, TValue v)
{
this.Key = k;
this.Value = v;
}
public TKey Key { get; }
public TValue Value { get; }
}
}
}
The Caching Application Block of EntLib has an LRU scavenging option out of the box and can run in memory. It might be a bit heavyweight for what you want, though.
I don't believe so. I've certainly seen hand-rolled ones implemented several times in various unrelated projects, which more or less confirms this: if there was one, surely at least one of those projects would have used it.
It's pretty simple to implement, and usually gets done by creating a class which contains both a Dictionary and a List.
The keys go in the list (in-order) and the items go in the dictionary.
When you Add a new item to the collection, the function checks the length of the list, pulls out the last key if the list is too long, and then evicts that key and its value from the dictionary to match. Not much more to it, really.
I like Lawrence's implementation. Hashtable + LinkedList is a good solution.
Regarding threading, I would not lock this with [MethodImpl(MethodImplOptions.Synchronized)], but rather use ReaderWriterLockSlim or a spin lock (since contention is usually short-lived).
In the Get function I would first check whether the item is already the 1st item, rather than always removing and re-adding it. That makes it possible to stay within a reader lock that does not block other readers.
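A minimal sketch of that idea, assuming the most recently used entry sits at the tail of the linked list (as in the implementations above, so "1st item" becomes `order.Last` here). The class name `RwLockLruCache` and its members are made up for illustration. The lookup stays under a read lock when the entry is already the most recent one, and only takes the write lock when the node has to be relinked.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical cache, written only to illustrate the locking pattern.
public class RwLockLruCache<TKey, TValue>
{
    private readonly int capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> order =
        new LinkedList<KeyValuePair<TKey, TValue>>();
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

    public RwLockLruCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        // Fast path: a read lock is enough when nothing has to be relinked.
        rwLock.EnterReadLock();
        try
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> node;
            if (!map.TryGetValue(key, out node))
            {
                value = default(TValue);
                return false;
            }
            if (node == order.Last)
            {
                value = node.Value.Value;
                return true;
            }
        }
        finally
        {
            rwLock.ExitReadLock();
        }

        // Slow path: look the key up again under the write lock, because the
        // entry may have been evicted between the two lock acquisitions.
        rwLock.EnterWriteLock();
        try
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> node;
            if (!map.TryGetValue(key, out node))
            {
                value = default(TValue);
                return false;
            }
            value = node.Value.Value;
            order.Remove(node);
            order.AddLast(node);
            return true;
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }

    public void Set(TKey key, TValue value)
    {
        rwLock.EnterWriteLock();
        try
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> existing;
            if (map.TryGetValue(key, out existing))
            {
                order.Remove(existing);
            }
            else if (map.Count >= capacity)
            {
                // Evict the least recently used entry (head of the list).
                map.Remove(order.First.Value.Key);
                order.RemoveFirst();
            }
            map[key] = order.AddLast(new KeyValuePair<TKey, TValue>(key, value));
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }
}
```

Whether this beats a plain lock depends on the hit pattern: the fast path only helps when the same key is read repeatedly, so it is worth measuring before adopting it.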
I just happened to find LruCache.cs in aws-sdk-net: https://github.com/aws/aws-sdk-net/blob/master/sdk/src/Core/Amazon.Runtime/Internal/Util/LruCache.cs
If it's an ASP.NET app you can use the Cache class[1], but you'll be competing for space with other cached stuff, which may or may not be what you want.
[1] http://msdn.microsoft.com/en-us/library/system.web.caching.cache.aspx

Creating Thread Safe List using Reader Writer Lock

I have completely rewritten the earlier version. Can the following implementation be considered a thread-safe List implementation? I just need to know whether it is truly thread safe; I know there would still be performance issues. The current version uses ReaderWriterLockSlim; I have another implementation that does the same job using lock.
using System.Collections.Generic;
using System.Threading;
/// <summary>
/// Thread safe version of the List using ReaderWriterLockSlim
/// </summary>
/// <typeparam name="T"></typeparam>
public class ThreadSafeListWithRWLock<T> : IList<T>
{
// Internal private list which would be accessed in a thread safe manner
private List<T> internalList;
// ReaderWriterLockSlim object to take care of thread safe access between multiple readers and writers
private readonly ReaderWriterLockSlim rwLockList;
/// <summary>
/// Public constructor with variable initialization code
/// </summary>
public ThreadSafeListWithRWLock()
{
internalList = new List<T>();
rwLockList = new ReaderWriterLockSlim();
}
/// <summary>
/// Get the Enumerator to the Thread safe list
/// </summary>
/// <returns></returns>
public IEnumerator<T> GetEnumerator()
{
return Clone().GetEnumerator();
}
/// <summary>
/// System.Collections.IEnumerable.GetEnumerator implementation to get the IEnumerator type
/// </summary>
/// <returns></returns>
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return Clone().GetEnumerator();
}
/// <summary>
/// Clone method to create an in memory copy of the Thread safe list
/// </summary>
/// <returns></returns>
public List<T> Clone()
{
List<T> clonedList = new List<T>();
rwLockList.EnterReadLock();
internalList.ForEach(element => { clonedList.Add(element); });
rwLockList.ExitReadLock();
return (clonedList);
}
/// <summary>
/// Add an item to Thread safe list
/// </summary>
/// <param name="item"></param>
public void Add(T item)
{
rwLockList.EnterWriteLock();
internalList.Add(item);
rwLockList.ExitWriteLock();
}
/// <summary>
/// Remove an item from Thread safe list
/// </summary>
/// <param name="item"></param>
/// <returns></returns>
public bool Remove(T item)
{
bool isRemoved;
rwLockList.EnterWriteLock();
isRemoved = internalList.Remove(item);
rwLockList.ExitWriteLock();
return (isRemoved);
}
/// <summary>
/// Clear all elements of Thread safe list
/// </summary>
public void Clear()
{
rwLockList.EnterWriteLock();
internalList.Clear();
rwLockList.ExitWriteLock();
}
/// <summary>
/// Contains an item in the Thread safe list
/// </summary>
/// <param name="item"></param>
/// <returns></returns>
public bool Contains(T item)
{
bool containsItem;
rwLockList.EnterReadLock();
containsItem = internalList.Contains(item);
rwLockList.ExitReadLock();
return (containsItem);
}
/// <summary>
/// Copy elements of the Thread safe list to a compatible array from specified index in the array
/// </summary>
/// <param name="array"></param>
/// <param name="arrayIndex"></param>
public void CopyTo(T[] array, int arrayIndex)
{
rwLockList.EnterReadLock();
internalList.CopyTo(array,arrayIndex);
rwLockList.ExitReadLock();
}
/// <summary>
/// Count elements in a Thread safe list
/// </summary>
public int Count
{
get
{
int count;
rwLockList.EnterReadLock();
count = internalList.Count;
rwLockList.ExitReadLock();
return (count);
}
}
/// <summary>
/// Check whether Thread safe list is read only
/// </summary>
public bool IsReadOnly
{
get { return false; }
}
/// <summary>
/// Index of an item in the Thread safe list
/// </summary>
/// <param name="item"></param>
/// <returns></returns>
public int IndexOf(T item)
{
int itemIndex;
rwLockList.EnterReadLock();
itemIndex = internalList.IndexOf(item);
rwLockList.ExitReadLock();
return (itemIndex);
}
/// <summary>
/// Insert an item at a specified index in a Thread safe list
/// </summary>
/// <param name="index"></param>
/// <param name="item"></param>
public void Insert(int index, T item)
{
rwLockList.EnterWriteLock();
if (index <= internalList.Count && index >= 0) // index == Count appends, as List<T>.Insert allows
internalList.Insert(index,item);
rwLockList.ExitWriteLock();
}
/// <summary>
/// Remove an item at a specified index in Thread safe list
/// </summary>
/// <param name="index"></param>
public void RemoveAt(int index)
{
rwLockList.EnterWriteLock();
if (index <= internalList.Count - 1 && index >= 0)
internalList.RemoveAt(index);
rwLockList.ExitWriteLock();
}
/// <summary>
/// Indexer for the Thread safe list
/// </summary>
/// <param name="index"></param>
/// <returns></returns>
public T this[int index]
{
get
{
T returnItem = default(T);
rwLockList.EnterReadLock();
if (index <= internalList.Count - 1 && index >= 0)
returnItem = internalList[index];
rwLockList.ExitReadLock();
return (returnItem);
}
set
{
rwLockList.EnterWriteLock();
if (index <= internalList.Count - 1 && index >= 0)
internalList[index] = value;
rwLockList.ExitWriteLock();
}
}
}
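One robustness point about the class above, separate from the thread-safety question: every method releases its lock only on the normal path. If the guarded call throws (for example `CopyTo` with an array that is too small), the lock is never released and later callers can block. A minimal sketch of the usual try/finally pattern follows; `GuardedList` is a hypothetical, cut-down class used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical cut-down list showing lock release in a finally block.
public class GuardedList<T>
{
    private readonly List<T> items = new List<T>();
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

    public void Add(T item)
    {
        rwLock.EnterWriteLock();
        try
        {
            items.Add(item);
        }
        finally
        {
            // Runs whether or not the guarded call throws.
            rwLock.ExitWriteLock();
        }
    }

    public void CopyTo(T[] array, int arrayIndex)
    {
        rwLock.EnterReadLock();
        try
        {
            // Throws ArgumentException when the destination is too small.
            items.CopyTo(array, arrayIndex);
        }
        finally
        {
            rwLock.ExitReadLock();
        }
    }

    public int Count
    {
        get
        {
            rwLock.EnterReadLock();
            try
            {
                return items.Count;
            }
            finally
            {
                rwLock.ExitReadLock();
            }
        }
    }
}
```

With this shape, a failed `CopyTo` still leaves the list usable for subsequent `Add` calls.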
Implementing a custom List<T> that encapsulates thread-safety is rarely worth the effort. You're likely best off just using lock whenever you access the List<T>.
That said, I work in a performance-intensive industry and have seen cases where this becomes a bottleneck. The main drawback of lock is the possibility of context switching, which is, relatively speaking, extremely expensive in both wall-clock time and CPU cycles.
The best way around this is using immutability. Have all readers access an immutable list and writers "update" it using Interlocked operations to replace it with a new instance. This is a lock-free design that makes reads free of synchronization and writes lock-free (eliminates context switching).
I will stress that in almost all cases this is overkill and I wouldn't even consider going down this path unless you're positive you need to and you understand the drawbacks. A couple of the obvious ones are readers getting point-in-time snapshots and wasting memory creating copies.
ImmutableList from Microsoft.Bcl.Immutable (since published as the System.Collections.Immutable package) is also worth a look. It's entirely thread-safe.
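A minimal sketch of the immutable approach, assuming the System.Collections.Immutable package is referenced. `SnapshotList` is a hypothetical wrapper name; `ImmutableInterlocked.Update` is the library helper that retries the compare-and-swap until the replacement succeeds.

```csharp
using System.Collections.Immutable;
using System.Threading;

// Hypothetical wrapper: readers take snapshots, writers swap in new instances.
public sealed class SnapshotList<T>
{
    private ImmutableList<T> items = ImmutableList<T>.Empty;

    // Readers get a point-in-time snapshot and never take a lock.
    public ImmutableList<T> Snapshot
    {
        get { return Volatile.Read(ref items); }
    }

    public void Add(T item)
    {
        // The lambda may run more than once if another writer wins the race,
        // so it must be free of side effects.
        ImmutableInterlocked.Update(ref items, list => list.Add(item));
    }

    public void Remove(T item)
    {
        ImmutableInterlocked.Update(ref items, list => list.Remove(item));
    }
}
```

A reader that holds on to a snapshot keeps seeing the old contents, which is the point-in-time behavior mentioned above.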
That is not threadsafe.
The method GetEnumerator() will not retain any lock after the enumerator has been returned, so any thread would be free to use the returned enumerator without any locking to prevent them from doing so.
In general, trying to create a threadsafe list type is very difficult.
See this StackOverflow thread for some discussion: No ConcurrentList<T> in .Net 4.0?
If you are trying to use some kind of reader/writer lock, rather than a simple locking scheme for both reading and writing, your concurrent reads probably greatly outnumber your writes. In that case, a copy-on-write approach, as suggested by Zer0, can be appropriate.
In an answer to a related question I posted a generic utility function that helps turn any modification of any data structure into a thread-safe and highly concurrent operation.
Code
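using System;
using System.Threading;
static class CopyOnWriteSwapper
{
// Applies op to a private copy of obj and publishes the copy atomically.
// If another thread replaced obj in the meantime, the copy is discarded and
// the clone-and-modify step is retried, so cloner and op may run more than once.
public static void Swap<T>(ref T obj, Func<T, T> cloner, Action<T> op)
where T : class
{
while (true)
{
var objBefore = Volatile.Read(ref obj);
var newObj = cloner(objBefore);
op(newObj);
// Publish only if obj is still the instance the copy was made from.
if (Interlocked.CompareExchange(ref obj, newObj, objBefore) == objBefore)
return;
}
}
}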
Usage
CopyOnWriteSwapper.Swap(ref _myList,
orig => new List<string>(orig),
clone => clone.Add("asdf"));
More details about what you can do with it, and a couple caveats can be found in the original answer.

What is the fastest way to get lists of files and search through file lists repeatedly?

Situation:
There can be 5,000 to 35,000 files spread over 3-5 root directories (that have many subdirectories) on a network drive.
There are three file types (tif, sus, img) that a user may or may not search for. Some file types can have 2-3 different file extensions.
From a list of files (in various database tables), I need to find out if each file exists, and if it does, save the path only and filename only to a table.
Searches on file names must be case insensitive, but it would be nice to keep the original case when saving the path to the table.
Environment:
C# and .NET 4.0 on Windows PC.
Is this the fastest way?:
Is the fastest way to use a dictionary, with FileName as the key (lowercase) and Path as the value (original case)?
That way I get the Path in the same pass as the lookup of the filename. The FileName and Path are split up front when populating the list.
if (d.TryGetValue("key", out value))
{
// Log "key" and value to table // only does one lookup
}
Note: I am a bit concerned that I probably will have duplicate key values per FileType.
When/If I run across this scenario what type of list and access method should I use?
Maybe on these rare cases, I should populate another list of the duplicate keys. Because I will need to do at least one of: log/copy/delete of the files in any path.
I would use a Dictionary<string,string> with the FullName (path+file+ext) changed to lower case as key and the FullName unchanged as value. Then split the required parts using the static methods GetDirectoryName and GetFileName of the System.IO.Path class before inserting them into the table.
EDIT: The GetFiles method of the DirectoryInfo class returns an array of FileInfo. FileInfo has a FullName property returning path+file+ext. You could as well store this FileInfo as value in your dictionary if memory consumption is not an issue. FileInfo has a DirectoryName and a Name property returning the two parts you need.
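As a variation on lowercasing the keys yourself, the dictionary can be given a case-insensitive comparer so the original casing is kept everywhere. The sketch below is only an illustration: `FileIndex`, `Build` and `BuildFromRoot` are made-up names, and it assumes file names are unique across folders (a later duplicate silently replaces the earlier entry).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: maps file name -> full path, ignoring case on lookup.
public static class FileIndex
{
    public static Dictionary<string, string> Build(IEnumerable<string> fullPaths)
    {
        var index = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string fullPath in fullPaths)
        {
            // Key: file name only. Value: full path in its original case.
            // Assumption: on a duplicate name the last path seen wins.
            index[Path.GetFileName(fullPath)] = fullPath;
        }
        return index;
    }

    public static Dictionary<string, string> BuildFromRoot(string root)
    {
        // Directory.EnumerateFiles streams results instead of building an array.
        return Build(Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories));
    }
}
```

A lookup is then a single `TryGetValue` call, and `Path.GetDirectoryName` on the stored value gives the path-only part for the table.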
EDIT: Here is my implementation of a multimap which does the Dictionary<TKey,List<TValue>> stuff:
using System.Collections.Generic;
/// <summary>
/// Represents a collection of keys and values. Multiple values can have the same key.
/// </summary>
/// <typeparam name="TKey">Type of the keys.</typeparam>
/// <typeparam name="TValue">Type of the values.</typeparam>
public class MultiMap<TKey, TValue> : Dictionary<TKey, List<TValue>>
{
public MultiMap()
: base()
{
}
public MultiMap(int capacity)
: base(capacity)
{
}
/// <summary>
/// Adds an element with the specified key and value into the MultiMap.
/// </summary>
/// <param name="key">The key of the element to add.</param>
/// <param name="value">The value of the element to add.</param>
public void Add(TKey key, TValue value)
{
List<TValue> valueList;
if (TryGetValue(key, out valueList)) {
valueList.Add(value);
} else {
valueList = new List<TValue>();
valueList.Add(value);
Add(key, valueList);
}
}
/// <summary>
/// Removes the first occurrence of an element with a specified key and value.
/// </summary>
/// <param name="key">The key of the element to remove.</param>
/// <param name="value">The value of the element to remove.</param>
/// <returns>true if an element is removed; false if the key or the value was not found.</returns>
public bool Remove(TKey key, TValue value)
{
List<TValue> valueList;
if (TryGetValue(key, out valueList)) {
if (valueList.Remove(value)) {
if (valueList.Count == 0) {
Remove(key);
}
return true;
}
}
return false;
}
/// <summary>
/// Removes all occurrences of elements with a specified key and value.
/// </summary>
/// <param name="key">The key of the elements to remove.</param>
/// <param name="value">The value of the elements to remove.</param>
/// <returns>Number of elements removed.</returns>
public int RemoveAll(TKey key, TValue value)
{
List<TValue> valueList;
int n = 0;
if (TryGetValue(key, out valueList)) {
while (valueList.Remove(value)) {
n++;
}
if (valueList.Count == 0) {
Remove(key);
}
}
return n;
}
/// <summary>
/// Gets the total number of values contained in the MultiMap.
/// </summary>
public int CountAll
{
get
{
int n = 0;
foreach (List<TValue> valueList in Values) {
n += valueList.Count;
}
return n;
}
}
/// <summary>
/// Determines whether the MultiMap contains an element with a specific key / value pair.
/// </summary>
/// <param name="key">Key of the element to search for.</param>
/// <param name="value">Value of the element to search for.</param>
/// <returns>true if the element was found; otherwise false.</returns>
public bool Contains(TKey key, TValue value)
{
List<TValue> valueList;
if (TryGetValue(key, out valueList)) {
return valueList.Contains(value);
}
return false;
}
/// <summary>
/// Determines whether the MultiMap contains an element with a specific value.
/// </summary>
/// <param name="value">Value of the element to search for.</param>
/// <returns>true if the element was found; otherwise false.</returns>
public bool Contains(TValue value)
{
foreach (List<TValue> valueList in Values) {
if (valueList.Contains(value)) {
return true;
}
}
return false;
}
}
I would probably use a dictionary with the lowercased filename as key. The value would be a class with the needed extra information. I would also search it like your example. If this was slow I would probably also try searching with LINQ just to see if it was faster. There is however one problem here: this requires that all files across all folders are uniquely named. That might be the case for you, but it could also be a problem if you haven't already considered it ;)
Remember that you can also use a FileSystemWatcher object to keep the in-memory dictionary/list synchronized with the disk contents if they are subject to change. If the contents are static I would probably store it all in a database table and search that instead; startup of your program would then be instantaneous.
Edit:
Just now noticed your concern about duplicates. If that's a problem I would create a List<fileclass>, where fileclass is a class containing the needed information on the files. Then search the list using LINQ, as that could give you zero, one or more hits. I think that would be more efficient than a dictionary with a list as value, where the list would contain one or more items (duplicates).
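For comparison, LINQ can also group the paths once up front with `ToLookup`, which returns zero, one or more hits per name without rescanning the whole list on every search. This is a sketch under assumptions, not a measured recommendation; `DuplicateAwareIndex` is a made-up name, and which approach is faster for 5,000 to 35,000 files is something to measure.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: groups full paths by file name, ignoring case.
public static class DuplicateAwareIndex
{
    public static ILookup<string, string> Build(IEnumerable<string> fullPaths)
    {
        // Each key maps to every full path that ends in that file name.
        return fullPaths.ToLookup(
            path => Path.GetFileName(path),
            StringComparer.OrdinalIgnoreCase);
    }
}
```

Indexing the lookup with a name that is not present yields an empty sequence rather than throwing, so the caller can simply iterate over `lookup[fileName]` and log, copy or delete each hit.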
