Observable ordered set in .NET? - C#

I need a set that preserves the order in which items were added, just like a list.
The set should also be observable.
Is there any built-in set like this in .NET 4?

As far as I know, there is no such type in .NET. I recently needed this and ended up implementing it myself; it's not that difficult.
The trick is to combine a Dictionary<T, LinkedListNode<T>> with a LinkedList<T>. Use the dictionary to query keys and values in O(1) time and the list to iterate in insertion-order. You need a dictionary instead of a set because you want to be able to call LinkedList<T>.Remove(LinkedListNode<T>) and not LinkedList<T>.Remove(T). The former has O(1) time complexity, the latter O(n).
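A minimal sketch of that combination (the class name OrderedSet<T> is mine, not a BCL type; the observable part is left out, but Add and Remove are the natural places to raise an INotifyCollectionChanged-style event):

using System.Collections;
using System.Collections.Generic;

public class OrderedSet<T> : IEnumerable<T>
{
    private readonly Dictionary<T, LinkedListNode<T>> _index =
        new Dictionary<T, LinkedListNode<T>>();
    private readonly LinkedList<T> _items = new LinkedList<T>();

    public int Count { get { return _index.Count; } }

    public bool Add(T item)
    {
        if (_index.ContainsKey(item))
            return false;                    // sets hold no duplicates
        _index[item] = _items.AddLast(item); // remember the node for O(1) removal
        return true;
    }

    public bool Remove(T item)
    {
        LinkedListNode<T> node;
        if (!_index.TryGetValue(item, out node))
            return false;
        _items.Remove(node); // O(1): we pass the node, not the value
        _index.Remove(item);
        return true;
    }

    public bool Contains(T item)
    {
        return _index.ContainsKey(item); // O(1) membership test
    }

    public IEnumerator<T> GetEnumerator()
    {
        return _items.GetEnumerator(); // iterates in insertion order
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}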

It sounds like you need a read-only queue. In .NET we have the built-in Queue<T> class, but there is no built-in read-only queue.
To make sure there are no duplicate values you can use a Contains check.
There is one NuGet package which has an ImmutableQueue; not sure if it can help you.
It creates a new queue object every time an Enqueue or Dequeue operation is done.
https://msdn.microsoft.com/en-us/library/dn467186(v=vs.111).aspx
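For what it's worth, here is a hedged sketch of how that package's ImmutableQueue<T> behaves (the package has shipped under the names Microsoft.Bcl.Immutable and later System.Collections.Immutable):

using System.Linq;
using System.Collections.Immutable;

class ImmutableQueueDemo
{
    static void Main()
    {
        var q = ImmutableQueue<string>.Empty;

        // Enqueue returns a NEW queue each time; the original is unchanged.
        q = q.Enqueue("One").Enqueue("Two");

        // The duplicate check: ImmutableQueue<T> is IEnumerable<T>,
        // so Contains walks the queue in O(n).
        bool hasOne = q.Contains("One");

        string head;
        q = q.Dequeue(out head); // head == "One"; q now holds only "Two"
    }
}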

I guess you could use a SortedDictionary<> and a Dictionary<> together to do this.
Assuming that you are never going to do more than int.MaxValue insertions into the set, you can use an integer "sequence number" as a key into a SortedDictionary that keeps track of the inserted items in insertion order.
Alongside this you need to use a Dictionary to map items to the sequence number that was used to insert them.
Putting this together into a class and a demo program: (NOT THREAD SAFE!)
using System;
using System.Collections;
using System.Collections.Generic;

namespace Demo
{
    public sealed class SequencedSet<T> : IEnumerable<T>
    {
        private readonly SortedDictionary<int, T> items = new SortedDictionary<int, T>();
        private readonly Dictionary<T, int> order = new Dictionary<T, int>();
        private int sequenceNumber = 0;

        public void Add(T item)
        {
            if (order.ContainsKey(item))
                return; // Or throw if you want.
            order[item] = sequenceNumber;
            items[sequenceNumber] = item;
            ++sequenceNumber;
        }

        public void Remove(T item)
        {
            if (!order.ContainsKey(item))
                return; // Or throw if you want.
            int sequence = order[item];
            items.Remove(sequence);
            order.Remove(item);
        }

        public bool Contains(T item)
        {
            return order.ContainsKey(item);
        }

        public IEnumerator<T> GetEnumerator()
        {
            return items.Values.GetEnumerator();
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

    internal class Program
    {
        private static void Main()
        {
            var test = new SequencedSet<string>();
            test.Add("One");
            test.Add("Two");
            test.Add("Three");
            test.Add("Four");
            test.Add("Five");
            test.Remove("Four");
            test.Remove("Two");
            foreach (var item in test)
                Console.WriteLine(item);
        }
    }
}
This should be fairly performant for insertions and deletions, but it takes double the memory, of course. If you are doing a great number of insertions and deletions you could use a long instead of an int for the sequence numbering.
Unfortunately, if you are doing more than 2^63 insertions, even that won't work - although I would imagine that should be more than enough...


C# Generic List<T> update items

I'm using a List<T> and I need to update the properties of the objects the list holds.
What would be the most efficient/fastest way to do this? I know that scanning through the indexes of a List<T> gets slower as the list grows, and that List<T> is not the most efficient collection for updates.
That said, would it be better to:
Remove the matching object and then add a new one?
Scan through the list indexes until you find the matching object and then update the object's properties?
If I have a collection, let's say an IEnumerable, and I want to apply those updates to the List, what would be the best approach?
Stub code sample:
public class Product
{
    public int ProductId { get; set; }
    public string ProductName { get; set; }
    public string Category { get; set; }
}

public class ProductRepository
{
    List<Product> product = Product.GetProduct();

    public void UpdateProducts(IEnumerable<Product> updatedProduct)
    {
    }

    public void UpdateProduct(Product updatedProduct)
    {
    }
}
You could consider using a Dictionary instead of a List if you want fast lookups. In your case the key would be the product ID (which I am assuming is unique). See Dictionary on MSDN.
For example:
public class ProductRepository
{
    // Assume this now returns a dictionary keyed by product ID.
    Dictionary<int, Product> products = Product.GetProduct();

    public void UpdateProducts(IEnumerable<Product> updatedProducts)
    {
        foreach (var productToUpdate in updatedProducts)
        {
            UpdateProduct(productToUpdate);
        }
    }

    public void UpdateProduct(Product productToUpdate)
    {
        // O(1) lookup by product ID.
        if (products.ContainsKey(productToUpdate.ProductId))
        {
            var product = products[productToUpdate.ProductId];
            // Update code here...
            product.ProductName = productToUpdate.ProductName;
        }
        else
        {
            // Add the product (or throw an exception if you prefer).
            products.Add(productToUpdate.ProductId, productToUpdate);
        }
    }
}
Your use case is updating a List<T>, which can contain millions of records, where the updated records can be a sub-list or just a single record.
Following is the schema:
public class Product
{
    public int ProductId { get; set; }
    public string ProductName { get; set; }
    public string Category { get; set; }
}
Does Product contain a primary key? That would mean every Product object can be uniquely identified, there are no duplicates, and every update targets a single unique record.
If yes, then it is best to arrange the List<T> in the form of a Dictionary<int, T>, which would mean every update for an IEnumerable<T> has O(1) time complexity. The overall cost depends on the size of the IEnumerable<T>, which I don't expect to be very big, and though there would be the extra memory allocation of a different data structure, this would be a very fast solution. @JamieLupton has already provided a solution along similar lines.
If Product is repeated and there's no primary key, then the above solution is not valid, and the ideal way to scan through the List<T> is binary search, whose time complexity is O(log N).
Since the size of the IEnumerable<T> is comparatively small, say M, the overall time complexity would be O(M log N), where M is much smaller than N and can be neglected.
List<T> supports a BinarySearch API which provides the element index, which can then be used to update the object at the relevant index (note that the list must be sorted with the same comparer first).
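A minimal sketch of that approach, reusing the Product class above; the ProductIdComparer is hypothetical, and BinarySearch is only valid on a list sorted with the same comparer:

using System.Collections.Generic;

// Hypothetical comparer: orders products by their ProductId.
class ProductIdComparer : IComparer<Product>
{
    public int Compare(Product x, Product y)
    {
        return x.ProductId.CompareTo(y.ProductId);
    }
}

static class BinarySearchUpdater
{
    public static void UpdateProducts(List<Product> products, IEnumerable<Product> updates)
    {
        var comparer = new ProductIdComparer();
        products.Sort(comparer); // sort once up front: O(N log N)
        foreach (var updated in updates)
        {
            int index = products.BinarySearch(updated, comparer); // O(log N) per update
            if (index >= 0)
            {
                products[index].ProductName = updated.ProductName;
                products[index].Category = updated.Category;
            }
        }
    }
}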
The best option, as per me, for such a high number of records would be parallel processing along with binary search.
Since thread safety is an issue, what I normally do is divide a List<T> into a List<T>[], so that each unit can be assigned to a separate thread. A simple way is to use the MoreLinq Batch API: fetch the number of system processors using Environment.ProcessorCount and then create an IEnumerable<IEnumerable<T>> as follows:
var enumerableList = myList.Batch(myList.Count / Environment.ProcessorCount).ToList(); // Batch(size) yields chunks of the given size
Another way is the following custom code:
public static class MyExtensions
{
    // data - the List<T> to split
    // dataCount - calculated once and passed in, to avoid accessing the property every time
    // partitionSize - size of each partition, which can be a function of the number of processors
    public static List<T>[] SplitList<T>(this List<T> data, int dataCount, int partitionSize)
    {
        int remainderData;
        var fullPartition = Math.DivRem(dataCount, partitionSize, out remainderData);
        var listArray = new List<T>[fullPartition];
        var beginIndex = 0;

        for (var partitionCounter = 0; partitionCounter < fullPartition; partitionCounter++)
        {
            if (partitionCounter == fullPartition - 1)
                listArray[partitionCounter] = data.GetRange(beginIndex, partitionSize + remainderData);
            else
                listArray[partitionCounter] = data.GetRange(beginIndex, partitionSize);
            beginIndex += partitionSize;
        }

        return listArray;
    }
}
Now you can create a Task[], where each Task is assigned to one element of the List<T>[] generated above, and binary search each sub-partition. Though it's repetitive, it uses the power of parallel processing and binary search. Each Task can be started, and then we can wait for all of them to finish using Task.WaitAll(taskArray).
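A hypothetical driver for that scheme might look like the following; updatePartition stands in for whatever per-sublist work (e.g. the binary search plus update above) you need:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class PartitionProcessor
{
    public static void ProcessPartitions(List<Product>[] partitions,
                                         Action<List<Product>> updatePartition)
    {
        var tasks = new Task[partitions.Length];
        for (int i = 0; i < partitions.Length; i++)
        {
            var partition = partitions[i]; // capture a local, not the loop variable
            tasks[i] = Task.Run(() => updatePartition(partition));
        }
        Task.WaitAll(tasks); // block until every partition has been processed
    }
}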
Over and above that, if you want to create a Dictionary<int, T>[] and thus use parallel processing, this would be fastest.
The final integration of the List<T>[] back into a single List<T> can be done using LINQ Aggregate or SelectMany, as follows:
List<T>[] splitListArray = ...; // fetch the split list array
// Process splitListArray
var finalList = splitListArray.SelectMany(obj => obj).ToList();
Another option would be to use Parallel.ForEach along with a thread-safe data structure like ConcurrentBag<T>, or maybe ConcurrentDictionary<int, T> if you are replacing the complete object; for a plain property update a simple List<T> would work. Parallel.ForEach internally uses a range partitioner similar to what I have suggested above.
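As a rough sketch of the Parallel.ForEach option (assuming the products live in a ConcurrentDictionary keyed by ProductId, as described above):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelUpdater
{
    public static void UpdateProducts(ConcurrentDictionary<int, Product> products,
                                      IEnumerable<Product> updates)
    {
        Parallel.ForEach(updates, updated =>
        {
            // AddOrUpdate atomically inserts the product or replaces the whole object.
            products.AddOrUpdate(updated.ProductId, updated, (id, old) => updated);
        });
    }
}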
Which of the solutions mentioned above is ideal depends on your use case; you should be able to use a combination to achieve the best possible result. Let me know if you need a specific example.
What exactly is efficiency?
Unless there are literally thousands of items, a foreach, a for, or any other type of looping operation will most likely only show differences in the milliseconds. Really? So you have wasted more time (a programmer's cost at $XX per hour is more than an end user's) trying to find that best way.
So if you have literally thousands of records, I would recommend finding efficiency by parallel processing the list with the Parallel.ForEach method, which can process more records to save time, with the overhead of threading.
IMHO if the record count is greater than 100, it implies that there is a database being used. If a database is involved, write an update sproc and call it a day; I would be hard pressed to write a one-off program to do a specific update which could be done more easily in said database.

A multithreading implementation of a lazy-loaded cache with known key list

I'm developing a multithreaded application with .NET 3.5 that reads records from different tables stored in a database. The reads are very frequent, so there is a need for a lazy-loading cache implementation. Every table is mapped to a C# class and has a string column that can be used as the key in the cache.
In addition, there is the requirement that periodically all the cached records be refreshed.
I could have implemented the cache with a lock on every read to ensure a thread-safe environment, but then I thought of another solution that relies on the fact that it is simple to get a list of all possible keys.
So here is the first class I wrote, which stores the list of all the keys, lazy-loaded with the double-checked locking pattern. It also has a method that stores the timestamp of the last requested refresh in a static variable.
public class Globals
{
    private static object _KeysLock = new object();
    public static volatile List<string> Keys;

    public static void LoadKeys()
    {
        if (Keys == null)
        {
            lock (_KeysLock)
            {
                if (Keys == null)
                {
                    List<string> keys = new List<string>();
                    // Filling all possible keys from DB
                    // ...
                    Keys = keys;
                }
            }
        }
    }

    private static long refreshTimeStamp = DateTime.Now.ToBinary();

    public static DateTime RefreshTimeStamp
    {
        get { return DateTime.FromBinary(Interlocked.Read(ref refreshTimeStamp)); }
    }

    public static void NeedRefresh()
    {
        Interlocked.Exchange(ref refreshTimeStamp, DateTime.Now.ToBinary());
    }
}
Then I wrote the CacheItem<T> class, which implements a single item of the cache for a specified table T, filtered by key. It has the Load method for lazy-loading the record list, and the LoadingTimestamp property that stores the timestamp of the last record load. Please note that the list of records is overwritten with a new one that is filled locally, and then the LoadingTimestamp is overwritten too.
public class CacheItem<T>
{
    private List<T> _records;

    public List<T> Records
    {
        get { return _records; }
    }

    private long loadingTimestampTick;

    public DateTime LoadingTimestamp
    {
        get { return DateTime.FromBinary(Interlocked.Read(ref loadingTimestampTick)); }
        set { Interlocked.Exchange(ref loadingTimestampTick, value.ToBinary()); }
    }

    public void Load(string key)
    {
        List<T> records = new List<T>();
        // Filling records from DB filtered on key
        // ...
        _records = records;
        LoadingTimestamp = DateTime.Now;
    }
}
And finally here is the Cache<T> class that stores the cache for the table T as a static Dictionary. As you can see, the Get method first loads all possible keys into the cache if that hasn't already been done, and then checks the timestamps for the refresh (both with the double-checked locking pattern). The record list in the instance returned by a Get call can safely be read by a thread even if another thread is doing a refresh inside the lock, because the refreshing thread does not modify the list itself but creates a new one.
public class Cache<T>
{
    private static object _CacheSynch = new object();
    private static Dictionary<string, CacheItem<T>> _Cache = new Dictionary<string, CacheItem<T>>();
    private static volatile bool _KeysLoaded = false;

    public static CacheItem<T> Get(string key)
    {
        bool checkRefresh = true;
        CacheItem<T> item = null;

        if (!_KeysLoaded)
        {
            lock (_CacheSynch)
            {
                if (!_KeysLoaded)
                {
                    Globals.LoadKeys(); // Checks the lazy loading of the common key list
                    foreach (var k in Globals.Keys)
                    {
                        item = new CacheItem<T>();
                        if (k == key)
                        {
                            // As long as the lock is acquired let's load records for the requested key
                            item.Load(key);
                            // then the refresh is no more needed by the current thread
                            checkRefresh = false;
                        }
                        _Cache.Add(k, item);
                    }
                    _KeysLoaded = true;
                }
            }
        }

        // here the key is certainly contained in the cache
        item = _Cache[key];

        if (checkRefresh)
        {
            // let's check the timestamps to know if refresh is needed
            DateTime rts = Globals.RefreshTimeStamp;
            if (item.LoadingTimestamp < rts)
            {
                lock (_CacheSynch)
                {
                    if (item.LoadingTimestamp < rts)
                    {
                        // refresh is needed
                        item.Load(key);
                    }
                }
            }
        }

        return item;
    }
}
Periodically Globals.NeedRefresh() is called to ensure the records will be refreshed.
This solution avoids a lock on every read of the cache because the cache is pre-filled with all the possible keys. This means there will be in memory a number of instances equal to the number of all possible keys (about 20) for each requested type T (there are about 100 T types), but only for the requested keys are the record lists non-empty.
Please let me know if this solution has any thread-safety issues or anything else wrong.
Thank you very much.
Given that:
You load all keys once and never change them
You create each dictionary once and never change it
CacheItem.Load is thread-safe because it only ever replaces a private List<T> field with a new completely initialized list.
you don't need any locks at all, so you could simplify the code.
The only possible need for a lock is to prevent concurrent attempts to run CacheItem.Load. Personally I'd just let the concurrent database accesses run, but if you want to prevent them, you could implement a lock in CacheItem.Load. Or pinch Lazy<T> from .NET 4 and use that, as suggested in my answer to your previous question.
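For illustration, here is a sketch of how Lazy<T> could replace the hand-rolled double-checked lock in Globals, assuming the DB load is factored out into a LoadKeysFromDb method:

using System;
using System.Collections.Generic;
using System.Threading;

public static class Globals
{
    // ExecutionAndPublication mode encapsulates the double-checked locking:
    // LoadKeysFromDb runs at most once, on first access to Keys.
    private static readonly Lazy<List<string>> _keys =
        new Lazy<List<string>>(LoadKeysFromDb, LazyThreadSafetyMode.ExecutionAndPublication);

    public static List<string> Keys
    {
        get { return _keys.Value; }
    }

    private static List<string> LoadKeysFromDb()
    {
        List<string> keys = new List<string>();
        // Filling all possible keys from DB
        // ...
        return keys;
    }
}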
Another comment: your refresh logic uses DateTime.Now, so it won't behave as expected (a) during the period when the clocks go back at the end of Daylight Saving Time and (b) if the system clock is updated.
I would simply use a static integer value that is incremented each time NeedRefresh is called.
From comments:
"What happens for example if two threads ... try to load the common Globals.Keys at the same time?"
There is a small risk that this may happen once at application startup, but so what? It will result in the 20 keys being read from the database twice, but the performance impact is likely to be negligible. And if you really want to prevent this, any locking could be encapsulated in a class like Lazy<T>.
The comment about using DateTime.Now is in fact a point of interest, but I think I can assume those events occur while the application is not in use.
You can "assume" that, but not guarantee it. Your machine may decide to synchronize its time with a time server at any time.
About the advice of using an integer in NeedRefresh: I don't understand how I can compare it with each record list's state, which is represented by a DateTime.
As far as I can see, you only use the DateTime to check if your data was loaded before or after the most recent call to NeedRefresh. So you could replace this by:
public static class Globals
{
    ...
    public static int Version { get { return _version; } }
    private static int _version;

    public static void NeedRefresh()
    {
        Interlocked.Increment(ref _version);
    }
}

public class CacheItem<T>
{
    public int Version { get; private set; }
    ...

    public void Load(string key)
    {
        Version = Globals.Version;
        List<T> records = new List<T>();
        // Filling records from DB filtered on key
        // ...
        _records = records;
    }
}
then when accessing the cache:
item = _Cache[key];
if (item.Version < Globals.Version) item.Load(key);
UPDATE 2:
In response to comment:
... There could be a real risk of integrity if one thread tries to read the Dictionary while another is adding items to it inside the lock, couldn't there be?
Your existing code adds all keys to the dictionary once, immediately after loading the global keys, and subsequently never modifies the dictionary. So this is thread-safe as long as you don't assign the _Cache field until the dictionary is completely constructed:
var dictionary = new Dictionary<string, CacheItem<T>>(Globals.Keys.Count);
foreach (var k in Globals.Keys)
{
    dictionary.Add(k, new CacheItem<T>());
}
_Cache = dictionary;

Memory Leak in large Array - Will subclassing IList fix it?

I need to improve memory performance on my application and I could see that I have problems with memory fragmentation.
I've read an interesting article on large objects from Andrew Hunter of Red Gate, and one of the solutions he recommends is:
If large data structures need to live for a long time, and especially if they need to grow in size over time, the best approach is simply to consider using or writing a different data structure to store them. Arrays can contain up to around 10,000 elements before they are put on the large object heap and can cause problems, so a very effective way to store 100,000 entries might be to store 10 arrays each containing 10,000 elements: none will end up on the large object heap so no fragmentation will occur. This could be written as an IList subclass, which would make it easy to drop in transparently to replace existing code.
How do I implement his suggestion in my code?
My program has a very complex form with an object that leaves residual memory every time it opens. I found a complex list that may be the culprit, and I'd like to implement his suggestion to see if it fixes the issue.
What's wrong with using List for that? It's nothing but an implementation of IList, and you can do the partitioning yourself. But if you want to do it transparently:
Implement IList (it's just an interface; nothing special about it. Maybe I don't understand the question?) and back it with arrays of your desired size. Your getter would then take index / sizeOfArrays as the index of the array containing the desired item and return the (index % sizeOfArrays)th item in that array.
For fun, because it's a lazy Friday, I wrote something up. Note:
I didn't test it.
I cannot comment on the correctness of the quoted claim that this might help avoid memory fragmentation; I just blindly followed your request.
I don't know if List or any other collection is already smart enough to do just that.
I made some decisions that might not be right for you (i.e. you cannot blindly drop this into your code if you're using arrays now; look at the Item implementation, especially the setter, for example).
That said, here's a starting point that reduced my pre-weekend motivational deficit. I left some interesting methods as an exercise for the dear reader (or OP).. ;-)
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class PartitionList<T> : IList<T> {
    private readonly int _maxCountPerList;
    private readonly IList<IList<T>> _lists;

    public PartitionList(int maxCountPerList) {
        _maxCountPerList = maxCountPerList;
        _lists = new List<IList<T>> { new List<T>() };
    }

    public IEnumerator<T> GetEnumerator() {
        return _lists.SelectMany(list => list).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }

    public void Add(T item) {
        var lastList = _lists[_lists.Count - 1];
        if (lastList.Count == _maxCountPerList) {
            lastList = new List<T>();
            _lists.Add(lastList);
        }
        lastList.Add(item);
    }

    public void Clear() {
        while (_lists.Count > 1) _lists.RemoveAt(1);
        _lists[0].Clear();
    }

    public bool Contains(T item) {
        return _lists.Any(sublist => sublist.Contains(item));
    }

    public void CopyTo(T[] array, int arrayIndex) {
        // Homework
        throw new NotImplementedException();
    }

    public bool Remove(T item) {
        // Evil, Linq with side effects
        return _lists.Any(sublist => sublist.Remove(item));
    }

    public int Count {
        get { return _lists.Sum(subList => subList.Count); }
    }

    public bool IsReadOnly {
        get { return false; }
    }

    public int IndexOf(T item) {
        // Walk the sublists, keeping a running offset into the virtual flat list.
        int offset = 0;
        foreach (var subList in _lists) {
            int index = subList.IndexOf(item);
            if (index > -1) return offset + index;
            offset += subList.Count;
        }
        return -1;
    }

    public void Insert(int index, T item) {
        // Homework
        throw new NotImplementedException();
    }

    public void RemoveAt(int index) {
        // Homework
        throw new NotImplementedException();
    }

    public T this[int index] {
        // Note: assumes every sublist except the last is full,
        // which holds if items are only ever appended via Add.
        get {
            if (index >= _lists.Sum(subList => subList.Count)) {
                throw new IndexOutOfRangeException();
            }
            var list = _lists[index / _maxCountPerList];
            return list[index % _maxCountPerList];
        }
        set {
            if (index >= _lists.Sum(subList => subList.Count)) {
                throw new IndexOutOfRangeException();
            }
            var list = _lists[index / _maxCountPerList];
            list[index % _maxCountPerList] = value;
        }
    }
}

How to use Comparer for a HashSet

As a result of another question I asked here I want to use a HashSet for my objects
I will create objects containing a string and a reference to its owner.
public class Synonym
{
    private string name;
    private Stock owner;

    public Synonym(string NameSynonym, Stock stock)
    {
        name = NameSynonym;
        owner = stock;
    }

    // [+ 'get' for 'name' and 'owner']
}
I understand I need a comparer, but I've never used one before. Should I create a separate class? Like:
public class SynonymComparer : IComparer<Synonym>
{
    public int Compare(Synonym One, Synonym Two)
    {
        // Should I test if 'One == null' or 'Two == null' ????
        return String.Compare(One.Name, Two.Name, true); // Case-insensitive
    }
}
I would prefer to have a function (or nested class [maybe a singleton?] if required) be PART of class Synonym instead of another (independent) class. Is this possible?
About usage:
As I've never used this kind of thing before, I suppose I must write a Find(string NameSynonym) function inside class Synonym, but how should I do that?
public class SynonymManager
{
    private HashSet<SynonymComparer<Synonym>> ListOfSynonyms;

    public SynonymManager()
    {
        ListOfSynonyms = new HashSet<SynonymComparer<Synonym>>();
    }

    public void SomeFunction()
    {
        // Just a function to add 2 synonyms to 1 stock
        Stock stock = GetStock("General Motors");
        Synonym otherName = new Synonym("GM", stock);
        ListOfSynonyms.Add(otherName);
        otherName = new Synonym("Gen. Motors", stock);
        ListOfSynonyms.Add(otherName);
    }

    public Synonym Find(string NameSynomym)
    {
        return ListOfSynonyms.??????(NameSynonym);
    }
}
In the code above I don't know how to implement the Find method. How should I do that?
Any help will be appreciated.
(PS: if my ideas about how it should be implemented are completely wrong, let me know and tell me how to implement it.)
A HashSet doesn't need an IComparer<T> - it needs an IEqualityComparer<T>, such as:
public class SynonymComparer : IEqualityComparer<Synonym>
{
    public bool Equals(Synonym one, Synonym two)
    {
        // Adjust according to requirements.
        return StringComparer.InvariantCultureIgnoreCase
            .Equals(one.Name, two.Name);
    }

    public int GetHashCode(Synonym item)
    {
        return StringComparer.InvariantCultureIgnoreCase
            .GetHashCode(item.Name);
    }
}
However, your current code only compiles because you're creating a set of comparers rather than a set of synonyms.
Furthermore, I don't think you really want a set at all. It seems to me that you want a dictionary or a lookup so that you can find the synonyms for a given name:
public class SynonymManager
{
    private readonly IDictionary<string, Synonym> synonyms =
        new Dictionary<string, Synonym>();

    private void Add(Synonym synonym)
    {
        // This will overwrite any existing synonym with the same name.
        synonyms[synonym.Name] = synonym;
    }

    public void SomeFunction()
    {
        // Just a function to add 2 synonyms to 1 stock.
        Stock stock = GetStock("General Motors");
        Synonym otherName = new Synonym("GM", stock);
        Add(otherName);
        otherName = new Synonym("Gen. Motors", stock);
        Add(otherName);
    }

    public Synonym Find(string nameSynonym)
    {
        // This will throw an exception if you don't have
        // a synonym of the right name. Do you want that?
        return synonyms[nameSynonym];
    }
}
Note that there are some questions in the code above, about how you want it to behave in various cases. You need to work out exactly what you want it to do.
EDIT: If you want to be able to store multiple stocks for a single synonym, you effectively want a Lookup<string, Stock> - but that's immutable. You're probably best storing a Dictionary<string, List<Stock>>: a list of stocks for each string.
In terms of not throwing an error from Find, you should look at Dictionary.TryGetValue, which doesn't throw an exception if the key isn't found (and also returns whether or not the key was found); the mapped value is "returned" in an out parameter.
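A sketch combining both suggestions (the Stock class is assumed from the question; the case-insensitive key comparer mirrors the comparer shown earlier):

using System;
using System.Collections.Generic;

public class SynonymManager
{
    // One name can map to several stocks.
    private readonly Dictionary<string, List<Stock>> synonyms =
        new Dictionary<string, List<Stock>>(StringComparer.InvariantCultureIgnoreCase);

    public void Add(string name, Stock stock)
    {
        List<Stock> stocks;
        if (!synonyms.TryGetValue(name, out stocks))
        {
            stocks = new List<Stock>();
            synonyms[name] = stocks;
        }
        stocks.Add(stock);
    }

    public bool TryFind(string name, out List<Stock> stocks)
    {
        // TryGetValue returns false for a missing key instead of throwing.
        return synonyms.TryGetValue(name, out stocks);
    }
}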
Wouldn't it be more reasonable to scrap the Synonym class entirely and make the list of synonyms a Dictionary (or, if there is such a thing, a HashDictionary) of strings?
(I'm not very familiar with C# types, but I hope this conveys the general idea.)
The answer I recommend (edited; it now respects the case):
IDictionary<string, Stock> ListOfSynonyms = new Dictionary<string, Stock>();
IDictionary<string, string> ListOfSynForms = new Dictionary<string, string>();

class Stock
{
    ...
    Stock addSynonym(String syn)
    {
        ListOfSynForms[syn.ToUpper()] = syn;
        return ListOfSynonyms[syn.ToUpper()] = this;
    }

    Array findSynonyms()
    {
        return ListOfSynonyms.findKeysFromValue(this).map(x => ListOfSynForms[x]);
    }
}
...
GetStock("General Motors").addSynonym("GM").addSynonym("Gen. Motors");
...
try
{
    ... ListOfSynonyms[synonym].name ...
}
catch (OutOfBounds e)
{
    ...
}
...
// output everything that is synonymous to GM. This is a mix of C# and Python
... GetStock("General Motors").findSynonyms()
// test if there is a synonym
if (input in ListOfSynonyms)
{
    ...
}
You can always use LINQ to do the lookup:
public Synonym Find(string NameSynomym)
{
    return ListOfSynonyms.SingleOrDefault(x => x.Name == NameSynomym);
}
But have you considered using a Dictionary instead? I believe it is better suited to extracting single members, and you can still guarantee there are no duplicates based on the key you choose.
I am not sure what the lookup time of SingleOrDefault is, but I am pretty sure it is linear (O(n)), so if lookup time is important to you, a Dictionary will provide O(1) lookup.

Efficient modelling of an MruList in C# or Java

How would you implement a capacity-limited, generic MruList in C# or Java?
I want to have a class that represents a most-recently-used cache or list (= MruList). It should be generic, and limited to a capacity (count) specified at instantiation. I'd like the interface to be something like:
public interface IMruList<T>
{
    T Store(T item);
    void Clear();
    void StoreRange(T[] range);
    List<T> GetList();
    T GetNext(); // cursor-based retrieval
}
Each Store() should put the item at the top (front?) of the list. GetList() should return all items in an ordered list, ordered by most recent store. If I call Store() 20 times and my list is 10 items long, I only want to retain the 10 most-recently stored items. GetList and StoreRange are intended to support retrieval/saving of the MruList on app start and shutdown.
This is to support a GUI app.
I guess I might also want to know the timestamp on a stored item. Maybe. Not sure.
Internally, how would you implement it, and why?
(no, this is not a course assignment)
A couple of comments about your approach:
Why have Store return T? I know what I just added; returning it back to me is unnecessary unless you explicitly want method chaining.
Refactor GetNext() into a new class. It represents a different set of functionality (storage vs. cursor traversal) and should be represented by a separate interface. It also has usability concerns: what happens when two different methods active on the same stack want to traverse the structure?
GetList() should likely return IEnumerable<T>. Returning List<T> either forces an explicit copy up front or returns a pointer to an underlying implementation. Neither is a great choice.
As for the best structure to back the interface: it seems best to use a data structure which is efficient at adding to one end and removing from the other. A doubly linked list would suit this nicely.
Here's a Cache class that stores objects by the time they were accessed. More recent items bubble to the end of the list. The cache operates off an indexer property that takes an object key. You could easily replace the internal dictionary with a list and reference the list from the indexer.
BTW, you should rename the class to MRU as well :)
class Cache
{
    Dictionary<object, object> cache = new Dictionary<object, object>();

    /// <summary>
    /// Keeps up with the most recently read items.
    /// Items at the end of the list were read last.
    /// Items at the front of the list have been the most idle.
    /// Items at the front are removed if the cache capacity is reached.
    /// </summary>
    List<object> priority = new List<object>();

    public Type Type { get; set; }

    public Cache(Type type)
    {
        this.Type = type;
        //TODO: register this cache with the manager
    }

    public object this[object key]
    {
        get
        {
            lock (this)
            {
                if (!cache.ContainsKey(key)) return null;
                // move the item to the end of the list
                priority.Remove(key);
                priority.Add(key);
                return cache[key];
            }
        }
        set
        {
            lock (this)
            {
                if (Capacity > 0 && cache.Count == Capacity)
                {
                    cache.Remove(priority[0]);
                    priority.RemoveAt(0);
                }
                cache[key] = value;
                priority.Remove(key);
                priority.Add(key);
                if (priority.Count != cache.Count)
                    throw new Exception("Capacity mismatch.");
            }
        }
    }

    public int Count { get { return cache.Count; } }
    public int Capacity { get; set; }

    public void Clear()
    {
        lock (this)
        {
            priority.Clear();
            cache.Clear();
        }
    }
}
I would have an internal ArrayList and have Store() delete the last element if its size exceeds the capacity established in the constructor. I think standard terminology, strangely enough, calls this an "LRU" list, because the least-recently-used item is what gets discarded. See wikipedia's entry for this.
You can build this up with a System.Collections.Generic.LinkedList<T>.
When you push an item into a full list, delete the last one and insert the new one at the front. Most operations should be O(1), which is better than an array-based implementation.
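A minimal sketch of that idea, as a plain class rather than the IMruList<T> interface above; the duplicate check inside Store is O(n), which a companion Dictionary<T, LinkedListNode<T>> could reduce to O(1):

using System.Collections.Generic;

public class MruList<T>
{
    private readonly int _capacity;
    private readonly LinkedList<T> _items = new LinkedList<T>();

    public MruList(int capacity)
    {
        _capacity = capacity;
    }

    public T Store(T item)
    {
        _items.Remove(item);     // if already present, move it to the front (O(n))
        _items.AddFirst(item);   // most recent lives at the head
        if (_items.Count > _capacity)
            _items.RemoveLast(); // evict the least recently stored item
        return item;
    }

    public List<T> GetList()
    {
        return new List<T>(_items); // most recent first
    }

    public void Clear()
    {
        _items.Clear();
    }
}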
Everyone enjoys rolling their own container classes.
But in the .NET BCL there is a little gem called SortedList<TKey, TValue>. You can use it to implement your MRU list or any other priority-queue style list: it keeps its elements sorted by key, inserting each addition at the correct sort position for you.
From SortedList on MSDN:
The elements of a SortedList object are sorted by the keys either according to a specific IComparer implementation specified when the SortedList is created or according to the IComparable implementation provided by the keys themselves. In either case, a SortedList does not allow duplicate keys.
The index sequence is based on the sort sequence. When an element is added, it is inserted into the SortedList in the correct sort order, and the indexing adjusts accordingly. When an element is removed, the indexing also adjusts accordingly. Therefore, the index of a specific key/value pair might change as elements are added or removed from the SortedList object.
Operations on a SortedList object tend to be slower than operations on a Hashtable object because of the sorting. However, the SortedList offers more flexibility by allowing access to the values either through the associated keys or through the indexes.
Elements in this collection can be accessed using an integer index. Indexes in this collection are zero-based.
In Java, I'd use the LinkedHashMap, which is built for this sort of thing.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MRUList<E> implements Iterable<E> {
    private final LinkedHashMap<E, Void> backing;

    public MRUList() {
        this(10);
    }

    public MRUList(final int maxSize) {
        // accessOrder = true gives access-ordered iteration (LRU first).
        this.backing = new LinkedHashMap<E, Void>(maxSize, 0.75f, true) {
            private final int MAX_SIZE = maxSize;

            @Override
            protected boolean removeEldestEntry(Map.Entry<E, Void> eldest) {
                return size() > MAX_SIZE;
            }
        };
    }

    public void store(E item) {
        backing.put(item, null);
    }

    public void clear() {
        backing.clear();
    }

    public void storeRange(E[] range) {
        for (E e : range) {
            backing.put(e, null);
        }
    }

    public List<E> getList() {
        return new ArrayList<E>(backing.keySet());
    }

    public Iterator<E> iterator() {
        return backing.keySet().iterator();
    }
}
However, this iterates in exactly reverse order (i.e. LRU first, MRU last). Making it MRU-first would require basically reimplementing LinkedHashMap but inserting new elements at the front of the backing list instead of at the end.
Java 6 added a new Collection type named Deque... for Double-ended Queue.
There's one in particular that can be given a limited capacity: LinkedBlockingDeque.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

public class DequeMruList<T> implements IMruList<T> {
    private LinkedBlockingDeque<T> store;

    public DequeMruList(int capacity) {
        store = new LinkedBlockingDeque<T>(capacity);
    }

    @Override
    public void Clear() {
        store.clear();
    }

    @Override
    public List<T> GetList() {
        return new ArrayList<T>(store);
    }

    @Override
    public T GetNext() {
        // Get the item, but don't remove it
        return store.peek();
    }

    @Override
    public T Store(T item) {
        boolean stored = false;
        // Keep looping until the item is added
        while (!stored) {
            // Add if there's room
            if (store.offerFirst(item)) {
                stored = true;
            } else {
                // No room, remove the last item
                store.removeLast();
            }
        }
        return item;
    }

    @Override
    public void StoreRange(T[] range) {
        for (T item : range) {
            Store(item);
        }
    }
}
