I wish to iterate over a collection in a thread-safe manner. I find it convenient to have one method called
/// <summary>
/// This visits all items and performs an action on them in a thread-safe manner
/// </summary>
/// <param name="visitAction">The action to perform on the item</param>
public void VisitAllItems(Action<Item> visitAction) {
lock (listLock) {
foreach (Item item in this.ItemList) {
visitAction.Invoke(item);
}
}
}
An example of its use could be
/// <summary>
/// Saves each item in the list to the database
/// </summary>
protected void SaveListToDatabase() {
this.VisitAllItems(item => {
bool itemSavedSuccessfully = item.SaveToDB();
if(!itemSavedSuccessfully) {
//log error message
}
});
}
Another example of its use would be
/// <summary>
/// Get the number of special items in the list.
/// </summary>
protected int GetNumberOfUnsynchronisedItems() {
int numberOfSpecialItems = 0;
this.VisitAllItems((item) => {
numberOfSpecialItems += item.IsItemSpecial() ? 1 : 0;
});
return numberOfSpecialItems;
}
But I'm sure there is a better way of writing the VisitAllItems method to use a Func<> rather than an Action<> delegate to return this value. I've tried several things but end up with compilation errors.
Can anyone see a neater way to implement this method?
Thanks,
Alasdair
It all depends on the usage, so it's hard to tell what exactly would be useful for you. But one way would be to return one result for each item in the collection, and then use LINQ to combine the results.
public IEnumerable<TResult> VisitAllItems<TResult>(
Func<Item, TResult> visitFunction)
{
var result = new List<TResult>();
lock (listLock)
{
foreach (Item item in ItemList)
result.Add(visitFunction(item));
}
return result;
}
Normally, I would use yield return instead of manually creating the List<T>, but I think it's better this way here, because of the lock.
The usage would be like this:
protected int GetNumberOfUnsynchronisedItems()
{
return VisitAllItems(item => item.IsItemSpecial())
.Count(isSpecial => isSpecial);
}
You could follow the Aggregate model, but it wouldn't be terribly pleasant to use:
public TAccumulate VisitAllItems<TAccumulate>(TAccumulate seed,
Func<TAccumulate, Item, TAccumulate> visitor) {
TAccumulate current = seed;
lock (listLock) {
foreach (Item item in this.ItemList) {
current = visitor(current, item);
}
}
return current;
}
...
protected int GetNumberOfUnsynchronisedItems() {
return VisitAllItems(0,
(count, item) => count + (item.IsItemSpecial() ? 1 : 0));
}
How many different aggregation functions do you actually need? For example, if most of the time you're just counting, you might want:
public int VisitAndCount(Func<Item, bool> visitor) {
int count = 0;
lock (listLock) {
foreach (Item item in this.ItemList) {
if (visitor(item)) {
count++;
}
}
}
return count;
}
then:
protected int GetNumberOfUnsynchronisedItems() {
return VisitAndCount(item => item.IsItemSpecial());
}
I need a data structure which is a special type of queue. If an instance of my queue has ever contained an object X, it should not be possible to enqueue X again in that instance. The enqueuing method should just do nothing if called with X, like the attempt to add a duplicate value to a HashSet.
Example usage:
MyQueue<int> queue = new MyQueue<int>();
queue.Enqueue(5);
queue.Enqueue(17);
queue.Enqueue(28);
queue.Enqueue(17);
int firstNumber = queue.Dequeue();
queue.Enqueue(5);
queue.Enqueue(3);
List<int> queueContents = queue.ToList(); //this list should contain {17, 28, 3}
I looked around on MSDN, but couldn't find such a class. Does it exist, or do I have to implement it myself?
I guess I could use a different data structure too, but access will always be FIFO, so I thought a queue would be most efficient. Also, I don't know of any other structure which provides such a "uniqueness over instance lifetime" feature.
I would do something similar to this:
class UniqueQueue<T>
{
private readonly Queue<T> queue = new Queue<T>();
private HashSet<T> alreadyAdded = new HashSet<T>();
public virtual void Enqueue(T item)
{
if (alreadyAdded.Add(item)) { queue.Enqueue(item); }
}
public int Count { get { return queue.Count; } }
public virtual T Dequeue()
{
T item = queue.Dequeue();
return item;
}
}
Note: most of this code was borrowed from this thread.
You'd have to implement that yourself.
One idea is just to add the element to a HashSet when you enqueue it.
Then, when you want to enqueue, just check the HashSet for the item, if it exists, don't enqueue.
Since you want to prevent enqueuing for the rest of the queue's lifetime, you probably won't want to ever remove from the HashSet.
This is just an extended version of wayne's answer; it is a little more fleshed out and has a few more interfaces supported (to mimic Queue<T>'s interfaces).
sealed class UniqueQueue<T> : IEnumerable<T>, ICollection, IEnumerable
{
private readonly Queue<T> queue;
private readonly HashSet<T> alreadyAdded;
public UniqueQueue(IEqualityComparer<T> comparer)
{
queue = new Queue<T>();
alreadyAdded = new HashSet<T>(comparer);
}
public UniqueQueue(IEnumerable<T> collection, IEqualityComparer<T> comparer)
{
//Do this so the enumeration does not happen twice in case the enumerator behaves differently each enumeration.
var localCopy = collection.ToList();
queue = new Queue<T>(localCopy);
alreadyAdded = new HashSet<T>(localCopy, comparer);
}
public UniqueQueue(int capacity, IEqualityComparer<T> comparer)
{
queue = new Queue<T>(capacity);
alreadyAdded = new HashSet<T>(comparer);
}
//Here are the constructors that use the default comparer. By passing null in for the comparer it will just use the default one for the type.
public UniqueQueue() : this((IEqualityComparer<T>) null) { }
public UniqueQueue(IEnumerable<T> collection) : this(collection, null) { }
public UniqueQueue(int capacity) : this(capacity, null) { }
/// <summary>
/// Attempts to enqueue an object; returns false if the object was ever added to the queue in the past.
/// </summary>
/// <param name="item">The item to enqueue</param>
/// <returns>True if the object was successfully added, false if it was not</returns>
public bool Enqueue(T item)
{
if (!alreadyAdded.Add(item))
return false;
queue.Enqueue(item);
return true;
}
public int Count
{
get { return queue.Count; }
}
public T Dequeue()
{
return queue.Dequeue();
}
IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
return ((IEnumerable<T>)queue).GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return ((IEnumerable)queue).GetEnumerator();
}
void ICollection.CopyTo(Array array, int index)
{
((ICollection)queue).CopyTo(array, index);
}
bool ICollection.IsSynchronized
{
get { return ((ICollection)queue).IsSynchronized; }
}
object ICollection.SyncRoot
{
get { return ((ICollection)queue).SyncRoot; }
}
}
You can use a basic queue, but modify the Enqueue method to verify the values previously entered. Here, I used a HashSet to contain those previous values:
public class UniqueValueQueue<T> : Queue<T>
{
private readonly HashSet<T> pastValues = new HashSet<T>();
public new void Enqueue(T item)
{
if (!pastValues.Contains(item))
{
pastValues.Add(item);
base.Enqueue(item);
}
}
}
With your test case
UniqueValueQueue<int> queue = new UniqueValueQueue<int>();
queue.Enqueue(5);
queue.Enqueue(17);
queue.Enqueue(28);
queue.Enqueue(17);
int firstNumber = queue.Dequeue();
queue.Enqueue(5);
queue.Enqueue(3);
List<int> queueContents = queue.ToList();
queueContents contains 17, 28 and 3.
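One design note worth flagging on this approach: Queue<T>.Enqueue is not virtual, so the new method only hides it. The uniqueness check therefore applies only when calling through a UniqueValueQueue<T> reference, as this small sketch shows:
Queue<int> baseRef = queue;
baseRef.Enqueue(17); // binds to Queue<int>.Enqueue, bypassing the uniqueness check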
I have been looking for a way of splitting a foreach loop into multiple parts and came across the following code:
foreach(var item in items.Skip(currentPage * itemsPerPage).Take(itemsPerPage))
{
//Do stuff
}
Would items.Skip(currentPage * itemsPerPage).Take(itemsPerPage) be processed in every iteration, or would it be processed once, and have a temporary result used with the foreach loop automatically by the compiler?
It would be processed once, not on every iteration.
It's the same as:
public IEnumerable<Something> GetData() {
return someData;
}
foreach(var d in GetData()) {
//do something with [d]
}
The foreach construction is equivalent to:
IEnumerator enumerator = myCollection.GetEnumerator();
try
{
while (enumerator.MoveNext())
{
object current = enumerator.Current;
Console.WriteLine(current);
}
}
finally
{
IDisposable e = enumerator as IDisposable;
if (e != null)
{
e.Dispose();
}
}
So, no, myCollection would be processed only once.
Update:
Please note that this depends on the implementation of the IEnumerator that the IEnumerable uses.
In this (evil) example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Collections;
namespace TestStack
{
class EvilEnumerator<T> : IEnumerator<T> {
private IEnumerable<T> enumerable;
private int index = -1;
public EvilEnumerator(IEnumerable<T> e)
{
enumerable = e;
}
#region IEnumerator<T> Members
public T Current
{
get { return enumerable.ElementAt(index); }
}
#endregion
#region IDisposable Members
public void Dispose()
{
}
#endregion
#region IEnumerator Members
object IEnumerator.Current
{
get { return enumerable.ElementAt(index); }
}
public bool MoveNext()
{
index++;
if (index >= enumerable.Count())
return false;
return true;
}
public void Reset()
{
}
#endregion
}
class DemoEnumerable<T> : IEnumerable<T>
{
private IEnumerable<T> enumerable;
public DemoEnumerable(IEnumerable<T> e)
{
enumerable = e;
}
#region IEnumerable<T> Members
public IEnumerator<T> GetEnumerator()
{
return new EvilEnumerator<T>(enumerable);
}
#endregion
#region IEnumerable Members
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
#endregion
}
class Program
{
static void Main(string[] args)
{
IEnumerable<int> numbers = Enumerable.Range(0,100);
DemoEnumerable<int> enumerable = new DemoEnumerable<int>(numbers);
foreach (var item in enumerable)
{
Console.WriteLine(item);
}
}
}
}
Each step of an iteration over enumerable evaluates numbers twice: once in MoveNext() via Count() and once in Current via ElementAt().
Question:
Would items.Skip(currentPage * itemsPerPage).Take(itemsPerPage) be
processed every iteration, or would it be processed once, and have a
temporary result used with the foreach loop automatically by the
compiler?
Answer:
It would be processed once, not every iteration. You can put the collection into a variable to make the foreach more readable. Illustrated below.
foreach(var item in items.Skip(currentPage * itemsPerPage).Take(itemsPerPage))
{
//Do stuff
}
vs.
List<MyClass> query = items.Skip(currentPage * itemsPerPage).Take(itemsPerPage).ToList();
foreach(var item in query)
{
//Do stuff
}
vs.
IEnumerable<MyClass> query = items.Skip(currentPage * itemsPerPage).Take(itemsPerPage);
foreach(var item in query)
{
//Do stuff
}
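A note on the difference between those last two variants: the ToList() version materialises the page into memory immediately, so query can be enumerated repeatedly without re-running the pipeline, while the plain IEnumerable<MyClass> version is still deferred and would re-apply Skip/Take on each enumeration.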
The code that you present will only iterate the items in the list once, as others have pointed out.
However, that only gives you the items for one page. If you are handling multiple pages, you must be calling that code once for each page (because somewhere you must be incrementing currentPage, right?).
What I mean is that you must be doing something like this:
for (int currentPage = 0; currentPage < numPages; ++currentPage)
{
foreach (var item in items.Skip(currentPage*itemsPerPage).Take(itemsPerPage))
{
//Do stuff
}
}
Now if you do that, then you will be iterating the sequence multiple times - once for each page. The first iteration will only go as far as the end of the first page, but the next will iterate from the beginning to the end of the second page (via the Skip() and the Take()) - and the next will iterate from the beginning to the end of the third page. And so on.
To avoid that you can write an extension method for IEnumerable<T> which partitions the data into batches (which you could also describe as "paginating" the data into "pages").
Rather than just presenting an IEnumerable of IEnumerables, it can be more useful to wrap each batch in a class to supply the batch index along with the items in the batch, like so:
public sealed class Batch<T>
{
public readonly int Index;
public readonly IEnumerable<T> Items;
public Batch(int index, IEnumerable<T> items)
{
Index = index;
Items = items;
}
}
public static class EnumerableExt
{
// Note: Not threadsafe, so not suitable for use with Parallel.Foreach() or IEnumerable.AsParallel()
public static IEnumerable<Batch<T>> Partition<T>(this IEnumerable<T> input, int batchSize)
{
using (var enumerator = input.GetEnumerator())
{
int index = 0;
while (enumerator.MoveNext())
yield return new Batch<T>(index++, nextBatch(enumerator, batchSize));
}
}
private static IEnumerable<T> nextBatch<T>(IEnumerator<T> enumerator, int blockSize)
{
do { yield return enumerator.Current; }
while (--blockSize > 0 && enumerator.MoveNext());
}
}
This extension method does not buffer the data, and it only iterates through it once.
Given this extension method, it becomes more readable to batch up the items. Note that this example enumerates through ALL items for all pages, unlike the OP's example which only iterates through the items for one page:
var items = Enumerable.Range(10, 50); // Pretend we have 50 items.
int itemsPerPage = 20;
foreach (var page in items.Partition(itemsPerPage))
{
Console.Write("Page " + page.Index + " items: ");
foreach (var i in page.Items)
Console.Write(i + " ");
Console.WriteLine();
}
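For what it's worth, with these inputs Enumerable.Range(10, 50) produces the numbers 10 through 59, so the loop prints three batches: page 0 with items 10 through 29, page 1 with 30 through 49, and page 2 with the remaining 50 through 59.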
Is there any prior work of adding tasks to the TPL runtime with a varying priority?
If not, generally speaking, how would I implement this?
Ideally I plan on using the producer-consumer pattern to add "todo" work to the TPL. There may be times when I discover that a low-priority job needs to be upgraded to a high-priority job (relative to the others).
If anyone has some search keywords I should use when searching for this, please mention them, since I haven't yet found code that will do what I need.
So here is a rather naive concurrent implementation around a rather naive priority queue. The idea here is that there is a sorted set that holds onto pairs of both the real item and a priority, but is given a comparer that just compares the priority. The constructor takes a function that computes the priority for a given object.
As for the actual implementation, it's not efficient; I just lock around everything. Creating a more efficient implementation would rule out using SortedSet as a priority queue, and re-implementing one that can be accessed concurrently and effectively is not going to be that easy.
In order to change the priority of an item you'll need to remove the item from the set and then add it again, and to find it without iterating the whole set you'd need to know the old priority as well as the new priority.
public class ConcurrentPriorityQueue<T> : IProducerConsumerCollection<T>
{
private object key = new object();
private SortedSet<Tuple<T, int>> set;
private Func<T, int> prioritySelector;
public ConcurrentPriorityQueue(Func<T, int> prioritySelector, IComparer<T> comparer = null)
{
this.prioritySelector = prioritySelector;
set = new SortedSet<Tuple<T, int>>(
new MyComparer<T>(comparer ?? Comparer<T>.Default));
}
private class MyComparer<TElement> : IComparer<Tuple<TElement, int>>
{
private IComparer<TElement> comparer;
public MyComparer(IComparer<TElement> comparer)
{
this.comparer = comparer;
}
public int Compare(Tuple<TElement, int> first, Tuple<TElement, int> second)
{
var returnValue = first.Item2.CompareTo(second.Item2);
if (returnValue == 0)
returnValue = comparer.Compare(first.Item1, second.Item1);
return returnValue;
}
}
public bool TryAdd(T item)
{
lock (key)
{
return set.Add(Tuple.Create(item, prioritySelector(item)));
}
}
public bool TryTake(out T item)
{
lock (key)
{
if (set.Count > 0)
{
var first = set.First();
item = first.Item1;
return set.Remove(first);
}
else
{
item = default(T);
return false;
}
}
}
public bool ChangePriority(T item, int oldPriority, int newPriority)
{
lock (key)
{
if (set.Remove(Tuple.Create(item, oldPriority)))
{
return set.Add(Tuple.Create(item, newPriority));
}
else
return false;
}
}
public bool ChangePriority(T item)
{
lock (key)
{
var result = set.FirstOrDefault(pair => object.Equals(pair.Item1, item));
if (result != null) // FirstOrDefault returns null when no matching pair is found
{
return ChangePriority(item, result.Item2, prioritySelector(item));
}
else
{
return false;
}
}
}
public void CopyTo(T[] array, int index)
{
lock (key)
{
foreach (var item in set.Select(pair => pair.Item1))
{
array[index++] = item;
}
}
}
public T[] ToArray()
{
lock (key)
{
return set.Select(pair => pair.Item1).ToArray();
}
}
public IEnumerator<T> GetEnumerator()
{
return ToArray().AsEnumerable().GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public void CopyTo(Array array, int index)
{
lock (key)
{
foreach (var item in set.Select(pair => pair.Item1))
{
array.SetValue(item, index++);
}
}
}
public int Count
{
get { lock (key) { return set.Count; } }
}
public bool IsSynchronized
{
get { return true; }
}
public object SyncRoot
{
get { return key; }
}
}
Once you have an IProducerConsumerCollection<T> instance, which the above object is, you can use it as the internal backing object of a BlockingCollection<T> to get an easier-to-use interface.
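As a minimal sketch of that wiring (WorkItem and its Priority property are hypothetical placeholders, not part of the code above):
using System.Collections.Concurrent;

// WorkItem is an assumed item type carrying an int Priority.
var queue = new ConcurrentPriorityQueue<WorkItem>(w => w.Priority);
var collection = new BlockingCollection<WorkItem>(queue);

// Producer side:
collection.Add(new WorkItem { Priority = 2 });

// Consumer side; Take() blocks until an item is available and, via the
// queue's TryTake, hands out the item with the smallest priority value first.
WorkItem next = collection.Take();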
ParallelExtensionsExtras contains several custom TaskSchedulers that could be helpful either directly or as a base for your own scheduler.
Specifically, there are two schedulers that may be interesting for you:
QueuedTaskScheduler, which allows you to schedule Tasks at different priorities, but doesn't allow changing the priority of enqueued Tasks.
ReprioritizableTaskScheduler, which doesn't have different priorities, but allows you to move a specific Task to the front or to the back of the queue. (Though changing priority is O(n) in the number of currently waiting Tasks, which could be a problem if you had many Tasks at the same time.)
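From memory, using QueuedTaskScheduler looks roughly like the sketch below; treat the exact API as an assumption and check the ParallelExtensionsExtras sources.
// Each ActivateNewQueue call returns a TaskScheduler; tasks queued at a
// lower priority number are picked before those at higher numbers.
var qts = new QueuedTaskScheduler();
TaskScheduler highPriority = qts.ActivateNewQueue(0);
TaskScheduler lowPriority = qts.ActivateNewQueue(1);

Task.Factory.StartNew(() => { /* urgent work */ },
    CancellationToken.None, TaskCreationOptions.None, highPriority);
Task.Factory.StartNew(() => { /* background work */ },
    CancellationToken.None, TaskCreationOptions.None, lowPriority);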
I have the following code:
IEnumerable<KeyValuePair<T, double>> items =
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item)));
if (items.Any(pair => pair.Value<0))
throw new ArgumentException("Item weights cannot be less than zero.");
double sum = items.Sum(pair => pair.Value);
foreach (KeyValuePair<T, double> pair in items) {...}
Where weight is a Func<T, double>.
The problem is I want weight to be executed as few times as possible. This means it should be executed at most once for each item. I could achieve this by saving it to an array. However, if any weight returns a negative value, I don't want to continue execution.
Is there any way to accomplish this easily within the LINQ framework?
Sure, that's totally doable:
public static Func<A, double> ThrowIfNegative<A>(this Func<A, double> f)
{
return a=>
{
double r = f(a);
// if r is NaN then this will throw.
if ( !(r >= 0.0) )
throw new Exception();
return r;
};
}
public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
var d = new Dictionary<A, R>();
return a=>
{
R r;
if (!d.TryGetValue(a, out r))
{
r = f(a);
d.Add(a, r);
}
return r;
};
}
And now...
Func<T, double> weight = whatever;
weight = weight.ThrowIfNegative().Memoize();
and you're done.
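A small follow-up on what this buys you (my reading, not part of the answer above): the original deferred query can then stay exactly as it was, because even though the Select delegate re-runs on every enumeration, the memoized weight does the real work at most once per distinct item. This does assume the items have sensible equality semantics, since Memoize keys a Dictionary on them.
IEnumerable<KeyValuePair<T, double>> items =
    sequence.Select(item => new KeyValuePair<T, double>(item, weight(item)));
// Enumerating items several times re-runs the Select, but each item's
// weight is now computed only once and then served from the cache.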
One way is to move the exception into the weight function, or at least simulate doing so, by doing something like:
Func<T, double> weightWithCheck = i =>
{
double result = weight(i);
if (result < 0)
{
throw new ArgumentException("Item weights cannot be less than zero.");
}
return result;
};
IEnumerable<KeyValuePair<T, double>> items =
sequence.Select(item => new KeyValuePair<T, double>(item, weightWithCheck(item)));
double sum = items.Sum(pair => pair.Value);
By this point, if there is an exception to be had, you should have it. You do have to enumerate items before you can be assured of getting the exception, though; once you get it, you will not call weight again.
Both answers are good (where to throw the exception, and memoizing the function).
But your real problem is that your LINQ expression is evaluated every time you use it, unless you force it to evaluate and store as a List (or similar). Just change this:
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item)));
To this:
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item))).ToList();
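To make the repeated evaluation concrete, here is a minimal sketch (the counting weight function is made up for illustration, and the usual System.Linq imports are assumed):
int calls = 0;
Func<int, double> weight = x => { calls++; return x; };

IEnumerable<KeyValuePair<int, double>> items = Enumerable.Range(1, 3)
    .Select(i => new KeyValuePair<int, double>(i, weight(i)));

bool anyNegative = items.Any(p => p.Value < 0); // runs the Select once
double sum = items.Sum(p => p.Value);           // runs it a second time
Console.WriteLine(calls);                       // prints 6, not 3
Appending .ToList() to the Select keeps calls at 3, because the pairs are materialised once and then re-read.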
You could possibly do it with a foreach loop. Here is a way to do it in one statement:
IEnumerable<KeyValuePair<T, double>> items = sequence
.Select(item => new KeyValuePair<T, double>(item, weight(item)))
.Select(kvp =>
{
if (kvp.Value < 0)
throw new ArgumentException("Item weights cannot be less than zero.");
else
return kvp;
}
);
No, there is nothing already in the LINQ framework to do this, but you could surely write up your own methods and invoke them from the LINQ query (as has already been shown by many).
Personally, I would either ToList the first query or use Eric's suggestion.
Instead of the functional memoization suggested by other answers, you can also employ memoization of the whole data sequence:
var items = sequence
.Select(item => new KeyValuePair<T, double>(item, weight(item)))
.Memoize();
(Note the call to the Memoize() method at the end of the expression above.)
A nice property of data memoization is that it represents a drop-in replacement for ToList() or ToArray() approaches.
The fully featured implementation is pretty involved though:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
static class MemoizationExtensions
{
/// <summary>
/// Memoize all elements of a sequence, i.e. ensure that every element of a sequence is retrieved only once.
/// </summary>
/// <remarks>
/// The resulting sequence is not thread safe.
/// </remarks>
/// <typeparam name="T">The type of the elements of source.</typeparam>
/// <param name="source">The source sequence.</param>
/// <returns>The sequence that fully replicates the source with all elements being memoized.</returns>
public static IEnumerable<T> Memoize<T>(this IEnumerable<T> source) => Memoize(source, false);
/// <summary>
/// Memoize all elements of a sequence, i.e. ensure that every element of a sequence is retrieved only once.
/// </summary>
/// <typeparam name="T">The type of the elements of source.</typeparam>
/// <param name="source">The source sequence.</param>
/// <param name="isThreadSafe">Indicates whether resulting sequence is thread safe.</param>
/// <returns>The sequence that fully replicates the source with all elements being memoized.</returns>
public static IEnumerable<T> Memoize<T>(this IEnumerable<T> source, bool isThreadSafe)
{
switch (source)
{
case null:
return null;
case CachedEnumerable<T> existingCachedEnumerable:
if (!isThreadSafe || existingCachedEnumerable is ThreadSafeCachedEnumerable<T>)
{
// The source is already memoized with compatible parameters.
return existingCachedEnumerable;
}
break;
case IList<T> _:
case IReadOnlyList<T> _:
case string _:
// Given source types are intrinsically memoized by their nature.
return source;
}
if (isThreadSafe)
return new ThreadSafeCachedEnumerable<T>(source);
else
return new CachedEnumerable<T>(source);
}
class CachedEnumerable<T> : IEnumerable<T>, IReadOnlyList<T>
{
public CachedEnumerable(IEnumerable<T> source)
{
_Source = source;
}
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
IEnumerable<T> _Source;
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
IEnumerator<T> _SourceEnumerator;
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
protected readonly IList<T> Cache = new List<T>();
public virtual int Count
{
get
{
while (_TryCacheElementNoLock()) ;
return Cache.Count;
}
}
bool _TryCacheElementNoLock()
{
if (_SourceEnumerator == null && _Source != null)
{
_SourceEnumerator = _Source.GetEnumerator();
_Source = null;
}
if (_SourceEnumerator == null)
{
// Source enumerator already reached the end.
return false;
}
else if (_SourceEnumerator.MoveNext())
{
Cache.Add(_SourceEnumerator.Current);
return true;
}
else
{
// Source enumerator has reached the end, so it is no longer needed.
_SourceEnumerator.Dispose();
_SourceEnumerator = null;
return false;
}
}
public virtual T this[int index]
{
get
{
_EnsureItemIsCachedNoLock(index);
return Cache[index];
}
}
public IEnumerator<T> GetEnumerator() => new CachedEnumerator<T>(this);
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
internal virtual bool EnsureItemIsCached(int index) => _EnsureItemIsCachedNoLock(index);
bool _EnsureItemIsCachedNoLock(int index)
{
while (Cache.Count <= index)
{
if (!_TryCacheElementNoLock())
return false;
}
return true;
}
internal virtual T GetCacheItem(int index) => Cache[index];
}
sealed class ThreadSafeCachedEnumerable<T> : CachedEnumerable<T>
{
public ThreadSafeCachedEnumerable(IEnumerable<T> source) :
base(source)
{
}
public override int Count
{
get
{
lock (Cache)
return base.Count;
}
}
public override T this[int index]
{
get
{
lock (Cache)
return base[index];
}
}
internal override bool EnsureItemIsCached(int index)
{
lock (Cache)
return base.EnsureItemIsCached(index);
}
internal override T GetCacheItem(int index)
{
lock (Cache)
return base.GetCacheItem(index);
}
}
sealed class CachedEnumerator<T> : IEnumerator<T>
{
CachedEnumerable<T> _CachedEnumerable;
const int InitialIndex = -1;
const int EofIndex = -2;
int _Index = InitialIndex;
public CachedEnumerator(CachedEnumerable<T> cachedEnumerable)
{
_CachedEnumerable = cachedEnumerable;
}
public T Current
{
get
{
var cachedEnumerable = _CachedEnumerable;
if (cachedEnumerable == null)
throw new InvalidOperationException();
var index = _Index;
if (index < 0)
throw new InvalidOperationException();
return cachedEnumerable.GetCacheItem(index);
}
}
object IEnumerator.Current => Current;
public void Dispose()
{
_CachedEnumerable = null;
}
public bool MoveNext()
{
var cachedEnumerable = _CachedEnumerable;
if (cachedEnumerable == null)
{
// Disposed.
return false;
}
if (_Index == EofIndex)
return false;
_Index++;
if (!cachedEnumerable.EnsureItemIsCached(_Index))
{
_Index = EofIndex;
return false;
}
else
{
return true;
}
}
public void Reset()
{
_Index = InitialIndex;
}
}
}
More info and a readily available NuGet package: https://github.com/gapotchenko/Gapotchenko.FX/tree/master/Source/Gapotchenko.FX.Linq#memoize
public IEnumerable<ModuleData> ListModules()
{
foreach (XElement m in Source.Descendants("Module"))
{
yield return new ModuleData(m.Element("ModuleID").Value);
}
}
Initially the above code is great since there is no need to evaluate the entire collection if it is not needed.
However, once all the Modules have been enumerated, it becomes more expensive to repeatedly query the XDocument when nothing has changed.
So, as a performance improvement:
public IEnumerable<ModuleData> ListModules()
{
if (Modules == null)
{
Modules = new List<ModuleData>();
foreach (XElement m in Source.Descendants("Module"))
{
Modules.Add(new ModuleData(m.Element("ModuleID").Value, 1, 1));
}
}
return Modules;
}
Which is great if I am repeatedly using the entire list but not so great otherwise.
Is there a middle ground where I can yield return until the entire list has been iterated, then cache it and serve the cache to subsequent requests?
You can look at Saving the State of Enumerators, which describes how to create a lazy list (one that caches items once they have been iterated).
Check out MemoizeAll() in the Reactive Extensions for .NET library (Rx). As it is evaluated lazily you can safely set it up during construction and just return Modules from ListModules():
Modules = Source.
Descendants("Module").
Select(m => new ModuleData(m.Element("ModuleID").Value, 1, 1)).
MemoizeAll();
There's a good explanation of MemoizeAll() (and some of the other less obvious Rx extensions) here.
I like #tsemer's answer. But I would like to propose my solution, which has nothing to do with FP. It's a naive approach, but it generates far fewer allocations. And it is not thread safe.
public class CachedEnumerable<T> : IEnumerable<T>, IDisposable
{
IEnumerator<T> _enumerator;
readonly List<T> _cache = new List<T>();
public CachedEnumerable(IEnumerable<T> enumerable)
: this(enumerable.GetEnumerator())
{
}
public CachedEnumerable(IEnumerator<T> enumerator)
{
_enumerator = enumerator;
}
public IEnumerator<T> GetEnumerator()
{
// The index of the current item in the cache.
int index = 0;
// Enumerate the _cache first
for (; index < _cache.Count; index++)
{
yield return _cache[index];
}
// Continue enumeration of the original _enumerator,
// until it is finished.
// This adds items to the cache and increments the index.
for (; _enumerator != null && _enumerator.MoveNext(); index++)
{
var current = _enumerator.Current;
_cache.Add(current);
yield return current;
}
if (_enumerator != null)
{
_enumerator.Dispose();
_enumerator = null;
}
// Some other users of the same instance of CachedEnumerable
// can add more items to the cache,
// so we need to enumerate them as well
for (; index < _cache.Count; index++)
{
yield return _cache[index];
}
}
public void Dispose()
{
if (_enumerator != null)
{
_enumerator.Dispose();
_enumerator = null;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
This is how the matrix test from #tsemer's answer will work:
var ints = new [] { 1, 2, 3, 4, 5 };
var cachedEnumerable = new CachedEnumerable<int>(ints);
foreach (var x in cachedEnumerable)
{
foreach (var y in cachedEnumerable)
{
//Do something
}
}
1. The outer loop (x) skips the first for loop, because _cache is empty;
2. x fetches one item from the _enumerator to the _cache;
3. x pauses before the second for loop;
4. The inner loop (y) enumerates one element from the _cache;
5. y fetches all elements from the _enumerator to the _cache;
6. y skips the third for loop, because its index variable equals 5;
7. x resumes, its index equals 1. It skips the second for loop because _enumerator is finished;
8. x enumerates one element from the _cache using the third for loop;
9. x pauses before the third for;
10. y enumerates 5 elements from the _cache using the first for loop;
11. y skips the second for loop, because _enumerator is finished;
12. y skips the third for loop, because the index of y equals 5;
13. x resumes, increments index. It fetches one element from the _cache using the third for loop;
14. x pauses;
15. if the index variable of x is less than 5, go to step 10;
16. end.
I've seen a handful of implementations out there, some older and not taking advantage of the newest .NET classes, some too elaborate for my needs. I ended up with the most concise and declarative code I could muster, which added up to a class with roughly 15 lines of (actual) code. It seems to align well with the OP's needs:
Edit: Second revision, better support for empty enumerables
/// <summary>
/// A <see cref="IEnumerable{T}"/> that caches every item upon first enumeration.
/// </summary>
/// <seealso cref="http://blogs.msdn.com/b/matt/archive/2008/03/14/digging-deeper-into-lazy-and-functional-c.aspx"/>
/// <seealso cref="http://blogs.msdn.com/b/wesdyer/archive/2007/02/13/the-virtues-of-laziness.aspx"/>
public class CachedEnumerable<T> : IEnumerable<T> {
private readonly bool _hasItem; // Needed so an empty enumerable will not return null but an actual empty enumerable.
private readonly T _item;
private readonly Lazy<CachedEnumerable<T>> _nextItems;
/// <summary>
/// Initialises a new instance of <see cref="CachedEnumerable{T}"/> using <paramref name="item"/> as the current item
/// and <paramref name="nextItems"/> as a value factory for the <see cref="CachedEnumerable{T}"/> containing the next items.
/// </summary>
protected internal CachedEnumerable(T item, Func<CachedEnumerable<T>> nextItems) {
_hasItem = true;
_item = item;
_nextItems = new Lazy<CachedEnumerable<T>>(nextItems);
}
/// <summary>
/// Initialises a new instance of <see cref="CachedEnumerable{T}"/> with no current item and no next items.
/// </summary>
protected internal CachedEnumerable() {
_hasItem = false;
}
/// <summary>
/// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerable"/>.
/// Notice: The first item is always iterated through.
/// </summary>
public static CachedEnumerable<T> Create(IEnumerable<T> enumerable) {
return Create(enumerable.GetEnumerator());
}
/// <summary>
/// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerator"/>.
/// Notice: The first item is always iterated through.
/// </summary>
private static CachedEnumerable<T> Create(IEnumerator<T> enumerator) {
return enumerator.MoveNext() ? new CachedEnumerable<T>(enumerator.Current, () => Create(enumerator)) : new CachedEnumerable<T>();
}
/// <summary>
/// Returns an enumerator that iterates through the collection.
/// </summary>
public IEnumerator<T> GetEnumerator() {
if (_hasItem) {
yield return _item;
var nextItems = _nextItems.Value;
if (nextItems != null) {
foreach (var nextItem in nextItems) {
yield return nextItem;
}
}
}
}
/// <summary>
/// Returns an enumerator that iterates through a collection.
/// </summary>
IEnumerator IEnumerable.GetEnumerator() {
return GetEnumerator();
}
}
A useful extension method could be:
public static class IEnumerableExtensions {
/// <summary>
/// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerable"/>.
/// Notice: The first item is always iterated through.
/// </summary>
public static CachedEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable) {
return CachedEnumerable<T>.Create(enumerable);
}
}
And for the unit testers amongst you (if you don't use ReSharper, just take out the [SuppressMessage] attributes):
/// <summary>
/// Tests the <see cref="CachedEnumerable{T}"/> class.
/// </summary>
[TestFixture]
public class CachedEnumerableTest {
private int _count;
/// <remarks>
/// This test case is only here to emphasise the problem with <see cref="IEnumerable{T}"/> which <see cref="CachedEnumerable{T}"/> attempts to solve.
/// </remarks>
[Test]
[SuppressMessage("ReSharper", "PossibleMultipleEnumeration")]
[SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
public void MultipleEnumerationAreNotCachedForOriginalIEnumerable() {
_count = 0;
var enumerable = Enumerable.Range(1, 40).Select(IncrementCount);
enumerable.Take(3).ToArray();
enumerable.Take(10).ToArray();
enumerable.Take(4).ToArray();
Assert.AreEqual(17, _count);
}
/// <remarks>
/// This test case is only here to emphasise the problem with <see cref="IList{T}"/> which <see cref="CachedEnumerable{T}"/> attempts to solve.
/// </remarks>
[Test]
[SuppressMessage("ReSharper", "PossibleMultipleEnumeration")]
[SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
public void EntireListIsEnumeratedForOriginalListOrArray() {
_count = 0;
Enumerable.Range(1, 40).Select(IncrementCount).ToList();
Assert.AreEqual(40, _count);
_count = 0;
Enumerable.Range(1, 40).Select(IncrementCount).ToArray();
Assert.AreEqual(40, _count);
}
[Test]
[SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
public void MultipleEnumerationsAreCached() {
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 40).Select(IncrementCount).ToCachedEnumerable();
cachedEnumerable.Take(3).ToArray();
cachedEnumerable.Take(10).ToArray();
cachedEnumerable.Take(4).ToArray();
Assert.AreEqual(10, _count);
}
[Test]
public void FreshCachedEnumerableDoesNotEnumerateExceptFirstItem() {
_count = 0;
Enumerable.Range(1, 40).Select(IncrementCount).ToCachedEnumerable();
Assert.AreEqual(1, _count);
}
/// <remarks>
/// Based on Jon Skeet's test mentioned here: http://www.siepman.nl/blog/post/2013/10/09/LazyList-A-better-LINQ-result-cache-than-List.aspx
/// </remarks>
[Test]
[SuppressMessage("ReSharper", "LoopCanBeConvertedToQuery")]
public void MatrixEnumerationIteratesAsExpectedWhileStillKeepingEnumeratedValuesCached() {
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 5).Select(IncrementCount).ToCachedEnumerable();
var matrixCount = 0;
foreach (var x in cachedEnumerable) {
foreach (var y in cachedEnumerable) {
matrixCount++;
}
}
Assert.AreEqual(5, _count);
Assert.AreEqual(25, matrixCount);
}
[Test]
public void OrderingCachedEnumerableWorksAsExpectedWhileStillKeepingEnumeratedValuesCached() {
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 5).Select(IncrementCount).ToCachedEnumerable();
var orderedEnumerated = cachedEnumerable.OrderBy(x => x);
var orderedEnumeratedArray = orderedEnumerated.ToArray(); // Enumerated first time in ascending order.
Assert.AreEqual(5, _count);
for (int i = 0; i < orderedEnumeratedArray.Length; i++) {
Assert.AreEqual(i + 1, orderedEnumeratedArray[i]);
}
var reorderedEnumeratedArray = orderedEnumerated.OrderByDescending(x => x).ToArray(); // Enumerated second time in descending order.
Assert.AreEqual(5, _count);
for (int i = 0; i < reorderedEnumeratedArray.Length; i++) {
Assert.AreEqual(5 - i, reorderedEnumeratedArray[i]);
}
}
private int IncrementCount(int value) {
_count++;
return value;
}
}
I quite like hazzik's answer... nice and simple is always the way.
BUT there is a bug in GetEnumerator.
It sort of realises there is a problem, and that's why there is a strange third loop after the second enumerator loop... but it isn't quite as simple as that. The problem that triggers the need for the third loop is general... so it needs to be recursive.
The answer, though, looks even simpler.
public IEnumerator<T> GetEnumerator()
{
int index = 0;
while (true)
{
if (index < _cache.Count)
{
yield return _cache[index];
index = index + 1;
}
else
{
if (_enumerator.MoveNext())
{
_cache.Add(_enumerator.Current);
}
else
{
yield break;
}
}
}
}
Yes, you can make it a tiny bit more efficient by yielding current... but I'll take the microsecond hit... it only ever happens once per element.
And it's not thread safe... but who cares about that.
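For completeness, the yield-current micro-optimisation mentioned above would replace the cache-miss branch of GetEnumerator with a fragment along these lines:
if (_enumerator.MoveNext())
{
    // Yield the freshly fetched element directly instead of looping
    // back to read it out of the cache on the next pass.
    var current = _enumerator.Current;
    _cache.Add(current);
    index = index + 1;
    yield return current;
}
else
{
    yield break;
}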
Just to sum up things a bit:
In this answer a solution is presented, complete with an extension method for easy use and unit tests. However, as it uses recursion, performance can be expected to be worse than the other, non-recursive solutions because of the extra allocations recursion entails.
In this answer a non-recursive solution is presented, including some code to account for the case where the enumerable is being enumerated twice. In this situation, however, it might not maintain the order of the original enumerable, and it does not scale to more than two concurrent enumerations.
In this answer the enumerator method is rewritten to generalize the solution for the case of multiple concurrent enumeration while preserving the order of the original enumerable.
Combining the code from all answers we get the following class. Beware that this code is not thread safe: multiple interleaved enumerations are safe only when performed from the same thread.
public class CachedEnumerable<T> : IEnumerable<T>, IDisposable
{
private readonly IEnumerator<T> enumerator;
private readonly List<T> cache = new List<T>();
public CachedEnumerable(IEnumerable<T> enumerable) : this(enumerable.GetEnumerator()) { }
public CachedEnumerable(IEnumerator<T> enumerator)
=> this.enumerator = enumerator ?? throw new ArgumentNullException(nameof(enumerator));
public IEnumerator<T> GetEnumerator()
{
int index = 0;
while (true) {
if (index < cache.Count) {
yield return cache[index];
index++;
}
else if (enumerator.MoveNext())
cache.Add(enumerator.Current);
else
yield break;
}
}
public void Dispose() => enumerator.Dispose();
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
With the static extension method for easy use:
public static class EnumerableUtils
{
public static CachedEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable)
=> new CachedEnumerable<T>(enumerable);
}
And the corresponding unit tests:
public class CachedEnumerableTest
{
private int _count;
[Test]
public void MultipleEnumerationAreNotCachedForOriginalIEnumerable()
{
_count = 0;
var enumerable = Enumerable.Range(1, 40).Select(incrementCount);
enumerable.Take(3).ToArray();
enumerable.Take(10).ToArray();
enumerable.Take(4).ToArray();
Assert.AreEqual(17, _count);
}
[Test]
public void EntireListIsEnumeratedForOriginalListOrArray()
{
_count = 0;
Enumerable.Range(1, 40).Select(incrementCount).ToList();
Assert.AreEqual(40, _count);
_count = 0;
Enumerable.Range(1, 40).Select(incrementCount).ToArray();
Assert.AreEqual(40, _count);
}
[Test]
public void MultipleEnumerationsAreCached()
{
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 40).Select(incrementCount).ToCachedEnumerable();
cachedEnumerable.Take(3).ToArray();
cachedEnumerable.Take(10).ToArray();
cachedEnumerable.Take(4).ToArray();
Assert.AreEqual(10, _count);
}
[Test]
public void FreshCachedEnumerableDoesNotEnumerateExceptFirstItem()
{
_count = 0;
Enumerable.Range(1, 40).Select(incrementCount).ToCachedEnumerable();
Assert.That(_count <= 1);
}
[Test]
public void MatrixEnumerationIteratesAsExpectedWhileStillKeepingEnumeratedValuesCached()
{
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 5).Select(incrementCount).ToCachedEnumerable();
var matrixCount = 0;
foreach (var x in cachedEnumerable) {
foreach (var y in cachedEnumerable) {
matrixCount++;
}
}
Assert.AreEqual(5, _count);
Assert.AreEqual(25, matrixCount);
}
[Test]
public void OrderingCachedEnumerableWorksAsExpectedWhileStillKeepingEnumeratedValuesCached()
{
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 5).Select(incrementCount).ToCachedEnumerable();
var orderedEnumerated = cachedEnumerable.OrderBy(x => x);
var orderedEnumeratedArray = orderedEnumerated.ToArray(); // Enumerated first time in ascending order.
Assert.AreEqual(5, _count);
for (int i = 0; i < orderedEnumeratedArray.Length; i++) {
Assert.AreEqual(i + 1, orderedEnumeratedArray[i]);
}
var reorderedEnumeratedArray = orderedEnumerated.OrderByDescending(x => x).ToArray(); // Enumerated second time in descending order.
Assert.AreEqual(5, _count);
for (int i = 0; i < reorderedEnumeratedArray.Length; i++) {
Assert.AreEqual(5 - i, reorderedEnumeratedArray[i]);
}
}
private int incrementCount(int value)
{
_count++;
return value;
}
}
I don't see any serious problem with the idea of caching results in a list, just like in the above code. Probably, it would be better to construct the list using the ToList() method.
public IEnumerable<ModuleData> ListModules()
{
if (Modules == null)
{
Modules = Source.Descendants("Module")
.Select(m => new ModuleData(m.Element("ModuleID").Value, 1, 1))
.ToList();
}
return Modules;
}
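A quick usage note on this version: the XDocument is only queried the first time around.
var first = ListModules();  // runs the query and builds the cached list
var second = ListModules(); // returns the already-built List<ModuleData>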