Caching IEnumerable

Caching IEnumerable - c#

public IEnumerable<ModuleData> ListModules()
{
foreach (XElement m in Source.Descendants("Module"))
{
yield return new ModuleData(m.Element("ModuleID").Value);
}
}
Initially the above code is great since there is no need to evaluate the entire collection if it is not needed.
However, once all the Modules have been enumerated once, it becomes more expensive to repeatedly query the XDocument when there is no change.
So, as a performance improvement:
public IEnumerable<ModuleData> ListModules()
{
if (Modules == null)
{
Modules = new List<ModuleData>();
foreach (XElement m in Source.Descendants("Module"))
{
Modules.Add(new ModuleData(m.Element("ModuleID").Value, 1, 1));
}
}
return Modules;
}
Which is great if I am repeatedly using the entire list but not so great otherwise.
Is there a middle ground where I can yield return until the entire list has been iterated, then cache it and serve the cache to subsequent requests?

You can look at Saving the State of Enumerators which describes how to create lazy list (which caches once iterated items).

Check out MemoizeAll() in the Reactive Extensions for .NET library (Rx). As it is evaluated lazily you can safely set it up during construction and just return Modules from ListModules():
Modules = Source.
Descendants("Module").
Select(m => new ModuleData(m.Element("ModuleID").Value, 1, 1)).
MemoizeAll();
There's a good explanation of MemoizeAll() (and some of the other less obvious Rx extensions) here.

I like #tsemer's answer. But I would like to propose my solutions, which has nothing to do with FP. It's naive approach, but it generates a lot less of allocations. And it is not thread safe.
public class CachedEnumerable<T> : IEnumerable<T>, IDisposable
{
IEnumerator<T> _enumerator;
readonly List<T> _cache = new List<T>();
public CachedEnumerable(IEnumerable<T> enumerable)
: this(enumerable.GetEnumerator())
{
}
public CachedEnumerable(IEnumerator<T> enumerator)
{
_enumerator = enumerator;
}
public IEnumerator<T> GetEnumerator()
{
// The index of the current item in the cache.
int index = 0;
// Enumerate the _cache first
for (; index < _cache.Count; index++)
{
yield return _cache[index];
}
// Continue enumeration of the original _enumerator,
// until it is finished.
// This adds items to the cache and increment
for (; _enumerator != null && _enumerator.MoveNext(); index++)
{
var current = _enumerator.Current;
_cache.Add(current);
yield return current;
}
if (_enumerator != null)
{
_enumerator.Dispose();
_enumerator = null;
}
// Some other users of the same instance of CachedEnumerable
// can add more items to the cache,
// so we need to enumerate them as well
for (; index < _cache.Count; index++)
{
yield return _cache[index];
}
}
public void Dispose()
{
if (_enumerator != null)
{
_enumerator.Dispose();
_enumerator = null;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
This is how the matrix test from #tsemer's answer will work:
var ints = new [] { 1, 2, 3, 4, 5 };
var cachedEnumerable = new CachedEnumerable<int>(ints);
foreach (var x in cachedEnumerable)
{
foreach (var y in cachedEnumerable)
{
//Do something
}
}
The outer loop (x) skips first for, because _cache is empty;
x fetches one item from the _enumerator to the _cache;
x pauses before second for loop;
The inner loop (y) enumerates one element from the _cache;
y fetches all elements from the _enumerator to the _cache;
y skips the third for loop, because its index variable equals 5;
x resumes, its index equals 1. It skips the second for loop because _enumerator is finished;
x enumerates one element from the _cache using the third for loop;
x pauses before the third for;
y enumerates 5 elements from the _cache using first for loop;
y skips the second for loop, because _enumerator is finished;
y skips the third for loop, because index of y equals 5;
x resumes, increments index. It fetches one element from the _cache using the third for loop.
x pauses.
if index variable of x is less than 5 then go to 10;
end.

I've seen a handful of implementations out there, some older and not taking advantage of newest .Net classes, some too elaborate for my needs. I ended up with the most concise and declarative code I could muster, which added up to a class with roughly 15 lines of (actual) code. It seems to align well with OP's needs:
Edit: Second revision, better support for empty enumerables
/// <summary>
/// A <see cref="IEnumerable{T}"/> that caches every item upon first enumeration.
/// </summary>
/// <seealso cref="http://blogs.msdn.com/b/matt/archive/2008/03/14/digging-deeper-into-lazy-and-functional-c.aspx"/>
/// <seealso cref="http://blogs.msdn.com/b/wesdyer/archive/2007/02/13/the-virtues-of-laziness.aspx"/>
public class CachedEnumerable<T> : IEnumerable<T> {
private readonly bool _hasItem; // Needed so an empty enumerable will not return null but an actual empty enumerable.
private readonly T _item;
private readonly Lazy<CachedEnumerable<T>> _nextItems;
/// <summary>
/// Initialises a new instance of <see cref="CachedEnumerable{T}"/> using <paramref name="item"/> as the current item
/// and <paramref name="nextItems"/> as a value factory for the <see cref="CachedEnumerable{T}"/> containing the next items.
/// </summary>
protected internal CachedEnumerable(T item, Func<CachedEnumerable<T>> nextItems) {
_hasItem = true;
_item = item;
_nextItems = new Lazy<CachedEnumerable<T>>(nextItems);
}
/// <summary>
/// Initialises a new instance of <see cref="CachedEnumerable{T}"/> with no current item and no next items.
/// </summary>
protected internal CachedEnumerable() {
_hasItem = false;
}
/// <summary>
/// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerable"/>.
/// Notice: The first item is always iterated through.
/// </summary>
public static CachedEnumerable<T> Create(IEnumerable<T> enumerable) {
return Create(enumerable.GetEnumerator());
}
/// <summary>
/// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerator"/>.
/// Notice: The first item is always iterated through.
/// </summary>
private static CachedEnumerable<T> Create(IEnumerator<T> enumerator) {
return enumerator.MoveNext() ? new CachedEnumerable<T>(enumerator.Current, () => Create(enumerator)) : new CachedEnumerable<T>();
}
/// <summary>
/// Returns an enumerator that iterates through the collection.
/// </summary>
public IEnumerator<T> GetEnumerator() {
if (_hasItem) {
yield return _item;
var nextItems = _nextItems.Value;
if (nextItems != null) {
foreach (var nextItem in nextItems) {
yield return nextItem;
}
}
}
}
/// <summary>
/// Returns an enumerator that iterates through a collection.
/// </summary>
IEnumerator IEnumerable.GetEnumerator() {
return GetEnumerator();
}
}
A useful extension method could be:
public static class IEnumerableExtensions {
/// <summary>
/// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerable"/>.
/// Notice: The first item is always iterated through.
/// </summary>
public static CachedEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable) {
return CachedEnumerable<T>.Create(enumerable);
}
}
And for the unit testers amongst you: (if you don't use resharper just take out the [SuppressMessage] attributes)
/// <summary>
/// Tests the <see cref="CachedEnumerable{T}"/> class.
/// </summary>
[TestFixture]
public class CachedEnumerableTest {
private int _count;
/// <remarks>
/// This test case is only here to emphasise the problem with <see cref="IEnumerable{T}"/> which <see cref="CachedEnumerable{T}"/> attempts to solve.
/// </remarks>
[Test]
[SuppressMessage("ReSharper", "PossibleMultipleEnumeration")]
[SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
public void MultipleEnumerationAreNotCachedForOriginalIEnumerable() {
_count = 0;
var enumerable = Enumerable.Range(1, 40).Select(IncrementCount);
enumerable.Take(3).ToArray();
enumerable.Take(10).ToArray();
enumerable.Take(4).ToArray();
Assert.AreEqual(17, _count);
}
/// <remarks>
/// This test case is only here to emphasise the problem with <see cref="IList{T}"/> which <see cref="CachedEnumerable{T}"/> attempts to solve.
/// </remarks>
[Test]
[SuppressMessage("ReSharper", "PossibleMultipleEnumeration")]
[SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
public void EntireListIsEnumeratedForOriginalListOrArray() {
_count = 0;
Enumerable.Range(1, 40).Select(IncrementCount).ToList();
Assert.AreEqual(40, _count);
_count = 0;
Enumerable.Range(1, 40).Select(IncrementCount).ToArray();
Assert.AreEqual(40, _count);
}
[Test]
[SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
public void MultipleEnumerationsAreCached() {
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 40).Select(IncrementCount).ToCachedEnumerable();
cachedEnumerable.Take(3).ToArray();
cachedEnumerable.Take(10).ToArray();
cachedEnumerable.Take(4).ToArray();
Assert.AreEqual(10, _count);
}
[Test]
public void FreshCachedEnumerableDoesNotEnumerateExceptFirstItem() {
_count = 0;
Enumerable.Range(1, 40).Select(IncrementCount).ToCachedEnumerable();
Assert.AreEqual(1, _count);
}
/// <remarks>
/// Based on Jon Skeet's test mentioned here: http://www.siepman.nl/blog/post/2013/10/09/LazyList-A-better-LINQ-result-cache-than-List.aspx
/// </remarks>
[Test]
[SuppressMessage("ReSharper", "LoopCanBeConvertedToQuery")]
public void MatrixEnumerationIteratesAsExpectedWhileStillKeepingEnumeratedValuesCached() {
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 5).Select(IncrementCount).ToCachedEnumerable();
var matrixCount = 0;
foreach (var x in cachedEnumerable) {
foreach (var y in cachedEnumerable) {
matrixCount++;
}
}
Assert.AreEqual(5, _count);
Assert.AreEqual(25, matrixCount);
}
[Test]
public void OrderingCachedEnumerableWorksAsExpectedWhileStillKeepingEnumeratedValuesCached() {
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 5).Select(IncrementCount).ToCachedEnumerable();
var orderedEnumerated = cachedEnumerable.OrderBy(x => x);
var orderedEnumeratedArray = orderedEnumerated.ToArray(); // Enumerated first time in ascending order.
Assert.AreEqual(5, _count);
for (int i = 0; i < orderedEnumeratedArray.Length; i++) {
Assert.AreEqual(i + 1, orderedEnumeratedArray[i]);
}
var reorderedEnumeratedArray = orderedEnumerated.OrderByDescending(x => x).ToArray(); // Enumerated second time in descending order.
Assert.AreEqual(5, _count);
for (int i = 0; i < reorderedEnumeratedArray.Length; i++) {
Assert.AreEqual(5 - i, reorderedEnumeratedArray[i]);
}
}
private int IncrementCount(int value) {
_count++;
return value;
}
}

I quite like hazzik's answer...nice and simple is always the way.
BUT there is a bug in GetEnumerator
it sort of realises there is a problem, and thats why there is a strange 3rd loop after the 2nd enumerator loop....but it isnt quite as simple as that. The problem that triggers the need for the 3rd loop is general...so it needs to be recursive.
The answer though looks even simpler.
public IEnumerator<T> GetEnumerator()
{
int index = 0;
while (true)
{
if (index < _cache.Count)
{
yield return _cache[index];
index = index + 1;
}
else
{
if (_enumerator.MoveNext())
{
_cache.Add(_enumerator.Current);
}
else
{
yield break;
}
}
}
}
yes you can make it a tiny bit more efficient by yielding current...but I'll take the microsecond hit...it only ever happens once per element.
and its not threadsafe...but who cares about that.

Just to sum up things a bit:
In this answer a solution is presented, complete with an extension method for easy use and unit tests. However, as it uses recursion, performance can be expected to be worse than the other non recursive solution due to fewer allocations.
In this answer a non recursive solution is presented, including some code to account for the case where the enumerable is being enumerated twice. In this situation, however, it might not maintain the order of the original enumerable and it does not scale to more than two concurrent enumerations.
In this answer the enumerator method is rewritten to generalize the solution for the case of multiple concurrent enumeration while preserving the order of the original enumerable.
Combining the code from all answers we get the following class. Beware that this code is not thread safe, meaning that concurrent enumeration is safe only from the same thread.
public class CachedEnumerable<T> : IEnumerable<T>, IDisposable
{
private readonly IEnumerator<T> enumerator;
private readonly List<T> cache = new List<T>();
public CachedEnumerable(IEnumerable<T> enumerable) : this(enumerable.GetEnumerator()) { }
public CachedEnumerable(IEnumerator<T> enumerator)
=> this.enumerator = enumerator ?? throw new ArgumentNullException(nameof(enumerator));
public IEnumerator<T> GetEnumerator()
{
int index = 0;
while (true) {
if (index < cache.Count) {
yield return cache[index];
index++;
}
else if (enumerator.MoveNext())
cache.Add(enumerator.Current);
else
yield break;
}
}
public void Dispose() => enumerator.Dispose();
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
With the static extension method for easy use:
public static class EnumerableUtils
{
public static CachedEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable)
=> new CachedEnumerable<T>(enumerable);
}
And the corresponding unit tests:
public class CachedEnumerableTest
{
private int _count;
[Test]
public void MultipleEnumerationAreNotCachedForOriginalIEnumerable()
{
_count = 0;
var enumerable = Enumerable.Range(1, 40).Select(incrementCount);
enumerable.Take(3).ToArray();
enumerable.Take(10).ToArray();
enumerable.Take(4).ToArray();
Assert.AreEqual(17, _count);
}
[Test]
public void EntireListIsEnumeratedForOriginalListOrArray()
{
_count = 0;
Enumerable.Range(1, 40).Select(incrementCount).ToList();
Assert.AreEqual(40, _count);
_count = 0;
Enumerable.Range(1, 40).Select(incrementCount).ToArray();
Assert.AreEqual(40, _count);
}
[Test]
public void MultipleEnumerationsAreCached()
{
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 40).Select(incrementCount).ToCachedEnumerable();
cachedEnumerable.Take(3).ToArray();
cachedEnumerable.Take(10).ToArray();
cachedEnumerable.Take(4).ToArray();
Assert.AreEqual(10, _count);
}
[Test]
public void FreshCachedEnumerableDoesNotEnumerateExceptFirstItem()
{
_count = 0;
Enumerable.Range(1, 40).Select(incrementCount).ToCachedEnumerable();
Assert.That(_count <= 1);
}
[Test]
public void MatrixEnumerationIteratesAsExpectedWhileStillKeepingEnumeratedValuesCached()
{
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 5).Select(incrementCount).ToCachedEnumerable();
var matrixCount = 0;
foreach (var x in cachedEnumerable) {
foreach (var y in cachedEnumerable) {
matrixCount++;
}
}
Assert.AreEqual(5, _count);
Assert.AreEqual(25, matrixCount);
}
[Test]
public void OrderingCachedEnumerableWorksAsExpectedWhileStillKeepingEnumeratedValuesCached()
{
_count = 0;
var cachedEnumerable = Enumerable.Range(1, 5).Select(incrementCount).ToCachedEnumerable();
var orderedEnumerated = cachedEnumerable.OrderBy(x => x);
var orderedEnumeratedArray = orderedEnumerated.ToArray(); // Enumerated first time in ascending order.
Assert.AreEqual(5, _count);
for (int i = 0; i < orderedEnumeratedArray.Length; i++) {
Assert.AreEqual(i + 1, orderedEnumeratedArray[i]);
}
var reorderedEnumeratedArray = orderedEnumerated.OrderByDescending(x => x).ToArray(); // Enumerated second time in descending order.
Assert.AreEqual(5, _count);
for (int i = 0; i < reorderedEnumeratedArray.Length; i++) {
Assert.AreEqual(5 - i, reorderedEnumeratedArray[i]);
}
}
private int incrementCount(int value)
{
_count++;
return value;
}
}

I don't see any serious problem with the idea to cache results in a list, just like in the above code. Probably, it would be better to construct the list using ToList() method.
public IEnumerable<ModuleData> ListModules()
{
if (Modules == null)
{
Modules = Source.Descendants("Module")
.Select(m => new ModuleData(m.Element("ModuleID").Value, 1, 1)))
.ToList();
}
return Modules;
}

Related

How to apply contain on last record and delete if found in LINQ?

I have a list of strings like
AAPL,28/03/2012,88.34,88.778,87.187,88.231,163682382
AAPL,29/03/2012,87.54,88.08,86.747,87.123,151551216
FB,30/03/2012,86.967,87.223,85.42,85.65,182255227
Now I want to delete only last record if it does not contains AAPL(symbol name) using LINQ.
Below I have write my code which contains multiple line but I want to make it single line code,
fileLines = System.IO.File.ReadAllLines(fileName).AsParallel().Skip(1).ToList();
var lastLine = fileLines.Last();
if (!lastLine.Contains(item.sym))
{
fileLines.RemoveAt(fileLines.Count - 1);
}
So How can I make all it in single line linq query ?

You could use the ternary operator to decide on the tail to concatenate as follows.
fileLines
= fileLines.Take(fileLines.Count())
.Concat(fileLines.Last().Contains(item.sym) ? Enumerable.Empty
: new string[]{ item.sym });
You could formulate it even more contracted as follows.
fileLines
= System.IO.File.ReadAllLines(fileName)
.AsParallel()
.Skip(1)
.Take(fileLines.Count())
.Concat(fileLines.Last().Contains(item.sym) ? Enumerable.Empty
: new string[]{ item.sym });
.ToList();
That being said, such an endeavour is questionable. The accumulation of lazily evaluated Linq extension methods is difficult to debug.

I understand you need to simplify the filtering operation, and, from what I see in your case, you're missing only one piece of information (i.e whether or not current item is the last one in an enumerated collection) that will help you define your predicate. What I'm about to write now might not seem "a simple single line"; however, it's gonna be a reusable extension that will provide this piece of information (and more) without performing extra and unnecessary loops or iterations.
The final product of that will be:
IEnumerable<string> fileLines = System.IO.File.ReadLines(fileName).RichWhere((item, originalIndex, countedIndex, hasMoreItems) => hasMoreItems || item.StartsWith("AAPL"));
The LINQ-like extension that I wrote inspired by Microsoft's Enumerable at ReferenceSource:
public delegate bool RichPredicate<T>(T item, int originalIndex, int countedIndex, bool hasMoreItems);
public static class EnumerableExtensions
{
/// <remarks>
/// This was contributed by Aly El-Haddad as an answer to this Stackoverflow.com question:
/// https://stackoverflow.com/q/54829095/3602352
/// </remarks>
public static IEnumerable<T> RichWhere<T>(this IEnumerable<T> source, RichPredicate<T> predicate)
{
return new RichWhereIterator<T>(source, predicate);
}
private class RichWhereIterator<T> : IEnumerable<T>, IEnumerator<T>
{
private readonly int threadId;
private readonly IEnumerable<T> source;
private readonly RichPredicate<T> predicate;
private IEnumerator<T> enumerator;
private int state;
private int countedIndex = -1;
private int originalIndex = -1;
private bool hasMoreItems;
public RichWhereIterator(IEnumerable<T> source, RichPredicate<T> predicate)
{
threadId = Thread.CurrentThread.ManagedThreadId;
this.source = source ?? throw new ArgumentNullException(nameof(source));
this.predicate = predicate ?? ((item, originalIndex, countedIndex, hasMoreItems) => true);
}
public T Current { get; private set; }
object IEnumerator.Current => Current;
public void Dispose()
{
if (enumerator is IDisposable disposable)
disposable.Dispose();
enumerator = null;
originalIndex = -1;
countedIndex = -1;
hasMoreItems = false;
Current = default(T);
state = -1;
}
public bool MoveNext()
{
switch (state)
{
case 1:
enumerator = source.GetEnumerator();
if (!(hasMoreItems = enumerator.MoveNext()))
{
Dispose();
break;
}
++originalIndex;
state = 2;
goto case 2;
case 2:
if (!hasMoreItems) //last predicate returned true and that was the last item
{
Dispose();
break;
}
T current = enumerator.Current;
hasMoreItems = enumerator.MoveNext();
++originalIndex;
if (predicate(current, originalIndex - 1, countedIndex + 1, hasMoreItems))
{
++countedIndex;
Current = current;
return true;
}
else if (hasMoreItems)
{ goto case 2; }
//predicate returned false and there're no more items
Dispose();
break;
}
return false;
}
public void Reset()
{
Current = default(T);
hasMoreItems = false;
originalIndex = -1;
countedIndex = -1;
state = 1;
}
public IEnumerator<T> GetEnumerator()
{
if (threadId == Thread.CurrentThread.ManagedThreadId && state == 0)
{
state = 1;
return this;
}
return new RichWhereIterator<T>(source, predicate) { state = 1 };
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
}
RichPredicate<T>, which could be thought of as Func<T, int, int, bool, bool> provide this information about each item:
item: the item to evaluate.
originalIndex: the index of that item in its original IEnumerable<T> source (the one which was directly passed to RichWhere).
countedIndex: the index of that item IF the predicate would evaluate to true.
hasMoreItems: tells whether or not this would be the last item from the original IEnumerable<T> source.

IEnumerable<T> Skip on unlimited sequence

I have a simple implementation of Fibonacci sequence using BigInteger:
internal class FibonacciEnumerator : IEnumerator<BigInteger>
{
private BigInteger _previous = 1;
private BigInteger _current = 0;
public void Dispose(){}
public bool MoveNext() {return true;}
public void Reset()
{
_previous = 1;
_current = 0;
}
public BigInteger Current
{
get
{
var temp = _current;
_current += _previous;
_previous = temp;
return _current;
}
}
object IEnumerator.Current { get { return Current; }
}
}
internal class FibonacciSequence : IEnumerable<BigInteger>
{
private readonly FibonacciEnumerator _f = new FibonacciEnumerator();
public IEnumerator<BigInteger> GetEnumerator(){return _f;}
IEnumerator IEnumerable.GetEnumerator(){return GetEnumerator();}
}
It is an unlimited sequence as the MoveNext() always returns true.
When called using
var fs = new FibonacciSequence();
fs.Take(10).ToList().ForEach(_ => Console.WriteLine(_));
the output is as expected (1,1,2,3,5,8,...)
I want to select 10 items but starting at 100th position. I tried calling it via
fs.Skip(100).Take(10).ToList().ForEach(_ => Console.WriteLine(_));
but this does not work, as it outputs ten elements from the beginning (i.e. the output is again 1,1,2,3,5,8,...).
I can skip it by calling SkipWhile
fs.SkipWhile((b,index) => index < 100).Take(10).ToList().ForEach(_ => Console.WriteLine(_));
which correctly outputs 10 elements starting from the 100th element.
Is there something else that needs/can be implemented in the enumerator to make the Skip(...) work?

Skip(n) doesn't access Current, it just calls MoveNext() n times.
So you need to perform the increment in MoveNext(), which is the logical place for that operation anyway:
Current does not move the position of the enumerator, and consecutive calls to Current return the same object until either MoveNext or Reset is called.

CodeCaster's answer is spot on - I'd just like to point out that you don't really need to implement your own enumerable for something like this:
public IEnumerable<BigInteger> FibonacciSequence()
{
var previous = BigInteger.One;
var current = BigInteger.Zero;
while (true)
{
yield return current;
var temp = current;
current += previous;
previous = temp;
}
}
The compiler will create both the enumerator and the enumerable for you. For a simple enumerable like this the difference isn't really all that big (you just avoid tons of boilerplate), but if you actually need something more complicated than a simple recursive function, it makes a huge difference.

Move your logic into MoveNext:
public bool MoveNext()
{
var temp = _current;
_current += _previous;
_previous = temp;
return true;
}
public void Reset()
{
_previous = 1;
_current = 0;
}
public BigInteger Current
{
get
{
return _current;
}
}
Skip(10) is simply calling MoveNext 10 times, and then Current. It also makes more logical sense to have the operation done in MoveNext, rather than current.

In C#, is there a queue which can only hold an object once in its lifetime?

I need a datastructure, which is a special type of queue. I want that, if an instance of my queue ever contained an object X, it shouldn't be possible to enqueue X again in this instance. The enqueuing method should just do nothing if called with X, like the attempt to add a duplicate value to a HashSet.
Example usage:
MyQueue<int> queue = new MyQueue<int>();
queue.Enqueue(5);
queue.Enqueue(17);
queue.Enqueue(28);
queue.Enqueue(17);
int firstNumber = queue.Dequeue();
queue.Enqueue(5);
queue.Enqueue(3);
List<int> queueContents = queue.ToList(); //this list should contain {17, 28, 3}
I looked around on MSDN, but couldn't find such a class. Does it exist, or do I have to implement it myself?
I guess I could use a different data structure too, but access will always be FIFO, so I thought a queue will be most efficient. Also, I don't know of any other structure which provides such "uniqueness over instance lifetime" feature.

I would do something similar to this:
class UniqueQueue<T>
{
private readonly Queue<T> queue = new Queue<T>();
private HashSet<T> alreadyAdded = new HashSet<T>();
public virtual void Enqueue(T item)
{
if (alreadyAdded.Add(item)) { queue.Enqueue(item); }
}
public int Count { get { return queue.Count; } }
public virtual T Dequeue()
{
T item = queue.Dequeue();
return item;
}
}
Note, most of this code was borrowed from This Thread.

You'd have to implement that yourself.
One idea is just to add the element to a HashSet when you enqueue it.
Then, when you want to enqueue, just check the HashSet for the item, if it exists, don't enqueue.
Since you want to prevent enqueuing for the rest of the queue's lifetime, you probably won't want to ever remove from the HashSet.

This is just a extended version of wayne's answer, it is just a little more fleshed out and having a few more interfaces supported. (To mimic Queue<T>'s interfaces)
sealed class UniqueQueue<T> : IEnumerable<T>, ICollection, IEnumerable
{
private readonly Queue<T> queue;
private readonly HashSet<T> alreadyAdded;
public UniqueQueue(IEqualityComparer<T> comparer)
{
queue = new Queue<T>();
alreadyAdded = new HashSet<T>(comparer);
}
public UniqueQueue(IEnumerable<T> collection, IEqualityComparer<T> comparer)
{
//Do this so the enumeration does not happen twice in case the enumerator behaves differently each enumeration.
var localCopy = collection.ToList();
queue = new Queue<T>(localCopy);
alreadyAdded = new HashSet<T>(localCopy, comparer);
}
public UniqueQueue(int capacity, IEqualityComparer<T> comparer)
{
queue = new Queue<T>(capacity);
alreadyAdded = new HashSet<T>(comparer);
}
//Here are the constructors that use the default comparer. By passing null in for the comparer it will just use the default one for the type.
public UniqueQueue() : this((IEqualityComparer<T>) null) { }
public UniqueQueue(IEnumerable<T> collection) : this(collection, null) { }
public UniqueQueue(int capacity) : this(capacity, null) { }
/// <summary>
/// Attempts to enqueue a object, returns false if the object was ever added to the queue in the past.
/// </summary>
/// <param name="item">The item to enqueue</param>
/// <returns>True if the object was successfully added, false if it was not</returns>
public bool Enqueue(T item)
{
if (!alreadyAdded.Add(item))
return false;
queue.Enqueue(item);
return true;
}
public int Count
{
get { return queue.Count; }
}
public T Dequeue()
{
return queue.Dequeue();
}
IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
return ((IEnumerable<T>)queue).GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return ((IEnumerable)queue).GetEnumerator();
}
void ICollection.CopyTo(Array array, int index)
{
((ICollection)queue).CopyTo(array, index);
}
bool ICollection.IsSynchronized
{
get { return ((ICollection)queue).IsSynchronized; }
}
object ICollection.SyncRoot
{
get { return ((ICollection)queue).SyncRoot; }
}
}

You can use a basic queue, but modify the Enqueue method to verify the previous values entered. Here, I used a hashset to contain those previous values :
public class UniqueValueQueue<T> : Queue<T>
{
private readonly HashSet<T> pastValues = new HashSet<T>();
public new void Enqueue(T item)
{
if (!pastValues.Contains(item))
{
pastValues.Add(item);
base.Enqueue(item);
}
}
}
With your test case
UniqueValueQueue<int> queue = new UniqueValueQueue<int>();
queue.Enqueue(5);
queue.Enqueue(17);
queue.Enqueue(28);
queue.Enqueue(17);
int firstNumber = queue.Dequeue();
queue.Enqueue(5);
queue.Enqueue(3);
List<int> queueContents = queue.ToList();
queueContents contains 17, 28 and 3.

A neater way to use Function delegate over Action delegate

I wish to iterate over a collection in a thread safe manner. I find it's convieniant to have one method called
/// <summary>
/// This visits all items and performs an action on them in a thread manner
/// </summary>
/// <param name="visitAction">The action to perform on the item</param>
public void VisitAllItems(Action<Item> visitAction) {
lock (listLock) {
foreach (Item item in this.ItemList) {
visitAction.Invoke(item);
}
}
}
An example of it's use could be
/// <summary>
/// Saves each item in the list to the database
/// </summary>
protected static void SaveListToDatabase() {
this.VisitAllItems(item => {
bool itemSavedSuccessfully = item.SaveToDB();
if(!itemSavedSuccessfully) {
//log error message
}
});
}
Another example of it's use would be
/// <summary>
/// Get the number of special items in the list.
/// </summary>
protected int GetNumberOfUnsynchronisedItems() {
int numberOfSpecialItems = 0;
this.VisitAllItems((item) => {
numberOfSpecialItems += item.IsItemSpecial() ? 1 : 0;
});
return numberOfSpecialItems;
}
But I'm sure there is a better way of writing the VisitAllItems method to use a Func<> rather than an Action<> delegate to return this value. I've tried several things but end up with compilation errors.
Can anyone see a neater way to implement this method?
Thanks,
Alasdair

It all depends on the usage, so it's hard to tell what exactly would be useful for you. But one way would be to return one result for each item in the collection, and then use LINQ to combine the results.
public IEnumerable<TResult> VisitAllItems<TResult>(
Func<Item, Result> visitfunction)
{
var result = new List<TResult>();
lock (listLock)
{
foreach (Item item in ItemList)
result.Add(visitfunction(item));
}
return result;
}
Normally, I would use yield return instead of manually creating the List<T>, but I think it's better this way here, because of the lock.
The usage would be like this:
protected int GetNumberOfUnsynchronisedItems()
{
return VisitAllItems(item => item.IsItemSpecial())
.Count(isSpecial => isSpecial);
}

You could follow the Aggregate model, but it wouldn't be terribly pleasant to use:
public TAccumulate VisitAllItems<TAccumulate>(TAccumulate seed,
Func<Item, TAccumulate, TAccumulate> visitor) {
TAccumulate current = seed;
lock (listLock) {
foreach (Item item in this.ItemList) {
current = visitor(current, item);
}
}
return current;
}
...
protected int GetNumberOfUnsynchronisedItems() {
return VisitAllItems(0,
(count, item) => count + item.IsItemSpecial() ? 1 : 0);
}
How many different aggregation functions do you actually need? For example, if most of the time you're just counting, you might want:
public void VisitAndCount<TAccumulate>(Func<Item, bool> visitor) {
int count = 0;
lock (listLock) {
foreach (Item item in this.ItemList) {
if (visitor(item)) {
count++;
}
}
}
return count;
}
then:
protected int GetNumberOfUnsynchronisedItems() {
return VisitAndCount(item => item.IsItemSpecial());
}

Can I cache partially-executed LINQ queries?

I have the following code:
IEnumerable<KeyValuePair<T, double>> items =
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item)));
if (items.Any(pair => pair.Value<0))
throw new ArgumentException("Item weights cannot be less than zero.");
double sum = items.Sum(pair => pair.Value);
foreach (KeyValuePair<T, double> pair in items) {...}
Where weight is a Func<T, double>.
The problem is I want weight to be executed as few times as possible. This means it should be executed at most once for each item. I could achieve this by saving it to an array. However, if any weight returns a negative value, I don't want to continue execution.
Is there any way to accomplish this easily within the LINQ framework?

Sure, that's totally doable:
public static Func<A, double> ThrowIfNegative<A, double>(this Func<A, double> f)
{
return a=>
{
double r = f(a);
// if r is NaN then this will throw.
if ( !(r >= 0.0) )
throw new Exception();
return r;
};
}
public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
var d = new Dictionary<A, R>();
return a=>
{
R r;
if (!d.TryGetValue(a, out r))
{
r = f(a);
d.Add(a, r);
}
return r;
};
}
And now...
Func<T, double> weight = whatever;
weight = weight.ThrowIfNegative().Memoize();
and you're done.

One way is to move the exception into the weight function, or at least simulate doing so, by doing something like:
Func<T, double> weightWithCheck = i =>
{
double result = weight(i);
if (result < 0)
{
throw new ArgumentException("Item weights cannot be less than zero.");
}
return result;
};
IEnumerable<KeyValuePair<T, double>> items =
sequence.Select(item => new KeyValuePair<T, double>(item, weightWithCheck(item)));
double sum = items.Sum(pair => pair.Value);
By this point, if there is an exception to be had, you should have it. You do have to enumerate items before you can be assured of getting the exception, though, but once you get it, you will not call weight again.

Both answers are good (where to throw the exception, and memoizing the function).
But your real problem is that your LINQ expression is evaluated every time you use it, unless you force it to evaluate and store as a List (or similar). Just change this:
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item)));
To this:
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item))).ToList();

You could possibly do it with a foreach loop. Here is a way to do it in one statement:
IEnumerable<KeyValuePair<T, double>> items = sequence
.Select(item => new KeyValuePair<T, double>(item, weight(item)))
.Select(kvp =>
{
if (kvp.Value < 0)
throw new ArgumentException("Item weights cannot be less than zero.");
else
return kvp;
}
);

No, there is nothing already IN the LINQ framework to do this, but you could surely write up your own methods and invoke them from the linq query (As has already been shown by many).
Personally, I would either ToList the first query or use Eric's suggestion.

Instead of a functional memoization suggested by other answers, you can also employ a memoization for the whole data sequence:
var items = sequence
.Select(item => new KeyValuePair<T, double>(item, weight(item)))
.Memoize();
(Note a call to Memoize() method at the end of the expression above)
A nice property of data memoization is that it represents a drop-in replacement for ToList() or ToArray() approaches.
The fully featured implementation is pretty involved though:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
static class MemoizationExtensions
{
/// <summary>
/// Memoize all elements of a sequence, e.g. ensure that every element of a sequence is retrieved only once.
/// </summary>
/// <remarks>
/// The resulting sequence is not thread safe.
/// </remarks>
/// <typeparam name="T">The type of the elements of source.</typeparam>
/// <param name="source">The source sequence.</param>
/// <returns>The sequence that fully replicates the source with all elements being memoized.</returns>
public static IEnumerable<T> Memoize<T>(this IEnumerable<T> source) => Memoize(source, false);
/// <summary>
/// Memoize all elements of a sequence, e.g. ensure that every element of a sequence is retrieved only once.
/// </summary>
/// <typeparam name="T">The type of the elements of source.</typeparam>
/// <param name="source">The source sequence.</param>
/// <param name="isThreadSafe">Indicates whether resulting sequence is thread safe.</param>
/// <returns>The sequence that fully replicates the source with all elements being memoized.</returns>
public static IEnumerable<T> Memoize<T>(this IEnumerable<T> source, bool isThreadSafe)
{
switch (source)
{
case null:
return null;
case CachedEnumerable<T> existingCachedEnumerable:
if (!isThreadSafe || existingCachedEnumerable is ThreadSafeCachedEnumerable<T>)
{
// The source is already memoized with compatible parameters.
return existingCachedEnumerable;
}
break;
case IList<T> _:
case IReadOnlyList<T> _:
case string _:
// Given source types are intrinsically memoized by their nature.
return source;
}
if (isThreadSafe)
return new ThreadSafeCachedEnumerable<T>(source);
else
return new CachedEnumerable<T>(source);
}
class CachedEnumerable<T> : IEnumerable<T>, IReadOnlyList<T>
{
public CachedEnumerable(IEnumerable<T> source)
{
_Source = source;
}
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
IEnumerable<T> _Source;
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
IEnumerator<T> _SourceEnumerator;
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
protected readonly IList<T> Cache = new List<T>();
public virtual int Count
{
get
{
while (_TryCacheElementNoLock()) ;
return Cache.Count;
}
}
bool _TryCacheElementNoLock()
{
if (_SourceEnumerator == null && _Source != null)
{
_SourceEnumerator = _Source.GetEnumerator();
_Source = null;
}
if (_SourceEnumerator == null)
{
// Source enumerator already reached the end.
return false;
}
else if (_SourceEnumerator.MoveNext())
{
Cache.Add(_SourceEnumerator.Current);
return true;
}
else
{
// Source enumerator has reached the end, so it is no longer needed.
_SourceEnumerator.Dispose();
_SourceEnumerator = null;
return false;
}
}
public virtual T this[int index]
{
get
{
_EnsureItemIsCachedNoLock(index);
return Cache[index];
}
}
public IEnumerator<T> GetEnumerator() => new CachedEnumerator<T>(this);
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
internal virtual bool EnsureItemIsCached(int index) => _EnsureItemIsCachedNoLock(index);
bool _EnsureItemIsCachedNoLock(int index)
{
while (Cache.Count <= index)
{
if (!_TryCacheElementNoLock())
return false;
}
return true;
}
internal virtual T GetCacheItem(int index) => Cache[index];
}
sealed class ThreadSafeCachedEnumerable<T> : CachedEnumerable<T>
{
public ThreadSafeCachedEnumerable(IEnumerable<T> source) :
base(source)
{
}
public override int Count
{
get
{
lock (Cache)
return base.Count;
}
}
public override T this[int index]
{
get
{
lock (Cache)
return base[index];
}
}
internal override bool EnsureItemIsCached(int index)
{
lock (Cache)
return base.EnsureItemIsCached(index);
}
internal override T GetCacheItem(int index)
{
lock (Cache)
return base.GetCacheItem(index);
}
}
sealed class CachedEnumerator<T> : IEnumerator<T>
{
CachedEnumerable<T> _CachedEnumerable;
const int InitialIndex = -1;
const int EofIndex = -2;
int _Index = InitialIndex;
public CachedEnumerator(CachedEnumerable<T> cachedEnumerable)
{
_CachedEnumerable = cachedEnumerable;
}
public T Current
{
get
{
var cachedEnumerable = _CachedEnumerable;
if (cachedEnumerable == null)
throw new InvalidOperationException();
var index = _Index;
if (index < 0)
throw new InvalidOperationException();
return cachedEnumerable.GetCacheItem(index);
}
}
object IEnumerator.Current => Current;
public void Dispose()
{
_CachedEnumerable = null;
}
public bool MoveNext()
{
var cachedEnumerable = _CachedEnumerable;
if (cachedEnumerable == null)
{
// Disposed.
return false;
}
if (_Index == EofIndex)
return false;
_Index++;
if (!cachedEnumerable.EnsureItemIsCached(_Index))
{
_Index = EofIndex;
return false;
}
else
{
return true;
}
}
public void Reset()
{
_Index = InitialIndex;
}
}
}
More info and a readily available NuGet package: https://github.com/gapotchenko/Gapotchenko.FX/tree/master/Source/Gapotchenko.FX.Linq#memoize

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Caching IEnumerable - c#

You can look at Saving the State of Enumerators which describes how to create lazy list (which caches once iterated items).

Related

How to apply contain on last record and delete if found in LINQ?

IEnumerable<T> Skip on unlimited sequence

In C#, is there a queue which can only hold an object once in its lifetime?

A neater way to use Function delegate over Action delegate

Can I cache partially-executed LINQ queries?

Categories

Resources