Thread-safe Cached Enumerator - lock with yield - c#

I have a custom "CachedEnumerable" class (inspired by Caching IEnumerable) that I need to make thread safe for my ASP.NET Core web app.
Is the following implementation of the Enumerator thread safe? (All other reads/writes to the IList _cache are locked appropriately.) (Possibly related to: Does the C# Yield free a lock?)
And more specifically, if there are 2 threads accessing the enumerator, how do I protect against one thread incrementing "index" and causing a second enumerating thread to get the wrong element from the _cache (i.e. the element at index + 1 instead of at index)? Is this race condition a real concern?
public IEnumerator<T> GetEnumerator()
{
var index = 0;
while (true)
{
T current;
lock (_enumeratorLock)
{
if (index >= _cache.Count && !MoveNext()) break;
current = _cache[index];
index++;
}
yield return current;
}
}
Full code of my version of CachedEnumerable:
public class CachedEnumerable<T> : IDisposable, IEnumerable<T>
{
IEnumerator<T> _enumerator;
private IList<T> _cache = new List<T>();
public bool CachingComplete { get; private set; } = false;
public CachedEnumerable(IEnumerable<T> enumerable)
{
switch (enumerable)
{
case CachedEnumerable<T> cachedEnumerable: //This case is actually dealt with by the extension method.
_cache = cachedEnumerable._cache;
CachingComplete = cachedEnumerable.CachingComplete;
_enumerator = cachedEnumerable.GetEnumerator();
break;
case IList<T> list:
//_cache = list; //without clone...
//Clone:
_cache = new T[list.Count];
list.CopyTo((T[]) _cache, 0);
CachingComplete = true;
break;
default:
_enumerator = enumerable.GetEnumerator();
break;
}
}
public CachedEnumerable(IEnumerator<T> enumerator)
{
_enumerator = enumerator;
}
private int CurCacheCount
{
get
{
lock (_enumeratorLock)
{
return _cache.Count;
}
}
}
public IEnumerator<T> GetEnumerator()
{
var index = 0;
while (true)
{
T current;
lock (_enumeratorLock)
{
if (index >= _cache.Count && !MoveNext()) break;
current = _cache[index];
index++;
}
yield return current;
}
}
//private readonly AsyncLock _enumeratorLock = new AsyncLock();
private readonly object _enumeratorLock = new object();
private bool MoveNext()
{
if (CachingComplete) return false;
if (_enumerator != null && _enumerator.MoveNext()) //The null check should have been unnecessary b/c of the lock...
{
_cache.Add(_enumerator.Current);
return true;
}
else
{
CachingComplete = true;
DisposeWrappedEnumerator(); //Release the enumerator, as it is no longer needed.
}
return false;
}
public T ElementAt(int index)
{
lock (_enumeratorLock)
{
if (index < _cache.Count)
{
return _cache[index];
}
}
EnumerateUntil(index);
lock (_enumeratorLock)
{
if (_cache.Count <= index) throw new ArgumentOutOfRangeException(nameof(index));
return _cache[index];
}
}
public bool TryGetElementAt(int index, out T value)
{
lock (_enumeratorLock)
{
value = default;
if (index < CurCacheCount)
{
value = _cache[index];
return true;
}
}
EnumerateUntil(index);
lock (_enumeratorLock)
{
if (_cache.Count <= index) return false;
value = _cache[index];
}
return true;
}
private void EnumerateUntil(int index)
{
while (true)
{
lock (_enumeratorLock)
{
if (_cache.Count > index || !MoveNext()) break;
}
}
}
public void Dispose()
{
DisposeWrappedEnumerator();
}
private void DisposeWrappedEnumerator()
{
if (_enumerator != null)
{
_enumerator.Dispose();
_enumerator = null;
if (_cache is List<T> list)
{
list.TrimExcess();
}
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public int CachedCount
{
get
{
lock (_enumeratorLock)
{
return _cache.Count;
}
}
}
public int Count()
{
if (CachingComplete)
{
return _cache.Count;
}
EnsureCachingComplete();
return _cache.Count;
}
private void EnsureCachingComplete()
{
if (CachingComplete)
{
return;
}
//Enumerate the rest of the collection
while (!CachingComplete)
{
lock (_enumeratorLock)
{
if (!MoveNext()) break;
}
}
}
public T[] ToArray()
{
EnsureCachingComplete();
//Once Caching is complete, we don't need to lock
if (!(_cache is T[] array))
{
array = _cache.ToArray();
_cache = array;
}
return array;
}
public T this[int index] => ElementAt(index);
}
public static CachedEnumerable<T> Cached<T>(this IEnumerable<T> source)
{
//no gain in caching a cache.
if (source is CachedEnumerable<T> cached)
{
return cached;
}
return new CachedEnumerable<T>(source);
}
}
Basic Usage: (Although not a meaningful use case)
var cached = expensiveEnumerable.Cached();
foreach (var element in cached) {
Console.WriteLine(element);
}
Update
I tested the current implementation based on Theodor's answer (https://stackoverflow.com/a/58547863/5683904) and confirmed (AFAICT) that it is thread-safe when enumerated with a foreach, without creating duplicate values:
class Program
{
static async Task Main(string[] args)
{
var enumerable = Enumerable.Range(0, 1_000_000);
var cachedEnumerable = new CachedEnumerable<int>(enumerable);
var c = new ConcurrentDictionary<int, List<int>>();
var tasks = Enumerable.Range(1, 100).Select(id => Test(id, cachedEnumerable, c));
await Task.WhenAll(tasks);
foreach (var keyValuePair in c)
{
var hasDuplicates = keyValuePair.Value.Distinct().Count() != keyValuePair.Value.Count;
Console.WriteLine($"Task #{keyValuePair.Key} count: {keyValuePair.Value.Count}. Has duplicates? {hasDuplicates}");
}
}
static async Task Test(int id, IEnumerable<int> cache, ConcurrentDictionary<int, List<int>> c)
{
foreach (var i in cache)
{
//await Task.Delay(10);
c.AddOrUpdate(id, v => new List<int>() {i}, (k, v) =>
{
v.Add(i);
return v;
});
}
}
}

Your class is not thread safe, because shared state is mutated in unprotected regions inside your class. The unprotected regions are:
The constructor
The Dispose method
The shared state is:
The _enumerator private field
The _cache private field
The CachingComplete public property
Some other issues regarding your class:
Implementing IDisposable places on the caller the responsibility to dispose your class. There is no need for IEnumerables to be disposable. On the contrary, IEnumerators are disposable, but there is language support for their automatic disposal (a feature of the foreach statement).
Your class offers extended functionality not expected from an IEnumerable (ElementAt, Count etc.). Maybe you intended to implement a CachedList instead? Without implementing the IList<T> interface, LINQ methods like Count() and ToArray() cannot take advantage of your extended functionality, and will use the slow path like they do with plain vanilla IEnumerables.
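To illustrate that last point, LINQ's Count() short-circuits only for well-known interfaces. The check looks roughly like this (a from-memory paraphrase of the framework's behavior, not the exact BCL source):
// Roughly how Enumerable.Count() decides between the O(1) fast path and
// the O(n) slow path (paraphrased from memory, not the exact BCL source):
public static int CountSketch<T>(IEnumerable<T> source)
{
    if (source is ICollection<T> collection)
        return collection.Count; // fast path: IList<T> implementers land here
    int count = 0;
    using (var e = source.GetEnumerator())
        while (e.MoveNext()) count++; // slow path: full enumeration
    return count;
}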
Update: I just noticed another thread-safety issue. This one is related to the public IEnumerator<T> GetEnumerator() method. The enumerator is compiler-generated, since the method is an iterator (utilizes yield return). Compiler-generated enumerators are not thread safe. Consider this code for example:
var enumerable = Enumerable.Range(0, 1_000_000);
var cachedEnumerable = new CachedEnumerable<int>(enumerable);
var enumerator = cachedEnumerable.GetEnumerator();
var tasks = Enumerable.Range(1, 4).Select(id => Task.Run(() =>
{
int count = 0;
while (enumerator.MoveNext())
{
count++;
}
Console.WriteLine($"Task #{id} count: {count}");
})).ToArray();
Task.WaitAll(tasks);
Four threads are concurrently using the same IEnumerator. The enumerable has 1,000,000 items. You might expect each thread to enumerate ~250,000 items, but that's not what happens.
Output:
Task #1 count: 0
Task #4 count: 0
Task #3 count: 0
Task #2 count: 1000000
The MoveNext in the line while (enumerator.MoveNext()) is not your safe MoveNext. It is the compiler-generated, unsafe MoveNext. Although unsafe, it includes a mechanism probably intended for dealing with exceptions, which temporarily marks the enumerator as finished before calling the externally provided code. So when multiple threads call MoveNext concurrently, all but the first get a return value of false and terminate the enumeration instantly, having completed zero loops. To solve this you would probably have to code your own IEnumerator class.
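For example, a hand-written enumerator could do the index bookkeeping and the cache read atomically under the lock. This is only a minimal sketch of the idea, written as a nested class of the question's CachedEnumerable<T> so it can reach the private _cache, _enumeratorLock and MoveNext members; the MoveNext()/Current pair is still unsafe to share across threads, for the reasons given in the next update:
// Minimal sketch: a hand-coded enumerator whose MoveNext is safe to call
// concurrently (nested inside the question's CachedEnumerable<T>).
// GetEnumerator() would return new ThreadSafeEnumerator(this).
private class ThreadSafeEnumerator : IEnumerator<T>
{
    private readonly CachedEnumerable<T> _parent;
    private int _index;
    private T _current;

    public ThreadSafeEnumerator(CachedEnumerable<T> parent) => _parent = parent;

    public T Current => _current;
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        lock (_parent._enumeratorLock)
        {
            if (_index >= _parent._cache.Count && !_parent.MoveNext())
                return false;
            _current = _parent._cache[_index]; // read and advance as one atomic step
            _index++;
            return true;
        }
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}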
Update: Actually my last point about thread-safe enumeration is a bit unfair, because enumerating via the IEnumerator interface is an inherently unsafe operation, which is impossible to fix without the cooperation of the calling code. This is because obtaining the next element is not an atomic operation: it involves two steps (calling MoveNext() and reading Current). So your thread-safety concerns are limited to protecting the internal state of your class (the _enumerator and _cache fields, and the CachingComplete property). These are left unprotected only in the constructor and in the Dispose method, but I suppose that normal use of your class may not follow code paths that create the race conditions that would lead to internal state corruption.
Personally I would prefer to take care of these code paths too, and not leave it to the whims of chance.
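For example, the Dispose path could take the same lock that guards the rest of the shared state (my sketch, not code from the question):
// Sketch: guard Dispose with the same lock that protects _enumerator,
// _cache and CachingComplete elsewhere in the class.
public void Dispose()
{
    lock (_enumeratorLock)
    {
        DisposeWrappedEnumerator();
    }
}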
Update: I wrote a cache for IAsyncEnumerables, to demonstrate an alternative technique. The enumeration of the source IAsyncEnumerable is not driven by the callers, which would use locks or semaphores to obtain exclusive access, but by a separate worker task. The first caller starts the worker task. Each caller first yields all items that are already cached, and then awaits more items, or a notification that there are no more items. As the notification mechanism I used a TaskCompletionSource<bool>. A lock is still used to ensure that all access to shared resources is synchronized.
public class CachedAsyncEnumerable<T> : IAsyncEnumerable<T>
{
private readonly object _locker = new object();
private IAsyncEnumerable<T> _source;
private Task _sourceEnumerationTask;
private List<T> _buffer;
private TaskCompletionSource<bool> _moveNextTCS;
private Exception _sourceEnumerationException;
private int _sourceEnumerationVersion; // Incremented on exception
public CachedAsyncEnumerable(IAsyncEnumerable<T> source)
{
_source = source ?? throw new ArgumentNullException(nameof(source));
}
public async IAsyncEnumerator<T> GetAsyncEnumerator(
CancellationToken cancellationToken = default)
{
lock (_locker)
{
if (_sourceEnumerationTask == null)
{
_buffer = new List<T>();
_moveNextTCS = new TaskCompletionSource<bool>();
_sourceEnumerationTask = Task.Run(
() => EnumerateSourceAsync(cancellationToken));
}
}
int index = 0;
int localVersion = -1;
while (true)
{
T current = default;
Task<bool> moveNextTask = null;
lock (_locker)
{
if (localVersion == -1)
{
localVersion = _sourceEnumerationVersion;
}
else if (_sourceEnumerationVersion != localVersion)
{
ExceptionDispatchInfo
.Capture(_sourceEnumerationException).Throw();
}
if (index < _buffer.Count)
{
current = _buffer[index];
index++;
}
else
{
moveNextTask = _moveNextTCS.Task;
}
}
if (moveNextTask == null)
{
yield return current;
continue;
}
var moved = await moveNextTask;
if (!moved) yield break;
lock (_locker)
{
current = _buffer[index];
index++;
}
yield return current;
}
}
private async Task EnumerateSourceAsync(CancellationToken cancellationToken)
{
TaskCompletionSource<bool> localMoveNextTCS;
try
{
await foreach (var item in _source.WithCancellation(cancellationToken))
{
lock (_locker)
{
_buffer.Add(item);
localMoveNextTCS = _moveNextTCS;
_moveNextTCS = new TaskCompletionSource<bool>();
}
localMoveNextTCS.SetResult(true);
}
lock (_locker)
{
localMoveNextTCS = _moveNextTCS;
_buffer.TrimExcess();
_source = null;
}
localMoveNextTCS.SetResult(false);
}
catch (Exception ex)
{
lock (_locker)
{
localMoveNextTCS = _moveNextTCS;
_sourceEnumerationException = ex;
_sourceEnumerationVersion++;
_sourceEnumerationTask = null;
}
localMoveNextTCS.SetException(ex);
}
}
}
This implementation follows a specific strategy for dealing with exceptions. If an exception occurs while enumerating the source IAsyncEnumerable, the exception will be propagated to all current callers, the currently used IAsyncEnumerator will be discarded, and the incomplete cached data will be discarded too. A new worker task may start again later, when the next enumeration request is received.
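A hypothetical usage sketch (it assumes the System.Linq.Async package for AsyncEnumerable.Range): two consumers enumerate concurrently, while the source is enumerated only once by the worker task.
// Two concurrent consumers share one cache; the source runs only once.
var source = AsyncEnumerable.Range(0, 1000); // from System.Linq.Async
var cached = new CachedAsyncEnumerable<int>(source);

async Task ConsumeAsync(string name)
{
    var count = 0;
    await foreach (var item in cached) count++;
    Console.WriteLine($"{name} consumed {count} items");
}

await Task.WhenAll(ConsumeAsync("A"), ConsumeAsync("B"));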

Access to the cache, yes, is thread safe: only one thread at a time can read from the _cache object.
But this way you can't ensure that all threads get elements in the same order in which they called GetEnumerator.
Check these two examples; if the behavior is what you expect, you can use the lock in that way.
Example 1:
THREAD1 Calls GetEnumerator
THREAD1 Initialize T current;
THREAD2 Calls GetEnumerator
THREAD2 Initialize T current;
THREAD2 LOCK THREAD
THREAD1 WAIT
THREAD2 read from cache safely _cache[0]
THREAD2 index++
THREAD2 UNLOCK
THREAD1 LOCK
THREAD1 read from cache safely _cache[1]
THREAD1 index++
THREAD1 UNLOCK
THREAD2 yield return current;
THREAD1 yield return current;
Example 2:
THREAD2 Initialize T current;
THREAD2 LOCK THREAD
THREAD2 read from cache safely
THREAD2 UNLOCK
THREAD1 Initialize T current;
THREAD1 LOCK THREAD
THREAD1 read from cache safely
THREAD1 UNLOCK
THREAD1 yield return current;
THREAD2 yield return current;

Related

TPL DataFlow, link blocks with priority?

Using TPL.DataFlow blocks, is it possible to link two or more sources to a single ITargetBlock (e.g. an ActionBlock) and prioritize the sources?
e.g.
BufferBlock<string> b1 = new ...
BufferBlock<string> b2 = new ...
ActionBlock<string> a = new ...
//somehow force messages in b1 to be processed before any message of b2, always
b1.LinkTo (a);
b2.LinkTo (a);
As long as there are messages in b1, I want those to be fed to "a", and once b1 is empty, b2 messages are pushed into "a".
Ideas?
There is nothing like that in TPL Dataflow itself.
The simplest way I can imagine doing this yourself would be to create a structure that encapsulates three blocks: a high priority input, a low priority input and an output. Those blocks would be simple BufferBlocks, along with a method that forwards messages from the two inputs to the output based on priority, running in the background.
The code could look like this:
public class PriorityBlock<T>
{
private readonly BufferBlock<T> highPriorityTarget;
public ITargetBlock<T> HighPriorityTarget
{
get { return highPriorityTarget; }
}
private readonly BufferBlock<T> lowPriorityTarget;
public ITargetBlock<T> LowPriorityTarget
{
get { return lowPriorityTarget; }
}
private readonly BufferBlock<T> source;
public ISourceBlock<T> Source
{
get { return source; }
}
public PriorityBlock()
{
var options = new DataflowBlockOptions { BoundedCapacity = 1 };
highPriorityTarget = new BufferBlock<T>(options);
lowPriorityTarget = new BufferBlock<T>(options);
source = new BufferBlock<T>(options);
Task.Run(() => ForwardMessages());
}
private async Task ForwardMessages()
{
while (true)
{
await Task.WhenAny(
highPriorityTarget.OutputAvailableAsync(),
lowPriorityTarget.OutputAvailableAsync());
T item;
if (highPriorityTarget.TryReceive(out item))
{
await source.SendAsync(item);
}
else if (lowPriorityTarget.TryReceive(out item))
{
await source.SendAsync(item);
}
else
{
// both input blocks must be completed
source.Complete();
return;
}
}
}
}
Usage would look like this:
b1.LinkTo(priorityBlock.HighPriorityTarget);
b2.LinkTo(priorityBlock.LowPriorityTarget);
priorityBlock.Source.LinkTo(a);
For this to work, a also has to have BoundedCapacity set to one (or at least a very low number).
The caveat with this code is that it can introduce latency of two messages (one waiting in the output block, one waiting in SendAsync()). So, if you have a long list of low priority messages and suddenly a high priority message comes in, it will be processed only after those two low-priority messages that are already waiting.
If this is a problem for you, it can be solved. But I believe it would require more complicated code that deals with the less public parts of TPL Dataflow, like OfferMessage().
Here is an implementation of a PriorityBufferBlock<T> class that propagates high priority items more frequently than low priority items. The constructor of this class has a priorityPrecedence parameter that defines how many high priority items will be propagated for each low priority item. If this parameter has the value 1.0 (the smallest valid value), there is no real priority to speak of. If this parameter has the value Double.PositiveInfinity, no low priority item will ever be propagated as long as there are high priority items in the queue. If this parameter has a more normal value, like 5.0 for example, one low priority item will be propagated for every 5 high priority items.
This class maintains internally two queues, one for high and one for low priority items. The number of items stored in each queue is not taken into account, unless one of the two lists is empty, in which case all items of the other queue are freely propagated on demand. The priorityPrecedence parameter influences the behavior of the class only when both internal queues are non-empty. Otherwise, if only one queue has items, the PriorityBufferBlock<T> behaves like a normal BufferBlock<T>.
public class PriorityBufferBlock<T> : IPropagatorBlock<T, T>,
IReceivableSourceBlock<T>
{
private readonly IPropagatorBlock<T, int> _block;
private readonly Queue<T> _highQueue = new();
private readonly Queue<T> _lowQueue = new();
private readonly Predicate<T> _hasPriorityPredicate;
private readonly double _priorityPrecedence;
private double _priorityCounter = 0;
private object Locker => _highQueue;
public PriorityBufferBlock(Predicate<T> hasPriorityPredicate,
double priorityPrecedence,
DataflowBlockOptions dataflowBlockOptions = null)
{
ArgumentNullException.ThrowIfNull(hasPriorityPredicate);
if (priorityPrecedence < 1.0)
throw new ArgumentOutOfRangeException(nameof(priorityPrecedence));
_hasPriorityPredicate = hasPriorityPredicate;
_priorityPrecedence = priorityPrecedence;
dataflowBlockOptions ??= new();
_block = new TransformBlock<T, int>(item =>
{
bool hasPriority = _hasPriorityPredicate(item);
Queue<T> selectedQueue = hasPriority ? _highQueue : _lowQueue;
lock (Locker) selectedQueue.Enqueue(item);
return 0;
}, new()
{
BoundedCapacity = dataflowBlockOptions.BoundedCapacity,
CancellationToken = dataflowBlockOptions.CancellationToken,
MaxMessagesPerTask = dataflowBlockOptions.MaxMessagesPerTask
});
this.Completion = _block.Completion.ContinueWith(completion =>
{
Debug.Assert(this.Count == 0 || !completion.IsCompletedSuccessfully);
lock (Locker) { _highQueue.Clear(); _lowQueue.Clear(); }
return completion;
}, default, TaskContinuationOptions.ExecuteSynchronously |
TaskContinuationOptions.DenyChildAttach, TaskScheduler.Default).Unwrap();
}
public Task Completion { get; private init; }
public void Complete() => _block.Complete();
void IDataflowBlock.Fault(Exception exception) => _block.Fault(exception);
public int Count
{
get { lock (Locker) return _highQueue.Count + _lowQueue.Count; }
}
private Queue<T> GetSelectedQueue(bool forDequeue)
{
Debug.Assert(Monitor.IsEntered(Locker));
Queue<T> selectedQueue;
if (_highQueue.Count == 0)
selectedQueue = _lowQueue;
else if (_lowQueue.Count == 0)
selectedQueue = _highQueue;
else if (_priorityCounter + 1 > _priorityPrecedence)
selectedQueue = _lowQueue;
else
selectedQueue = _highQueue;
if (forDequeue)
{
if (_highQueue.Count == 0 || _lowQueue.Count == 0)
_priorityCounter = 0;
else if (++_priorityCounter > _priorityPrecedence)
_priorityCounter -= _priorityPrecedence + 1;
}
return selectedQueue;
}
private T Peek()
{
Debug.Assert(Monitor.IsEntered(Locker));
Debug.Assert(_highQueue.Count > 0 || _lowQueue.Count > 0);
return GetSelectedQueue(false).Peek();
}
private T Dequeue()
{
Debug.Assert(Monitor.IsEntered(Locker));
Debug.Assert(_highQueue.Count > 0 || _lowQueue.Count > 0);
return GetSelectedQueue(true).Dequeue();
}
private class TargetProxy : ITargetBlock<int>
{
private readonly PriorityBufferBlock<T> _parent;
private readonly ITargetBlock<T> _realTarget;
public TargetProxy(PriorityBufferBlock<T> parent, ITargetBlock<T> target)
{
Debug.Assert(parent is not null);
_parent = parent;
_realTarget = target ?? throw new ArgumentNullException(nameof(target));
}
public Task Completion => throw new NotSupportedException();
public void Complete() => _realTarget.Complete();
void IDataflowBlock.Fault(Exception error) => _realTarget.Fault(error);
DataflowMessageStatus ITargetBlock<int>.OfferMessage(
DataflowMessageHeader messageHeader, int messageValue,
ISourceBlock<int> source, bool consumeToAccept)
{
Debug.Assert(messageValue == 0);
if (consumeToAccept) throw new NotSupportedException();
lock (_parent.Locker)
{
T realValue = _parent.Peek();
DataflowMessageStatus response = _realTarget.OfferMessage(
messageHeader, realValue, _parent, consumeToAccept);
if (response == DataflowMessageStatus.Accepted) _parent.Dequeue();
return response;
}
}
}
public IDisposable LinkTo(ITargetBlock<T> target,
DataflowLinkOptions linkOptions)
=> _block.LinkTo(new TargetProxy(this, target), linkOptions);
DataflowMessageStatus ITargetBlock<T>.OfferMessage(
DataflowMessageHeader messageHeader, T messageValue,
ISourceBlock<T> source, bool consumeToAccept)
=> _block.OfferMessage(messageHeader,
messageValue, source, consumeToAccept);
T ISourceBlock<T>.ConsumeMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T> target, out bool messageConsumed)
{
_ = _block.ConsumeMessage(messageHeader, new TargetProxy(this, target),
out messageConsumed);
if (messageConsumed) lock (Locker) return Dequeue();
return default;
}
bool ISourceBlock<T>.ReserveMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T> target)
=> _block.ReserveMessage(messageHeader, new TargetProxy(this, target));
void ISourceBlock<T>.ReleaseReservation(DataflowMessageHeader messageHeader,
ITargetBlock<T> target)
=> _block.ReleaseReservation(messageHeader, new TargetProxy(this, target));
public bool TryReceive(Predicate<T> filter, out T item)
{
if (filter is not null) throw new NotSupportedException();
if (((IReceivableSourceBlock<int>)_block).TryReceive(null, out _))
{
lock (Locker) item = Dequeue(); return true;
}
item = default; return false;
}
public bool TryReceiveAll(out IList<T> items)
{
if (((IReceivableSourceBlock<int>)_block).TryReceiveAll(out IList<int> items2))
{
T[] array = new T[items2.Count];
lock (Locker)
for (int i = 0; i < array.Length; i++)
array[i] = Dequeue();
items = array; return true;
}
items = default; return false;
}
}
Usage example:
var bufferBlock = new PriorityBufferBlock<SaleOrder>(x => x.HasPriority, 2.5);
The above implementation supports all the features of the built-in BufferBlock<T>, except for TryReceive with a non-null filter. The core functionality of the block is delegated to an internal TransformBlock<T, int>, which contains a dummy zero value for every item stored in one of the queues.

IEnumerable Method with AsParallel

I got the following extension method:
static class ExtensionMethods
{
public static IEnumerable<IEnumerable<T>> Subsequencise<T>(
this IEnumerable<T> input,
int subsequenceLength)
{
var enumerator = input.GetEnumerator();
SubsequenciseParameter parameter = new SubsequenciseParameter
{
Next = enumerator.MoveNext()
};
while (parameter.Next)
yield return getSubSequence(
enumerator,
subsequenceLength,
parameter);
}
private static IEnumerable<T> getSubSequence<T>(
IEnumerator<T> enumerator,
int subsequenceLength,
SubsequenciseParameter parameter)
{
do
{
lock (enumerator) // this lock makes it "work"
{ // removing this causes exceptions.
if (parameter.Next)
yield return enumerator.Current;
}
} while ((parameter.Next = enumerator.MoveNext())
&& --subsequenceLength > 0);
}
// Needed since you cant use out or ref in yield-return methods...
class SubsequenciseParameter
{
public bool Next { get; set; }
}
}
Its purpose is to split a sequence into subsequences of a given size.
Calling it like this:
foreach (var sub in "abcdefghijklmnopqrstuvwxyz"
.Subsequencise(3)
.AsParallel()
.Select(sub => new String(sub.ToArray())))
{
Console.WriteLine(sub);
}
Console.ReadKey();
This works; however, there are some empty lines in between, since some of the threads are "too late" and enter the first yield return.
I tried putting more locks everywhere, but I cannot make this work correctly in combination with AsParallel.
It's obvious that this example doesn't justify the use of AsParallel at all; it is just to demonstrate how the method could be called.
The problem is that iterators are lazily evaluated, so you return a lazily evaluated iterator which gets used from multiple threads.
You can fix this by rewriting your method as follows:
public static IEnumerable<IEnumerable<T>> Subsequencise<T>(this IEnumerable<T> input, int subsequenceLength)
{
var syncObj = new object();
var enumerator = input.GetEnumerator();
if (!enumerator.MoveNext())
{
yield break;
}
List<T> currentList = new List<T> { enumerator.Current };
int length = 1;
while (enumerator.MoveNext())
{
if (length == subsequenceLength)
{
length = 0;
yield return currentList;
currentList = new List<T>();
}
currentList.Add(enumerator.Current);
++length;
}
yield return currentList;
}
This performs the same function, but doesn't use an iterator to implement the "nested" IEnumerable<T>, avoiding the problem. Note that this also avoids the locking as well as the custom SubsequenciseParameter type.
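A quick usage sketch of the rewritten method: because each chunk is now an eagerly-filled List<T> rather than a lazy iterator over a shared enumerator, AsParallel no longer produces empty results.
// Each chunk is a self-contained List<char>, safe to project in parallel.
var chunks = "abcdefghijklmnopqrstuvwxyz"
    .Subsequencise(3)
    .AsParallel()
    .Select(sub => new string(sub.ToArray()));

foreach (var chunk in chunks)
    Console.WriteLine(chunk); // no empty lines this time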

Prioritized queues in Task Parallel Library

Is there any prior work of adding tasks to the TPL runtime with a varying priority?
If not, generally speaking, how would I implement this?
Ideally I plan on using the producer-consumer pattern to add "todo" work to the TPL. There may be times where I discover that a low priority job needs to be upgraded to a high priority job (relative to the others).
If anyone has some search keywords I should use when searching for this, please mention them, since I haven't yet found code that will do what I need.
So here is a rather naive concurrent implementation around a rather naive priority queue. The idea here is that a sorted set holds pairs of the real item and its priority, with a comparer that compares primarily by priority. The constructor takes a function that computes the priority for a given object.
As for the actual implementation: it's not efficient, I just lock around everything. Creating a more efficient implementation would prevent the use of SortedSet as a priority queue, and re-implementing one that can be effectively accessed concurrently is not going to be that easy.
In order to change the priority of an item you'll need to remove the item from the set and then add it again, and to find it without iterating the whole set you'd need to know the old priority as well as the new priority.
public class ConcurrentPriorityQueue<T> : IProducerConsumerCollection<T>
{
private object key = new object();
private SortedSet<Tuple<T, int>> set;
private Func<T, int> prioritySelector;
public ConcurrentPriorityQueue(Func<T, int> prioritySelector, IComparer<T> comparer = null)
{
this.prioritySelector = prioritySelector;
set = new SortedSet<Tuple<T, int>>(
new MyComparer(comparer ?? Comparer<T>.Default));
}
private class MyComparer : IComparer<Tuple<T, int>> //non-generic: reuses the outer T, avoiding CS0693 shadowing
{
private IComparer<T> comparer;
public MyComparer(IComparer<T> comparer)
{
this.comparer = comparer;
}
public int Compare(Tuple<T, int> first, Tuple<T, int> second)
{
var returnValue = first.Item2.CompareTo(second.Item2);
if (returnValue == 0)
returnValue = comparer.Compare(first.Item1, second.Item1);
return returnValue;
}
}
public bool TryAdd(T item)
{
lock (key)
{
return set.Add(Tuple.Create(item, prioritySelector(item)));
}
}
public bool TryTake(out T item)
{
lock (key)
{
if (set.Count > 0)
{
var first = set.First();
item = first.Item1;
return set.Remove(first);
}
else
{
item = default(T);
return false;
}
}
}
public bool ChangePriority(T item, int oldPriority, int newPriority)
{
lock (key)
{
if (set.Remove(Tuple.Create(item, oldPriority)))
{
return set.Add(Tuple.Create(item, newPriority));
}
else
return false;
}
}
public bool ChangePriority(T item)
{
lock (key)
{
var result = set.FirstOrDefault(pair => object.Equals(pair.Item1, item));
if (result != null) //FirstOrDefault returns null when no matching pair exists
{
return ChangePriority(item, result.Item2, prioritySelector(item));
}
else
{
return false;
}
}
}
public void CopyTo(T[] array, int index)
{
lock (key)
{
foreach (var item in set.Select(pair => pair.Item1))
{
array[index++] = item;
}
}
}
public T[] ToArray()
{
lock (key)
{
return set.Select(pair => pair.Item1).ToArray();
}
}
public IEnumerator<T> GetEnumerator()
{
return ToArray().AsEnumerable().GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public void CopyTo(Array array, int index)
{
lock (key)
{
foreach (var item in set.Select(pair => pair.Item1))
{
array.SetValue(item, index++);
}
}
}
public int Count
{
get { lock (key) { return set.Count; } }
}
public bool IsSynchronized
{
get { return true; }
}
public object SyncRoot
{
get { return key; }
}
}
Once you have an IProducerConsumerCollection<T> instance, which the above object is, you can use it as the internal backing object of a BlockingCollection<T> in order to have an easier-to-use interface.
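For example (a minimal sketch; here the item's own value doubles as its priority):
// Wrap the priority queue in a BlockingCollection<T> for the usual
// blocking Add/Take API. Lower priority values are dequeued first.
var queue = new BlockingCollection<int>(
    new ConcurrentPriorityQueue<int>(item => item));
queue.Add(5);
queue.Add(1);
Console.WriteLine(queue.Take()); // 1 - the smallest priority comes out first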
ParallelExtensionsExtras contains several custom TaskSchedulers that could be helpful either directly or as a base for your own scheduler.
Specifically, there are two schedulers that may be interesting for you:
QueuedTaskScheduler, which allows you to schedule Tasks at different priorities, but doesn't allow changing the priority of enqueued Tasks (see the sketch after this list).
ReprioritizableTaskScheduler, which doesn't have different priorities, but allows you to move a specific Task to the front or to the back of the queue. (Though changing priority is O(n) in the number of currently waiting Tasks, which could be a problem if you had many Tasks at the same time.)
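For completeness, here is how QueuedTaskScheduler is typically used, as far as I recall the ParallelExtensionsExtras API (verify against the actual package; DoUrgentWork and DoBackgroundWork are hypothetical placeholders):
// Each ActivateNewQueue call creates a sub-scheduler with its own priority;
// lower numbers are served first.
var scheduler = new QueuedTaskScheduler();
TaskScheduler high = scheduler.ActivateNewQueue(0);
TaskScheduler low = scheduler.ActivateNewQueue(1);

Task.Factory.StartNew(() => DoUrgentWork(),     // hypothetical work item
    CancellationToken.None, TaskCreationOptions.None, high);
Task.Factory.StartNew(() => DoBackgroundWork(), // hypothetical work item
    CancellationToken.None, TaskCreationOptions.None, low);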

Concurrent collections and unique elements

I have a concurrent BlockingCollection with repeated elements. How can I modify it to add or get distinct elements?
The default backing store for BlockingCollection is a ConcurrentQueue. As somebody else pointed out, it's rather difficult to add distinct items using that.
However, you could create your own collection type that implements IProducerConsumerCollection, and pass that to the BlockingCollection constructor.
Imagine a ConcurrentDictionary that contains the keys of the items that are currently in the queue. To add an item, you first call TryAdd on the dictionary; if that succeeds (the item wasn't already present), you also add it to the queue. Take (and TryTake) get the next item from the queue, remove it from the dictionary, and return it.
I'd prefer it if there were a concurrent hash set, but since there isn't one, you'll have to make do with ConcurrentDictionary.
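A minimal sketch of that dictionary-plus-queue idea (my illustration; it ignores some edge-case races and is not a full IProducerConsumerCollection<T> implementation):
public class DistinctQueue<T>
{
    private readonly ConcurrentDictionary<T, byte> _keys = new();
    private readonly ConcurrentQueue<T> _queue = new();

    public bool TryAdd(T item)
    {
        // Enqueue only if the key was not already present.
        if (!_keys.TryAdd(item, 0)) return false;
        _queue.Enqueue(item);
        return true;
    }

    public bool TryTake(out T item)
    {
        if (!_queue.TryDequeue(out item)) return false;
        _keys.TryRemove(item, out _);
        return true;
    }
}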
Here is an implementation of an IProducerConsumerCollection<T> collection with the behavior of a queue, which also rejects duplicate items:
public class ConcurrentQueueNoDuplicates<T> : IProducerConsumerCollection<T>
{
private readonly Queue<T> _queue = new();
private readonly HashSet<T> _set;
private object Locker => _queue;
public ConcurrentQueueNoDuplicates(IEqualityComparer<T> comparer = default)
{
_set = new(comparer);
}
public bool TryAdd(T item)
{
lock (Locker)
{
if (!_set.Add(item))
throw new DuplicateKeyException();
_queue.Enqueue(item); return true;
}
}
public bool TryTake(out T item)
{
lock (Locker)
{
if (_queue.Count == 0)
throw new InvalidOperationException();
item = _queue.Dequeue();
bool removed = _set.Remove(item);
Debug.Assert(removed);
return true;
}
}
public int Count { get { lock (Locker) return _queue.Count; } }
public bool IsSynchronized => false;
public object SyncRoot => throw new NotSupportedException();
public T[] ToArray() { lock (Locker) return _queue.ToArray(); }
public IEnumerator<T> GetEnumerator() => ToArray().AsEnumerable().GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
public void CopyTo(T[] array, int index) => throw new NotSupportedException();
public void CopyTo(Array array, int index) => throw new NotSupportedException();
}
public class DuplicateKeyException : InvalidOperationException { }
Usage example:
BlockingCollection<Item> queue = new(new ConcurrentQueueNoDuplicates<Item>());
//...
try { queue.Add(item); }
catch (DuplicateKeyException) { Console.WriteLine($"The {item} was rejected."); }
Caution: Calling queue.TryAdd(item); does not have the expected behavior of returning false if the item is a duplicate. Any attempt to add a duplicate item invariably results in a DuplicateKeyException. Do not attempt to "fix" the above ConcurrentQueueNoDuplicates<T>.TryAdd implementation, or the TryTake, by returning false. The BlockingCollection<T> will react by throwing a different exception (InvalidOperationException), and on top of that its internal state will become corrupted. There is currently (.NET 7) a bug that reduces the effective capacity by one in a BlockingCollection<T> whose underlying storage has a TryAdd implementation that returns false. The bug has been fixed for .NET 8, which will prevent the corruption, but it won't change the error-throwing behavior.

Is there a built-in way to convert IEnumerator to IEnumerable

Is there a built-in way to convert IEnumerator<T> to IEnumerable<T>?
The easiest way of converting I can think of is via the yield statement
public static IEnumerable<T> ToIEnumerable<T>(this IEnumerator<T> enumerator) {
while ( enumerator.MoveNext() ) {
yield return enumerator.Current;
}
}
Compared to the list version, this has the advantage of not enumerating the entire list before returning an IEnumerable. Using the yield statement, you'd only iterate over the items you need, whereas with the list version you'd first iterate over all items in the list and then over all the items you need.
For a little more fun, you could change it to:
public static IEnumerable<K> Select<K,T>(this IEnumerator<T> e,
Func<T,K> selector) {
while ( e.MoveNext() ) {
yield return selector(e.Current);
}
}
You'd then be able to use LINQ on your enumerator like:
IEnumerator<T> enumerator;
var someList = from item in enumerator
select new classThatTakesTInConstructor(item);
You could use the following which will kinda work.
public class FakeEnumerable<T> : IEnumerable<T> {
private IEnumerator<T> m_enumerator;
public FakeEnumerable(IEnumerator<T> e) {
m_enumerator = e;
}
public IEnumerator<T> GetEnumerator() {
return m_enumerator;
}
// Rest omitted
}
This will get you into trouble though when people expect successive calls to GetEnumerator to return different enumerators vs. the same one. But if it's a one time only use in a very constrained scenario, this could unblock you.
I do suggest though you try and not do this because I think eventually it will come back to haunt you.
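A short sketch of the trap: both loops below share the single cached enumerator, so the second one observes an already-exhausted sequence.
var fake = new FakeEnumerable<int>(new List<int> { 1, 2, 3 }.GetEnumerator());
foreach (var x in fake) Console.WriteLine(x); // prints 1 2 3
foreach (var x in fake) Console.WriteLine(x); // prints nothing: same enumerator, already exhausted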
A safer option is along the lines Jonathan suggested. You can consume the enumerator and create a List<T> of the remaining items.
public static List<T> SaveRest<T>(this IEnumerator<T> e) {
var list = new List<T>();
while ( e.MoveNext() ) {
list.Add(e.Current);
}
return list;
}
EnumeratorEnumerable<T>
A threadsafe, resettable adaptor from IEnumerator<T> to IEnumerable<T>
I use Enumerator parameters like the C++ forward_iterator concept.
I agree that this can lead to confusion, as too many people will indeed assume Enumerators are /like/ Enumerables, but they are not.
However, the confusion is fed by the fact that IEnumerator contains the Reset method. Here is my idea of the most correct implementation. It leverages the implementation of IEnumerator.Reset().
A major difference between an Enumerable and an Enumerator is that an Enumerable might be able to create several Enumerators simultaneously. This implementation puts a whole lot of work into making sure that this never happens for the EnumeratorEnumerable<T> type. There are two EnumeratorEnumerableModes:
Blocking (meaning that a second caller will simply wait till the first enumeration is completed)
NonBlocking (meaning that a second (concurrent) request for an enumerator simply throws an exception)
Note 1: 74 lines are implementation, 79 lines are testing code :)
Note 2: I didn't refer to any unit testing framework for SO convenience
using System;
using System.Diagnostics;
using System.Linq;
using System.Collections;
using System.Collections.Generic;
using System.Threading;
namespace EnumeratorTests
{
public enum EnumeratorEnumerableMode
{
NonBlocking,
Blocking,
}
public sealed class EnumeratorEnumerable<T> : IEnumerable<T>
{
#region LockingEnumWrapper
public sealed class LockingEnumWrapper : IEnumerator<T>
{
private static readonly HashSet<IEnumerator<T>> BusyTable = new HashSet<IEnumerator<T>>();
private readonly IEnumerator<T> _wrap;
internal LockingEnumWrapper(IEnumerator<T> wrap, EnumeratorEnumerableMode allowBlocking)
{
_wrap = wrap;
if (allowBlocking == EnumeratorEnumerableMode.Blocking)
Monitor.Enter(_wrap);
else if (!Monitor.TryEnter(_wrap))
throw new InvalidOperationException("Thread conflict accessing busy Enumerator") {Source = "LockingEnumWrapper"};
lock (BusyTable)
{
if (BusyTable.Contains(_wrap))
throw new LockRecursionException("Self lock (deadlock) conflict accessing busy Enumerator") { Source = "LockingEnumWrapper" };
BusyTable.Add(_wrap);
}
// always implicit Reset
_wrap.Reset();
}
#region Implementation of IDisposable and IEnumerator
public void Dispose()
{
lock (BusyTable)
BusyTable.Remove(_wrap);
Monitor.Exit(_wrap);
}
public bool MoveNext() { return _wrap.MoveNext(); }
public void Reset() { _wrap.Reset(); }
public T Current { get { return _wrap.Current; } }
object IEnumerator.Current { get { return Current; } }
#endregion
}
#endregion
private readonly IEnumerator<T> _enumerator;
private readonly EnumeratorEnumerableMode _allowBlocking;
public EnumeratorEnumerable(IEnumerator<T> e, EnumeratorEnumerableMode allowBlocking)
{
_enumerator = e;
_allowBlocking = allowBlocking;
}
public IEnumerator<T> GetEnumerator()
{
return new LockingEnumWrapper(_enumerator, _allowBlocking);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
class TestClass
{
private static readonly string World = "hello world\n";
public static void Main(string[] args)
{
var master = World.GetEnumerator();
var nonblocking = new EnumeratorEnumerable<char>(master, EnumeratorEnumerableMode.NonBlocking);
var blocking = new EnumeratorEnumerable<char>(master, EnumeratorEnumerableMode.Blocking);
foreach (var c in nonblocking) Console.Write(c); // OK (implicit Reset())
foreach (var c in blocking) Console.Write(c); // OK (implicit Reset())
foreach (var c in nonblocking) Console.Write(c); // OK (implicit Reset())
foreach (var c in blocking) Console.Write(c); // OK (implicit Reset())
try
{
var willRaiseException = from c1 in nonblocking from c2 in nonblocking select new {c1, c2};
Console.WriteLine("Cartesian product: {0}", willRaiseException.Count()); // RAISE
}
catch (Exception e) { Console.WriteLine(e); }
foreach (var c in nonblocking) Console.Write(c); // OK (implicit Reset())
foreach (var c in blocking) Console.Write(c); // OK (implicit Reset())
try
{
var willSelfLock = from c1 in blocking from c2 in blocking select new { c1, c2 };
Console.WriteLine("Cartesian product: {0}", willSelfLock.Count()); // LOCK
}
catch (Exception e) { Console.WriteLine(e); }
// should not externally throw (exceptions on other threads reported to console)
if (ThreadConflictCombinations(blocking, nonblocking))
throw new InvalidOperationException("Should have thrown an exception on background thread");
if (ThreadConflictCombinations(nonblocking, nonblocking))
throw new InvalidOperationException("Should have thrown an exception on background thread");
if (ThreadConflictCombinations(nonblocking, blocking))
Console.WriteLine("Background thread timed out");
if (ThreadConflictCombinations(blocking, blocking))
Console.WriteLine("Background thread timed out");
Debug.Assert(true); // Must be reached
}
private static bool ThreadConflictCombinations(IEnumerable<char> main, IEnumerable<char> other)
{
try
{
using (main.GetEnumerator())
{
var bg = new Thread(o =>
{
try { other.GetEnumerator(); }
catch (Exception e) { Report(e); }
}) { Name = "background" };
bg.Start();
bool timedOut = !bg.Join(1000); // observe the thread waiting a full second for a lock (or throw the exception for nonblocking)
if (timedOut)
bg.Abort();
return timedOut;
}
} catch
{
throw new InvalidProgramException("Cannot be reached");
}
}
static private readonly object ConsoleSynch = new Object();
private static void Report(Exception e)
{
lock (ConsoleSynch)
Console.WriteLine("Thread:{0}\tException:{1}", Thread.CurrentThread.Name, e);
}
}
}
Note 3: I think the implementation of the thread locking (especially around BusyTable) is quite ugly; however, I didn't want to resort to ReaderWriterLock(LockRecursionPolicy.NoRecursion) and didn't want to assume .NET 4.0 for SpinLock.
A solution that uses a factory, while also fixing the cached-IEnumerator issue in JaredPar's answer, allows changing the way of enumeration.
Consider a simple example: we want a custom List<T> wrapper that allows enumerating in reverse order along with the default enumeration. List<T> already provides an enumerator for the default enumeration; we only need to create an IEnumerator that enumerates in reverse order. (We won't use List<T>.AsEnumerable().Reverse() because it enumerates the list twice.)
public enum EnumerationType {
Default = 0,
Reverse
}
public class CustomList<T> : IEnumerable<T> {
private readonly List<T> list;
public CustomList(IEnumerable<T> list) => this.list = new List<T>(list);
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
//Default IEnumerable method will return default enumerator factory
public IEnumerator<T> GetEnumerator()
=> GetEnumerable(EnumerationType.Default).GetEnumerator();
public IEnumerable<T> GetEnumerable(EnumerationType enumerationType)
=> enumerationType switch {
EnumerationType.Default => new DefaultEnumeratorFactory(list),
EnumerationType.Reverse => new ReverseEnumeratorFactory(list),
_ => throw new ArgumentOutOfRangeException(nameof(enumerationType))
};
//Simple implementation of reverse list enumerator
private class ReverseEnumerator : IEnumerator<T> {
private readonly List<T> list;
private int index;
internal ReverseEnumerator(List<T> list) {
this.list = list;
index = list.Count-1;
Current = default;
}
public void Dispose() { }
public bool MoveNext() {
if(index >= 0) {
Current = list[index];
index--;
return true;
}
Current = default;
return false;
}
public T Current { get; private set; }
object IEnumerator.Current => Current;
void IEnumerator.Reset() {
index = list.Count - 1;
Current = default;
}
}
private abstract class EnumeratorFactory : IEnumerable<T> {
protected readonly List<T> List;
protected EnumeratorFactory(List<T> list) => List = list;
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
public abstract IEnumerator<T> GetEnumerator();
}
private class DefaultEnumeratorFactory : EnumeratorFactory {
public DefaultEnumeratorFactory(List<T> list) : base(list) { }
//Default enumerator is already implemented in List<T>
public override IEnumerator<T> GetEnumerator() => List.GetEnumerator();
}
private class ReverseEnumeratorFactory : EnumeratorFactory {
public ReverseEnumeratorFactory(List<T> list) : base(list) { }
public override IEnumerator<T> GetEnumerator() => new ReverseEnumerator(List);
}
}
As Jason Watts said -- no, not directly.
If you really want to, you could loop through the IEnumerator<T>, putting the items into a List<T>, and return that, but I'm guessing that's not what you're looking to do.
The basic reason you can't go that direction (IEnumerator<T> to IEnumerable<T>) is that IEnumerable<T> represents a set that can be enumerated, but IEnumerator<T> is a specific enumeration over a set of items -- you can't turn the specific instance back into the thing that created it.
static class Helper
{
public static List<T> SaveRest<T>(this IEnumerator<T> enumerator)
{
var list = new List<T>();
while (enumerator.MoveNext())
{
list.Add(enumerator.Current);
}
return list;
}
public static ArrayList SaveRest(this IEnumerator enumerator)
{
var list = new ArrayList();
while (enumerator.MoveNext())
{
list.Add(enumerator.Current);
}
return list;
}
}
Nope, IEnumerator<> and IEnumerable<> are different beasts entirely.
This is a variant I have written... The specifics are a little different: I wanted to do a MoveNext() on an IEnumerator<T>, check the result, and then roll everything into a new IEnumerable<T> that was "complete" (so that it included even the element of the IEnumerable<T> I had already extracted).
// Simple IEnumerable<T> that "uses" an IEnumerator<T> that has
// already received a MoveNext(). "eats" the first MoveNext()
// received, then continues normally. For shortness, both IEnumerable<T>
// and IEnumerator<T> are implemented by the same class. Note that if a
// second call to GetEnumerator() is done, the "real" IEnumerator<T> will
// be returned, not this proxy implementation.
public class EnumerableFromStartedEnumerator<T> : IEnumerable<T>, IEnumerator<T>
{
public readonly IEnumerator<T> Enumerator;
public readonly IEnumerable<T> Enumerable;
// Received by creator. Return value of MoveNext() done by caller
protected bool FirstMoveNextSuccessful { get; set; }
// The Enumerator can be "used" only once, then a new enumerator
// can be requested by Enumerable.GetEnumerator()
// (default = false)
protected bool Used { get; set; }
// The first MoveNext() has been already done (default = false)
protected bool DoneMoveNext { get; set; }
public EnumerableFromStartedEnumerator(IEnumerator<T> enumerator, bool firstMoveNextSuccessful, IEnumerable<T> enumerable)
{
Enumerator = enumerator;
FirstMoveNextSuccessful = firstMoveNextSuccessful;
Enumerable = enumerable;
}
public IEnumerator<T> GetEnumerator()
{
if (Used)
{
return Enumerable.GetEnumerator();
}
Used = true;
return this;
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public T Current
{
get
{
// There are various school of though on what should
// happens if called before the first MoveNext() or
// after a MoveNext() returns false. We follow the
// "return default(TInner)" school of thought for the
// before first MoveNext() and the "whatever the
// Enumerator wants" for the after a MoveNext() returns
// false
if (!DoneMoveNext)
{
return default(T);
}
return Enumerator.Current;
}
}
public void Dispose()
{
Enumerator.Dispose();
}
object IEnumerator.Current
{
get
{
return Current;
}
}
public bool MoveNext()
{
if (!DoneMoveNext)
{
DoneMoveNext = true;
return FirstMoveNextSuccessful;
}
return Enumerator.MoveNext();
}
public void Reset()
{
// This will 99% throw :-) Not our problem.
Enumerator.Reset();
// So it is improbable we will arrive here
DoneMoveNext = true;
}
}
Use:
var enumerable = someCollection<T>;
var enumerator = enumerable.GetEnumerator();
bool res = enumerator.MoveNext();
// do whatever you want with res/enumerator.Current
var enumerable2 = new EnumerableFromStartedEnumerator<T>(enumerator, res, enumerable);
Now, the first GetEnumerator() that will be requested to enumerable2 will be given through the enumerator enumerator. From the second onward the enumerable.GetEnumerator() will be used.
The other answers here are ... strange. IEnumerable<T> has just one method, GetEnumerator(). And an IEnumerable<T> must implement IEnumerable, which also has just one method, GetEnumerator() (the difference being that one is generic on T and the other is not). So it should be clear how to turn an IEnumerator<T> into an IEnumerable<T>:
// using modern expression-body syntax
public class IEnumeratorToIEnumerable<T> : IEnumerable<T>
{
private readonly IEnumerator<T> Enumerator;
public IEnumeratorToIEnumerable(IEnumerator<T> enumerator) =>
Enumerator = enumerator;
public IEnumerator<T> GetEnumerator() => Enumerator;
IEnumerator IEnumerable.GetEnumerator() => Enumerator;
}
foreach (var foo in new IEnumeratorToIEnumerable<Foo>(fooEnumerator))
DoSomethingWith(foo);
// and you can also do:
var fooEnumerable = new IEnumeratorToIEnumerable<Foo>(fooEnumerator);
foreach (var foo in fooEnumerable)
DoSomethingWith(foo);
// Some IEnumerators automatically repeat after MoveNext() returns false,
// in which case this is a no-op, but generally it's required.
fooEnumerator.Reset();
foreach (var foo in fooEnumerable)
DoSomethingElseWith(foo);
However, none of this should be needed because it's unusual to have an IEnumerator<T> that doesn't come with an IEnumerable<T> that returns an instance of it from its GetEnumerator method. If you're writing your own IEnumerator<T>, you should certainly provide the IEnumerable<T>. And really it's the other way around ... an IEnumerator<T> is intended to be a private class that iterates over instances of a public class that implements IEnumerable<T>.
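For reference, the conventional shape looks like this (a generic sketch, not code from any answer above): the public type implements IEnumerable<T>, and the enumerator is a private nested class, so each GetEnumerator call hands out a fresh, independent enumerator.
public class CountUpTo : IEnumerable<int>
{
    private readonly int _max;
    public CountUpTo(int max) => _max = max;

    // Each call returns a fresh enumerator, so repeated or concurrent
    // enumerations don't interfere with each other.
    public IEnumerator<int> GetEnumerator() => new Enumerator(_max);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private class Enumerator : IEnumerator<int>
    {
        private readonly int _max;
        private int _current;
        public Enumerator(int max) => _max = max;
        public int Current => _current;
        object IEnumerator.Current => Current;
        public bool MoveNext() => ++_current <= _max;
        public void Reset() => _current = 0;
        public void Dispose() { }
    }
}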
