Concurrent collections and unique elements - c#

I have a concurrent BlockingCollection with repeated elements. How can I modify it to add or get distinct elements?

The default backing store for BlockingCollection is a ConcurrentQueue. As somebody else pointed out, it's rather difficult to add distinct items using that.
However, you could create your own collection type that implements IProducerConsumerCollection, and pass that to the BlockingCollection constructor.
Imagine a ConcurrentDictionary that contains the keys of the items that are currently in the queue. To add an item, call TryAdd on the dictionary first; if that succeeds (the key was not already present), also add the item to the queue. Take (and TryTake) get the next item from the queue, remove its key from the dictionary, and return the item.
I'd prefer a concurrent HashSet, but since there isn't one, you'll have to make do with ConcurrentDictionary.
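A minimal sketch of the idea described above, assuming you only need TryAdd and TryTake. The class name DistinctQueueSketch is hypothetical, and the rest of the IProducerConsumerCollection<T> surface (Count, ToArray, CopyTo, enumeration) is left out for brevity. The dictionary and the queue are updated in two separate steps, so they can be briefly out of sync under contention.

```csharp
using System.Collections.Concurrent;
using System.Diagnostics.CodeAnalysis;

// Hypothetical sketch: a FIFO queue that rejects items already waiting in it.
public sealed class DistinctQueueSketch<T> where T : notnull
{
    // Keys of the items currently in the queue (the byte value is unused).
    private readonly ConcurrentDictionary<T, byte> _pending = new();
    private readonly ConcurrentQueue<T> _queue = new();

    public bool TryAdd(T item)
    {
        // Register the key first; a failed TryAdd means it is a duplicate.
        if (!_pending.TryAdd(item, 0)) return false;
        _queue.Enqueue(item);
        return true;
    }

    public bool TryTake([MaybeNullWhen(false)] out T item)
    {
        if (!_queue.TryDequeue(out item)) return false;
        // Forget the key so the same item can be queued again later.
        _pending.TryRemove(item, out _);
        return true;
    }
}
```

Keep in mind that when a collection like this backs a BlockingCollection<T>, a TryAdd that returns false makes the BlockingCollection<T> throw an InvalidOperationException rather than return false itself.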

Here is an implementation of an IProducerConsumerCollection<T> collection with the behavior of a queue, that also rejects duplicate items:
public class ConcurrentQueueNoDuplicates<T> : IProducerConsumerCollection<T>
{
private readonly Queue<T> _queue = new();
private readonly HashSet<T> _set;
private object Locker => _queue;
public ConcurrentQueueNoDuplicates(IEqualityComparer<T> comparer = default)
{
_set = new(comparer);
}
public bool TryAdd(T item)
{
lock (Locker)
{
if (!_set.Add(item))
throw new DuplicateKeyException();
_queue.Enqueue(item); return true;
}
}
public bool TryTake(out T item)
{
lock (Locker)
{
if (_queue.Count == 0)
throw new InvalidOperationException();
item = _queue.Dequeue();
bool removed = _set.Remove(item);
Debug.Assert(removed);
return true;
}
}
public int Count { get { lock (Locker) return _queue.Count; } }
public bool IsSynchronized => false;
public object SyncRoot => throw new NotSupportedException();
public T[] ToArray() { lock (Locker) return _queue.ToArray(); }
public IEnumerator<T> GetEnumerator() => ToArray().AsEnumerable().GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
public void CopyTo(T[] array, int index) => throw new NotSupportedException();
public void CopyTo(Array array, int index) => throw new NotSupportedException();
}
public class DuplicateKeyException : InvalidOperationException { }
Usage example:
BlockingCollection<Item> queue = new(new ConcurrentQueueNoDuplicates<Item>());
//...
try { queue.Add(item); }
catch (DuplicateKeyException) { Console.WriteLine($"The {item} was rejected."); }
Caution: Calling queue.TryAdd(item); does not have the expected behavior of returning false if the item is a duplicate. Any attempt to add a duplicate item invariably results in a DuplicateKeyException. Do not attempt to "fix" the above ConcurrentQueueNoDuplicates<T>.TryAdd implementation, or the TryTake, by returning false. The BlockingCollection<T> will react by throwing a different exception (InvalidOperationException), and on top of that its internal state will become corrupted. There is currently (.NET 7) a bug that reduces by one the effective capacity of a BlockingCollection<T> whose underlying storage has a TryAdd implementation that returns false. The bug has been fixed for .NET 8, which will prevent the corruption, but it won't change the error-throwing behavior.
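To see the error-throwing behavior in isolation, here is a small sketch with a deliberately useless backing store whose TryAdd always returns false. The names RejectingStore and RejectingStoreDemo are made up for this illustration; the sketch only shows the exception, not the capacity bug.

```csharp
using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical backing store that refuses every item.
public sealed class RejectingStore<T> : IProducerConsumerCollection<T>
{
    public bool TryAdd(T item) => false;
    public bool TryTake(out T item) { item = default!; return false; }
    public int Count => 0;
    public bool IsSynchronized => false;
    public object SyncRoot => throw new NotSupportedException();
    public T[] ToArray() => Array.Empty<T>();
    public void CopyTo(T[] array, int index) { }
    public void CopyTo(Array array, int index) { }
    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)ToArray()).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class RejectingStoreDemo
{
    // Returns true when BlockingCollection<T>.TryAdd throws instead of returning false.
    public static bool TryAddThrows()
    {
        using var queue = new BlockingCollection<int>(new RejectingStore<int>());
        try
        {
            queue.TryAdd(1);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```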

Related

Concurrent collection supporting random (FIFO) and specific Remove

I'm writing an application which manages a collection that requires frequent enqueuing and dequeuing of items in a multithreaded environment. With a single thread, a simple List would probably be enough, but the concurrent nature of the environment poses some issues.
Here's the summary:
The structure needs to have a bool TryAdd(T) method, preferably Add(TKey, TValue);
The structure needs to have a T TryRemove() method which takes a random or preferably the first added item (essentially implementing a FIFO queue);
The structure needs to have a bool TryRemove(T) method, preferably Remove(TKey);
So far I have three ideas, all with their issues:
Implement a class containing a ConcurrentDictionary<TKey, TValue> and a ConcurrentQueue like this:
internal class ConcurrentQueuedDictionary<TKey, TValue> where TKey : notnull
{
ConcurrentDictionary<TKey, TValue> _dictionary;
ConcurrentQueue<TKey> _queue;
object _locker;
public bool TryAdd(TKey key, TValue value)
{
if (!_dictionary.TryAdd(key, value))
return false;
lock (_locker)
_queue.Enqueue(key);
return true;
}
public TValue TryRemove()
{
TKey key;
lock (_locker) {
if (_queue.IsEmpty)
return default(TValue);
_queue.TryDequeue(out key);
}
TValue value;
if (!_dictionary.Remove(key, out value))
throw new Exception();
return value;
}
public bool TryRemove(TKey key)
{
lock (_locker)
{
var copiedList = _queue.ToList();
if (!copiedList.Remove(key)) // nothing to remove: the key is not in the queue
return false;
_queue = new(copiedList);
}
return _dictionary.TryRemove(key, out _);
}
}
but that will require a Lock on Remove(T) because it demands a full deep copy of the initial Queue without the removed item while disallowing read from other threads, which means that at least Remove() will also have this lock, and this is meant to be an operation carried out often;
Implement a class containing a ConcurrentDictionary<TKey, TValue> and a ConcurrentDictionary<int order, TKey>, where order is defined on TryAdd with two properties _addOrder and _removeOrder like this:
internal class ConcurrentQueuedDictionary<TKey, TValue> where TKey : notnull
{
ConcurrentDictionary<TKey, TValue> _dictionary;
ConcurrentDictionary<int, TKey> _order;
int _addOrder = 0;
int _removeOrder = 0;
public bool TryAdd(TKey key, TValue value)
{
if (!_dictionary.TryAdd(key, value))
return false;
if (!_order.TryAdd(unchecked(Interlocked.Increment(ref _addOrder)), key))
throw new Exception(); //Operation faulted, mismatch of data in _order
return true;
}
public TValue TryRemove()
{
TKey key;
if (!(_order.Count > 0 && _order.Remove(unchecked(Interlocked.Increment(ref _removeOrder)), out key)))
return default(TValue);
return _dictionary[key];
}
public bool TryRemove(TKey key)
{
if (!_order.Remove(_order.Where(item => item.Value.Equals(key)).First().Key, out _))
return false;
if (!_dictionary.Remove(key, out _))
throw new Exception();
return true;
}
}
but I'm pretty sure just voicing this implementation had put me on a psychiatric watchlist somewhere because it's gonna be a masochistic nightmare to make work properly;
Straight up locking a List because locks are necessary for option 1 anyway.
Any ideas? I'm kinda stumped by this issue as I don't have the best grasp on concurrent collections. Do I need a custom IProducerConsumerCollection? Is it even possible to have both random (or queued) and specific access to concurrent collection elements? Have any of you faced this before, maybe I'm looking at the issue wrong?
Creating a concurrent structure like this by combining built-in concurrent collections should be close to impossible, provided of course that correctness is paramount and race-conditions are strictly forbidden. The good news is that acquiring a lock a few thousand times per second is nowhere near the limit where contention starts to become an issue, provided that the operations inside the protected region are lightweight (their duration is measured in nanoseconds).
One way to achieve O(1) complexity of operations, is to combine a LinkedList<T> and a Dictionary<K,V>:
/// <summary>
/// Represents a thread-safe first in-first out (FIFO) collection of key/value pairs,
/// where the key is unique.
/// </summary>
public class ConcurrentKeyedQueue<TKey, TValue>
{
private readonly LinkedList<KeyValuePair<TKey, TValue>> _queue;
private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>
_dictionary;
public ConcurrentKeyedQueue(IEqualityComparer<TKey> comparer = default)
{
_queue = new();
_dictionary = new(comparer);
}
public int Count { get { lock (_queue) return _queue.Count; } }
public bool TryEnqueue(TKey key, TValue value)
{
lock (_queue)
{
ref var node = ref CollectionsMarshal
.GetValueRefOrAddDefault(_dictionary, key, out bool exists);
if (exists) return false;
node = new(new(key, value));
_queue.AddLast(node);
Debug.Assert(_queue.Count == _dictionary.Count);
return true;
}
}
public bool TryDequeue(out TKey key, out TValue value)
{
lock (_queue)
{
if (_queue.Count == 0) { key = default; value = default; return false; }
var node = _queue.First;
(key, value) = node.Value;
_queue.RemoveFirst();
bool removed = _dictionary.Remove(key);
Debug.Assert(removed);
Debug.Assert(_queue.Count == _dictionary.Count);
return true;
}
}
public bool TryTake(TKey key, out TValue value)
{
lock (_queue)
{
bool removed = _dictionary.Remove(key, out var node);
if (!removed) { value = default; return false; }
_queue.Remove(node);
(_, value) = node.Value;
Debug.Assert(_queue.Count == _dictionary.Count);
return true;
}
}
public KeyValuePair<TKey, TValue>[] ToArray()
{
lock (_queue) return _queue.ToArray();
}
}
This combination is also used for creating LRU caches.
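For illustration, here is a minimal sketch of an LRU cache built from the same LinkedList<T> plus Dictionary<K,V> pairing. The class name LruCacheSketch and its TryGet/Set members are hypothetical and not taken from any library. The most recently used entry sits at the front of the list, and the entry at the back is evicted when the cache is full.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a thread-safe LRU cache with O(1) operations.
public sealed class LruCacheSketch<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order = new();
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map = new();

    public LruCacheSketch(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public bool TryGet(TKey key, out TValue value)
    {
        lock (_order)
        {
            if (!_map.TryGetValue(key, out var node)) { value = default!; return false; }
            // Move the node to the front: it is now the most recently used.
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
    }

    public void Set(TKey key, TValue value)
    {
        lock (_order)
        {
            if (_map.Remove(key, out var existing))
            {
                // Replacing an existing entry: drop its old node.
                _order.Remove(existing);
            }
            else if (_map.Count >= _capacity)
            {
                // Cache is full: evict the least recently used entry.
                var oldest = _order.Last!;
                _order.RemoveLast();
                _map.Remove(oldest.Value.Key);
            }
            _map[key] = _order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
        }
    }
}
```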
You can measure the lock contention in your own environment under load, by using the Monitor.LockContentionCount property: "Gets the number of times there was contention when trying to take the monitor's lock." If you see the delta per second to be a single digit number, there is nothing to worry about.
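A minimal sketch of such a measurement, assuming a hypothetical helper named ContentionProbe. Monitor.LockContentionCount is process-wide, so the delta covers contention on every lock taken while the workload runs, not only the lock of your collection.

```csharp
using System;
using System.Threading;

public static class ContentionProbe
{
    // Returns how many contended lock acquisitions were recorded
    // in the whole process while 'work' was running.
    public static long Measure(Action work)
    {
        long before = Monitor.LockContentionCount;
        work();
        return Monitor.LockContentionCount - before;
    }
}
```

You could call it around a representative load test and divide the result by the elapsed seconds to get the per-second delta.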
For a version that doesn't use the CollectionsMarshal.GetValueRefOrAddDefault method, and so it can be used on .NET versions older than .NET 6, see the first revision of this answer.

Best way to implement consumer queue that you can remove items from sequentially (.net 6)

new poster here so I hope this makes sense ...
I need to create a collection that I can remove items from in sequence (basically stock market time series data).
The data producer is multi-threaded and doesn't guarantee that the data will come in sequence.
I've looked all around for a solution, but the only thing I can come up with is to create my own custom dictionary, using ConcurrentDictionary and implementing the IProducerConsumerCollection interface so it can be used with BlockingCollection.
The code I have below does work, but produces an error
System.InvalidOperationException: The underlying collection was
modified from outside of the BlockingCollection
when using the GetConsumingEnumerable() for loop, and the next key in the sequence is not present in the dictionary. In this instance I would like to wait for a specified amount of time and then attempt to take the item from the queue again.
My question is:
What's the best way to handle the error when there is no key present. At the moment it seems handling the error would require exiting the loop. Perhaps using GetConsumingEnumerable() is not the right way to consume and a while loop would work better?
Code is below - any help/ideas much appreciated.
IProducerConsumerCollection implementation:
public abstract class BlockingDictionary<TKey, TValue> : IProducerConsumerCollection<KeyValuePair<TKey, TValue>> where TKey : notnull
{
protected ConcurrentDictionary<TKey, TValue> _dictionary = new ConcurrentDictionary<TKey, TValue>();
int ICollection.Count => _dictionary.Count;
bool ICollection.IsSynchronized => false;
object ICollection.SyncRoot => throw new NotSupportedException();
public void CopyTo(KeyValuePair<TKey, TValue>[] array, int index)
{
if (array == null)
{
throw new ArgumentNullException("array");
}
_dictionary.ToList().CopyTo(array, index);
}
void ICollection.CopyTo(Array array, int index)
{
if (array == null)
{
throw new ArgumentNullException("array");
}
((ICollection)_dictionary.ToList()).CopyTo(array, index);
}
public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
{
return ((IEnumerable<KeyValuePair<TKey, TValue>>)_dictionary).GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return ((IEnumerable<KeyValuePair<TKey, TValue>>)this).GetEnumerator();
}
public KeyValuePair<TKey, TValue>[] ToArray()
{
return _dictionary.ToList().ToArray();
}
bool IProducerConsumerCollection<KeyValuePair<TKey, TValue>>.TryAdd(KeyValuePair<TKey, TValue> item)
{
return _dictionary.TryAdd(item.Key, item.Value);
}
public virtual bool TryTake(out KeyValuePair<TKey, TValue> item)
{
item = this.FirstOrDefault();
TValue? value;
return _dictionary.TryRemove(item.Key, out value);
}
}
Time Sequence queue implementation (inherits above)
public class TimeSequenceQueue<T> : BlockingDictionary<DateTime, T>
{
private DateTime _previousTime;
private DateTime _nextTime;
private readonly int _intervalSeconds;
public TimeSequenceQueue(DateTime startTime, int intervalSeconds)
{
_intervalSeconds = intervalSeconds;
_previousTime = startTime;
_nextTime = startTime;
}
public override bool TryTake([MaybeNullWhen(false)] out KeyValuePair<DateTime, T> item)
{
item = _dictionary.SingleOrDefault(x => x.Key == _nextTime);
T? value = default(T);
if (item.Value == null)
return false;
bool result = _dictionary.TryRemove(item.Key, out value);
if (result)
{
_previousTime = _nextTime;
_nextTime = _nextTime.AddSeconds(_intervalSeconds);
}
return result;
}
}
Usage:
BlockingCollection<KeyValuePair<DateTime, object>> _queue = new BlockingCollection<KeyValuePair<DateTime, object>>(new TimeSequenceQueue<object>());
Consuming loop - started in new thread:
foreach (var item in _queue.GetConsumingEnumerable())
{
// feed downstream
}
When using the GetConsumingEnumerable() for loop, and the next key in the sequence is not present in the dictionary [...] I would like to wait for a specified amount of time and then attempt to take the item from the queue again.
I will try to answer this question generally, without paying too much attention to the specifics of your problem. So let's say that you are consuming a BlockingCollection<T> like this:
foreach (var item in collection.GetConsumingEnumerable())
{
// Do something with the consumed item.
}
...and you want to avoid waiting indefinitely for an item to arrive. You want to wake up every 5 seconds and do something, before waiting/sleeping again.
Here is how you could do it:
while (!collection.IsCompleted)
{
bool consumed = collection.TryTake(out var item, TimeSpan.FromSeconds(5));
if (consumed)
{
// Do something with the consumed item.
}
else
{
// Do something before trying again to take an item.
}
}
The above pattern imitates the actual source code of the BlockingCollection<T>.GetConsumingEnumerable method.
If you want to get fancy you could incorporate this functionality in a custom extension method for the BlockingCollection<T> class, like this:
public static IEnumerable<(bool Consumed, T Item)> GetConsumingEnumerable<T>(
this BlockingCollection<T> source, TimeSpan timeout)
{
while (!source.IsCompleted)
{
bool consumed = source.TryTake(out var item, timeout);
yield return (consumed, item);
}
}
Usage example:
foreach (var (consumed, item) in collection.GetConsumingEnumerable(
TimeSpan.FromSeconds(5)))
{
// Do something depending on whether an item was consumed or not.
}

In C#, is there a queue which can only hold an object once in its lifetime?

I need a data structure which is a special type of queue. If an instance of my queue has ever contained an object X, it shouldn't be possible to enqueue X again in this instance. The enqueuing method should just do nothing if called with X, like an attempt to add a duplicate value to a HashSet.
Example usage:
MyQueue<int> queue = new MyQueue<int>();
queue.Enqueue(5);
queue.Enqueue(17);
queue.Enqueue(28);
queue.Enqueue(17);
int firstNumber = queue.Dequeue();
queue.Enqueue(5);
queue.Enqueue(3);
List<int> queueContents = queue.ToList(); //this list should contain {17, 28, 3}
I looked around on MSDN, but couldn't find such a class. Does it exist, or do I have to implement it myself?
I guess I could use a different data structure too, but access will always be FIFO, so I thought a queue will be most efficient. Also, I don't know of any other structure which provides such "uniqueness over instance lifetime" feature.
I would do something similar to this:
class UniqueQueue<T>
{
private readonly Queue<T> queue = new Queue<T>();
private HashSet<T> alreadyAdded = new HashSet<T>();
public virtual void Enqueue(T item)
{
if (alreadyAdded.Add(item)) { queue.Enqueue(item); }
}
public int Count { get { return queue.Count; } }
public virtual T Dequeue()
{
T item = queue.Dequeue();
return item;
}
}
Note, most of this code was borrowed from This Thread.
You'd have to implement that yourself.
One idea is just to add the element to a HashSet when you enqueue it.
Then, when you want to enqueue, just check the HashSet for the item, if it exists, don't enqueue.
Since you want to prevent enqueuing for the rest of the queue's lifetime, you probably won't want to ever remove from the HashSet.
This is just an extended version of wayne's answer; it is a little more fleshed out and supports a few more interfaces (to mimic Queue<T>'s interfaces).
sealed class UniqueQueue<T> : IEnumerable<T>, ICollection, IEnumerable
{
private readonly Queue<T> queue;
private readonly HashSet<T> alreadyAdded;
public UniqueQueue(IEqualityComparer<T> comparer)
{
queue = new Queue<T>();
alreadyAdded = new HashSet<T>(comparer);
}
public UniqueQueue(IEnumerable<T> collection, IEqualityComparer<T> comparer)
{
//Do this so the enumeration does not happen twice in case the enumerator behaves differently each enumeration.
var localCopy = collection.ToList();
queue = new Queue<T>(localCopy);
alreadyAdded = new HashSet<T>(localCopy, comparer);
}
public UniqueQueue(int capacity, IEqualityComparer<T> comparer)
{
queue = new Queue<T>(capacity);
alreadyAdded = new HashSet<T>(comparer);
}
//Here are the constructors that use the default comparer. By passing null in for the comparer it will just use the default one for the type.
public UniqueQueue() : this((IEqualityComparer<T>) null) { }
public UniqueQueue(IEnumerable<T> collection) : this(collection, null) { }
public UniqueQueue(int capacity) : this(capacity, null) { }
/// <summary>
/// Attempts to enqueue a object, returns false if the object was ever added to the queue in the past.
/// </summary>
/// <param name="item">The item to enqueue</param>
/// <returns>True if the object was successfully added, false if it was not</returns>
public bool Enqueue(T item)
{
if (!alreadyAdded.Add(item))
return false;
queue.Enqueue(item);
return true;
}
public int Count
{
get { return queue.Count; }
}
public T Dequeue()
{
return queue.Dequeue();
}
IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
return ((IEnumerable<T>)queue).GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return ((IEnumerable)queue).GetEnumerator();
}
void ICollection.CopyTo(Array array, int index)
{
((ICollection)queue).CopyTo(array, index);
}
bool ICollection.IsSynchronized
{
get { return ((ICollection)queue).IsSynchronized; }
}
object ICollection.SyncRoot
{
get { return ((ICollection)queue).SyncRoot; }
}
}
You can use a basic queue, but modify the Enqueue method to verify the previous values entered. Here, I used a hashset to contain those previous values :
public class UniqueValueQueue<T> : Queue<T>
{
private readonly HashSet<T> pastValues = new HashSet<T>();
public new void Enqueue(T item)
{
if (!pastValues.Contains(item))
{
pastValues.Add(item);
base.Enqueue(item);
}
}
}
With your test case
UniqueValueQueue<int> queue = new UniqueValueQueue<int>();
queue.Enqueue(5);
queue.Enqueue(17);
queue.Enqueue(28);
queue.Enqueue(17);
int firstNumber = queue.Dequeue();
queue.Enqueue(5);
queue.Enqueue(3);
List<int> queueContents = queue.ToList();
queueContents contains 17, 28 and 3.

Prioritized queues in Task Parallel Library

Is there any prior work of adding tasks to the TPL runtime with a varying priority?
If not, generally speaking, how would I implement this?
Ideally I plan on using the producer-consumer pattern to add "todo" work to the TPL. There may be times where I discover that a low priority job needs to be upgraded to a high priority job (relative to the others).
If anyone has some search keywords I should use when searching for this, please mention them, since I haven't yet found code that will do what I need.
So here is a rather naive concurrent implementation around a rather naive priority queue. The idea here is that there is a sorted set that holds onto pairs of both the real item and a priority, but is given a comparer that just compares the priority. The constructor takes a function that computes the priority for a given object.
As for the actual implementation, it is not efficient: I just lock around everything. A more efficient implementation could not use SortedSet as the priority queue, and re-implementing a priority queue that can be accessed concurrently and efficiently is not going to be that easy.
In order to change the priority of an item you'll need to remove the item from the set and then add it again, and to find it without iterating the whole set you'd need to know the old priority as well as the new priority.
public class ConcurrentPriorityQueue<T> : IProducerConsumerCollection<T>
{
private object key = new object();
private SortedSet<Tuple<T, int>> set;
private Func<T, int> prioritySelector;
public ConcurrentPriorityQueue(Func<T, int> prioritySelector, IComparer<T> comparer = null)
{
this.prioritySelector = prioritySelector;
set = new SortedSet<Tuple<T, int>>(
new MyComparer<T>(comparer ?? Comparer<T>.Default));
}
private class MyComparer<T> : IComparer<Tuple<T, int>>
{
private IComparer<T> comparer;
public MyComparer(IComparer<T> comparer)
{
this.comparer = comparer;
}
public int Compare(Tuple<T, int> first, Tuple<T, int> second)
{
var returnValue = first.Item2.CompareTo(second.Item2);
if (returnValue == 0)
returnValue = comparer.Compare(first.Item1, second.Item1);
return returnValue;
}
}
public bool TryAdd(T item)
{
lock (key)
{
return set.Add(Tuple.Create(item, prioritySelector(item)));
}
}
public bool TryTake(out T item)
{
lock (key)
{
if (set.Count > 0)
{
var first = set.First();
item = first.Item1;
return set.Remove(first);
}
else
{
item = default(T);
return false;
}
}
}
public bool ChangePriority(T item, int oldPriority, int newPriority)
{
lock (key)
{
if (set.Remove(Tuple.Create(item, oldPriority)))
{
return set.Add(Tuple.Create(item, newPriority));
}
else
return false;
}
}
public bool ChangePriority(T item)
{
lock (key)
{
var result = set.FirstOrDefault(pair => object.Equals(pair.Item1, item));
if (result != null) // FirstOrDefault returns null when the item is not in the set
{
return ChangePriority(item, result.Item2, prioritySelector(item));
}
else
{
return false;
}
}
}
public void CopyTo(T[] array, int index)
{
lock (key)
{
foreach (var item in set.Select(pair => pair.Item1))
{
array[index++] = item;
}
}
}
public T[] ToArray()
{
lock (key)
{
return set.Select(pair => pair.Item1).ToArray();
}
}
public IEnumerator<T> GetEnumerator()
{
return ToArray().AsEnumerable().GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public void CopyTo(Array array, int index)
{
lock (key)
{
foreach (var item in set.Select(pair => pair.Item1))
{
array.SetValue(item, index++);
}
}
}
public int Count
{
get { lock (key) { return set.Count; } }
}
public bool IsSynchronized
{
get { return true; }
}
public object SyncRoot
{
get { return key; }
}
}
Once you have an IProducerConsumerCollection<T> instance, which the above object is, you can use it as the internal backing object of a BlockingCollection<T> in order to have an easier to use user interface.
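As a compact, self-contained illustration of that last step, here is a sketch that swaps the SortedSet for the PriorityQueue<TElement, TPriority> class that ships with .NET 6 and later. This is a different technique from the code above: the class name LockedPriorityStore is hypothetical, and this variant cannot change the priority of an item once it has been added.

```csharp
using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.Linq;

// Hypothetical sketch: a lock-protected binary heap usable as a BlockingCollection<T> backing store.
public sealed class LockedPriorityStore<T> : IProducerConsumerCollection<T>
{
    private readonly PriorityQueue<T, int> _heap = new();
    private readonly Func<T, int> _prioritySelector;

    public LockedPriorityStore(Func<T, int> prioritySelector)
        => _prioritySelector = prioritySelector;

    public bool TryAdd(T item)
    {
        lock (_heap) { _heap.Enqueue(item, _prioritySelector(item)); return true; }
    }

    // The item with the lowest priority value is taken first.
    public bool TryTake([MaybeNullWhen(false)] out T item)
    {
        lock (_heap) return _heap.TryDequeue(out item, out _);
    }

    public int Count { get { lock (_heap) return _heap.Count; } }
    public bool IsSynchronized => false;
    public object SyncRoot => throw new NotSupportedException();

    // Snapshot in heap order, which is not sorted by priority.
    public T[] ToArray()
    {
        lock (_heap) return _heap.UnorderedItems.Select(pair => pair.Element).ToArray();
    }

    public void CopyTo(T[] array, int index) => ToArray().CopyTo(array, index);
    public void CopyTo(Array array, int index) => ToArray().CopyTo(array, index);
    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)ToArray()).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Wrapping it looks like new BlockingCollection<string>(new LockedPriorityStore<string>(s => s.Length)), after which Add and Take behave as usual, except that Take returns the item with the lowest priority value rather than the oldest one.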
ParallelExtensionsExtras contains several custom TaskSchedulers that could be helpful either directly or as a base for your own scheduler.
Specifically, there are two schedulers that may be interesting for you:
QueuedTaskScheduler, which allows you to schedule Tasks at different priorities, but doesn't allow changing the priority of enqueued Tasks.
ReprioritizableTaskScheduler, which doesn't have different priorities, but allows you to move a specific Task to the front or to the back of the queue. (Though changing priority is O(n) in the number of currently waiting Tasks, which could be a problem if you had many Tasks at the same time.)

Is there a built-in way to convert IEnumerator to IEnumerable

Is there a built-in way to convert IEnumerator<T> to IEnumerable<T>?
The easiest way of converting I can think of is via the yield statement
public static IEnumerable<T> ToIEnumerable<T>(this IEnumerator<T> enumerator) {
while ( enumerator.MoveNext() ) {
yield return enumerator.Current;
}
}
Compared to the list version, this has the advantage of not enumerating the entire list before returning an IEnumerable. Using the yield statement you'd only iterate over the items you need, whereas using the list version you'd first iterate over all items in the list and then over all the items you need.
For a little more fun you could change it to
public static IEnumerable<K> Select<K,T>(this IEnumerator<T> e,
Func<T,K> selector) { // the selector maps each T item to a K result
while ( e.MoveNext() ) {
yield return selector(e.Current);
}
}
You'd then be able to use LINQ on your enumerator like this:
IEnumerator<T> enumerator;
var someList = from item in enumerator
select new classThatTakesTInConstructor(item);
You could use the following which will kinda work.
public class FakeEnumerable<T> : IEnumerable<T> {
private IEnumerator<T> m_enumerator;
public FakeEnumerable(IEnumerator<T> e) {
m_enumerator = e;
}
public IEnumerator<T> GetEnumerator() {
return m_enumerator;
}
// Rest omitted
}
This will get you into trouble though when people expect successive calls to GetEnumerator to return different enumerators vs. the same one. But if it's a one time only use in a very constrained scenario, this could unblock you.
I do suggest though you try and not do this because I think eventually it will come back to haunt you.
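To make the pitfall concrete, here is a small sketch that enumerates such a wrapper twice. SingleUseEnumerable and SingleUseDemo are hypothetical names that mirror the FakeEnumerable idea above. The second pass finds the shared enumerator already exhausted, so it sees no items.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper that hands out the same enumerator on every call.
public sealed class SingleUseEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> _enumerator;
    public SingleUseEnumerable(IEnumerator<T> enumerator) => _enumerator = enumerator;
    public IEnumerator<T> GetEnumerator() => _enumerator;
    IEnumerator IEnumerable.GetEnumerator() => _enumerator;
}

public static class SingleUseDemo
{
    // Counts the wrapped sequence twice; only the first pass sees any items.
    public static (int First, int Second) CountTwice()
    {
        IEnumerator<int> source = new List<int> { 1, 2, 3 }.GetEnumerator();
        var wrapped = new SingleUseEnumerable<int>(source);
        int first = wrapped.Count();
        int second = wrapped.Count();
        return (first, second);
    }
}
```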
A safer option is along the lines Jonathan suggested. You can expend the enumerator and create a List<T> of the remaining items.
public static List<T> SaveRest<T>(this IEnumerator<T> e) {
var list = new List<T>();
while ( e.MoveNext() ) {
list.Add(e.Current);
}
return list;
}
EnumeratorEnumerable<T>
A threadsafe, resettable adaptor from IEnumerator<T> to IEnumerable<T>
I use Enumerator parameters like in C++ forward_iterator concept.
I agree that this can lead to confusion as too many people will indeed assume Enumerators are /like/ Enumerables, but they are not.
However, the confusion is fed by the fact that IEnumerator contains the Reset method. Here is my idea of the most correct implementation. It leverages the implementation of IEnumerator.Reset()
A major difference between an Enumerable and an Enumerator is that an Enumerable might be able to create several Enumerators simultaneously. This implementation puts a whole lot of work into making sure that this never happens for the EnumeratorEnumerable<T> type. There are two EnumeratorEnumerableModes:
Blocking (meaning that a second caller will simply wait till the first enumeration is completed)
NonBlocking (meaning that a second (concurrent) request for an enumerator simply throws an exception)
Note 1: 74 lines are implementation, 79 lines are testing code :)
Note 2: I didn't refer to any unit testing framework for SO convenience
using System;
using System.Diagnostics;
using System.Linq;
using System.Collections;
using System.Collections.Generic;
using System.Threading;
namespace EnumeratorTests
{
public enum EnumeratorEnumerableMode
{
NonBlocking,
Blocking,
}
public sealed class EnumeratorEnumerable<T> : IEnumerable<T>
{
#region LockingEnumWrapper
public sealed class LockingEnumWrapper : IEnumerator<T>
{
private static readonly HashSet<IEnumerator<T>> BusyTable = new HashSet<IEnumerator<T>>();
private readonly IEnumerator<T> _wrap;
internal LockingEnumWrapper(IEnumerator<T> wrap, EnumeratorEnumerableMode allowBlocking)
{
_wrap = wrap;
if (allowBlocking == EnumeratorEnumerableMode.Blocking)
Monitor.Enter(_wrap);
else if (!Monitor.TryEnter(_wrap))
throw new InvalidOperationException("Thread conflict accessing busy Enumerator") {Source = "LockingEnumWrapper"};
lock (BusyTable)
{
if (BusyTable.Contains(_wrap))
throw new LockRecursionException("Self lock (deadlock) conflict accessing busy Enumerator") { Source = "LockingEnumWrapper" };
BusyTable.Add(_wrap);
}
// always implicit Reset
_wrap.Reset();
}
#region Implementation of IDisposable and IEnumerator
public void Dispose()
{
lock (BusyTable)
BusyTable.Remove(_wrap);
Monitor.Exit(_wrap);
}
public bool MoveNext() { return _wrap.MoveNext(); }
public void Reset() { _wrap.Reset(); }
public T Current { get { return _wrap.Current; } }
object IEnumerator.Current { get { return Current; } }
#endregion
}
#endregion
private readonly IEnumerator<T> _enumerator;
private readonly EnumeratorEnumerableMode _allowBlocking;
public EnumeratorEnumerable(IEnumerator<T> e, EnumeratorEnumerableMode allowBlocking)
{
_enumerator = e;
_allowBlocking = allowBlocking;
}
private LockRecursionPolicy a;
public IEnumerator<T> GetEnumerator()
{
return new LockingEnumWrapper(_enumerator, _allowBlocking);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
class TestClass
{
private static readonly string World = "hello world\n";
public static void Main(string[] args)
{
var master = World.GetEnumerator();
var nonblocking = new EnumeratorEnumerable<char>(master, EnumeratorEnumerableMode.NonBlocking);
var blocking = new EnumeratorEnumerable<char>(master, EnumeratorEnumerableMode.Blocking);
foreach (var c in nonblocking) Console.Write(c); // OK (implicit Reset())
foreach (var c in blocking) Console.Write(c); // OK (implicit Reset())
foreach (var c in nonblocking) Console.Write(c); // OK (implicit Reset())
foreach (var c in blocking) Console.Write(c); // OK (implicit Reset())
try
{
var willRaiseException = from c1 in nonblocking from c2 in nonblocking select new {c1, c2};
Console.WriteLine("Cartesian product: {0}", willRaiseException.Count()); // RAISE
}
catch (Exception e) { Console.WriteLine(e); }
foreach (var c in nonblocking) Console.Write(c); // OK (implicit Reset())
foreach (var c in blocking) Console.Write(c); // OK (implicit Reset())
try
{
var willSelfLock = from c1 in blocking from c2 in blocking select new { c1, c2 };
Console.WriteLine("Cartesian product: {0}", willSelfLock.Count()); // LOCK
}
catch (Exception e) { Console.WriteLine(e); }
// should not externally throw (exceptions on other threads reported to console)
if (ThreadConflictCombinations(blocking, nonblocking))
throw new InvalidOperationException("Should have thrown an exception on background thread");
if (ThreadConflictCombinations(nonblocking, nonblocking))
throw new InvalidOperationException("Should have thrown an exception on background thread");
if (ThreadConflictCombinations(nonblocking, blocking))
Console.WriteLine("Background thread timed out");
if (ThreadConflictCombinations(blocking, blocking))
Console.WriteLine("Background thread timed out");
Debug.Assert(true); // Must be reached
}
private static bool ThreadConflictCombinations(IEnumerable<char> main, IEnumerable<char> other)
{
try
{
using (main.GetEnumerator())
{
var bg = new Thread(o =>
{
try { other.GetEnumerator(); }
catch (Exception e) { Report(e); }
}) { Name = "background" };
bg.Start();
bool timedOut = !bg.Join(1000); // observe the thread waiting a full second for a lock (or throw the exception for nonblocking)
if (timedOut)
bg.Abort();
return timedOut;
}
} catch
{
throw new InvalidProgramException("Cannot be reached");
}
}
static private readonly object ConsoleSynch = new Object();
private static void Report(Exception e)
{
lock (ConsoleSynch)
Console.WriteLine("Thread:{0}\tException:{1}", Thread.CurrentThread.Name, e);
}
}
}
Note 3: I think the implementation of the thread locking (especially around BusyTable) is quite ugly; However, I didn't want to resort to ReaderWriterLock(LockRecursionPolicy.NoRecursion) and didn't want to assume .Net 4.0 for SpinLock
Using a factory fixes the cached IEnumerator issue in JaredPar's answer and also lets you change the way of enumeration.
Consider a simple example: we want a custom List<T> wrapper that allows enumerating in reverse order as well as in the default order. List<T> already provides an enumerator for the default order, so we only need to create an IEnumerator that enumerates in reverse order. (We won't use List<T>.AsEnumerable().Reverse() because it enumerates the list twice)
public enum EnumerationType {
Default = 0,
Reverse
}
public class CustomList<T> : IEnumerable<T> {
private readonly List<T> list;
public CustomList(IEnumerable<T> list) => this.list = new List<T>(list);
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
//Default IEnumerable method will return default enumerator factory
public IEnumerator<T> GetEnumerator()
=> GetEnumerable(EnumerationType.Default).GetEnumerator();
public IEnumerable<T> GetEnumerable(EnumerationType enumerationType)
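=> enumerationType switch {
EnumerationType.Default => new DefaultEnumeratorFactory(list),
EnumerationType.Reverse => new ReverseEnumeratorFactory(list),
// Reject undefined enum values explicitly instead of leaving the switch non-exhaustive
_ => throw new ArgumentOutOfRangeException(nameof(enumerationType))
};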
//Simple implementation of reverse list enumerator
private class ReverseEnumerator : IEnumerator<T> {
private readonly List<T> list;
private int index;
internal ReverseEnumerator(List<T> list) {
this.list = list;
index = list.Count-1;
Current = default;
}
public void Dispose() { }
public bool MoveNext() {
if(index >= 0) {
Current = list[index];
index--;
return true;
}
Current = default;
return false;
}
public T Current { get; private set; }
object IEnumerator.Current => Current;
void IEnumerator.Reset() {
index = list.Count - 1;
Current = default;
}
}
private abstract class EnumeratorFactory : IEnumerable<T> {
protected readonly List<T> List;
protected EnumeratorFactory(List<T> list) => List = list;
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
public abstract IEnumerator<T> GetEnumerator();
}
private class DefaultEnumeratorFactory : EnumeratorFactory {
public DefaultEnumeratorFactory(List<T> list) : base(list) { }
//Default enumerator is already implemented in List<T>
public override IEnumerator<T> GetEnumerator() => List.GetEnumerator();
}
private class ReverseEnumeratorFactory : EnumeratorFactory {
public ReverseEnumeratorFactory(List<T> list) : base(list) { }
public override IEnumerator<T> GetEnumerator() => new ReverseEnumerator(List);
}
}
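For comparison, the same reverse traversal can be sketched with a C# iterator method, which lets the compiler generate the enumerator class. This is only a sketch, not part of the answer above: the names ReverseEnumerationSketch and InReverse are hypothetical, and it assumes the source supports indexed access through IReadOnlyList<T> (as List<T> and arrays do).

```csharp
using System;
using System.Collections.Generic;

public static class ReverseEnumerationSketch
{
    // Hypothetical helper: validates eagerly, then defers to the lazy iterator.
    public static IEnumerable<T> InReverse<T>(this IReadOnlyList<T> list)
    {
        if (list == null) throw new ArgumentNullException(nameof(list));
        return ReverseIterator(list);
    }

    // Walks from the last index down to 0; no copy of the list is made.
    private static IEnumerable<T> ReverseIterator<T>(IReadOnlyList<T> list)
    {
        for (int i = list.Count - 1; i >= 0; i--)
            yield return list[i];
    }
}

public static class ReverseEnumerationDemo
{
    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };
        Console.WriteLine(string.Join(",", numbers.InReverse())); // 3,2,1
    }
}
```

Like the hand-written ReverseEnumerator, this reads the live list and does not detect modifications made while the enumeration is in progress.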
As Jason Watts said -- no, not directly.
If you really want to, you could loop through the IEnumerator<T>, putting the items into a List<T>, and return that, but I'm guessing that's not what you're looking to do.
The basic reason you can't go in that direction (from an IEnumerator<T> to an IEnumerable<T>) is that IEnumerable<T> represents a set that can be enumerated, but IEnumerator<T> is a specific enumeration over a set of items -- you can't turn the specific instance back into the thing that created it.
static class Helper
{
public static List<T> SaveRest<T>(this IEnumerator<T> enumerator)
{
var list = new List<T>();
while (enumerator.MoveNext())
{
list.Add(enumerator.Current);
}
return list;
}
public static ArrayList SaveRest(this IEnumerator enumerator)
{
var list = new ArrayList();
while (enumerator.MoveNext())
{
list.Add(enumerator.Current);
}
return list;
}
}
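If copying the remaining items into a list is not wanted, a lazy variant is possible with an iterator method. The following is a minimal sketch under that assumption; EnumeratorRestSketch and Rest are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorRestSketch
{
    // Hypothetical helper: lazily yields the items the enumerator has not produced yet.
    // The returned sequence is single-use, because it drains the enumerator it wraps.
    public static IEnumerable<T> Rest<T>(this IEnumerator<T> enumerator)
    {
        while (enumerator.MoveNext())
            yield return enumerator.Current;
    }
}

public static class EnumeratorRestDemo
{
    public static void Main()
    {
        // Declared as the interface so the same (boxed) enumerator is advanced throughout.
        IEnumerator<int> e = new List<int> { 1, 2, 3 }.GetEnumerator();
        e.MoveNext(); // consume the first item by hand
        Console.WriteLine(string.Join(",", e.Rest())); // 2,3
    }
}
```

The trade-off is that the result can only be walked once; SaveRest costs a copy but gives back a collection that can be enumerated repeatedly.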
Nope, IEnumerator<> and IEnumerable<> are different beasts entirely.
This is a variant I have written. The specifics are a little different: I wanted to call MoveNext() on an enumerator obtained from an IEnumerable<T>, check the result, and then wrap everything in a new IEnumerable<T> that was "complete" (so that it included even the element I had already extracted from the original IEnumerable<T>).
// Simple IEnumerable<T> that "uses" an IEnumerator<T> that has
// already received a MoveNext(). "eats" the first MoveNext()
// received, then continues normally. For shortness, both IEnumerable<T>
// and IEnumerator<T> are implemented by the same class. Note that if a
// second call to GetEnumerator() is done, the "real" IEnumerator<T> will
// be returned, not this proxy implementation.
public class EnumerableFromStartedEnumerator<T> : IEnumerable<T>, IEnumerator<T>
{
public readonly IEnumerator<T> Enumerator;
public readonly IEnumerable<T> Enumerable;
// Received by creator. Return value of MoveNext() done by caller
protected bool FirstMoveNextSuccessful { get; set; }
// The Enumerator can be "used" only once, then a new enumerator
// can be requested by Enumerable.GetEnumerator()
// (default = false)
protected bool Used { get; set; }
// The first MoveNext() has been already done (default = false)
protected bool DoneMoveNext { get; set; }
public EnumerableFromStartedEnumerator(IEnumerator<T> enumerator, bool firstMoveNextSuccessful, IEnumerable<T> enumerable)
{
Enumerator = enumerator;
FirstMoveNextSuccessful = firstMoveNextSuccessful;
Enumerable = enumerable;
}
public IEnumerator<T> GetEnumerator()
{
if (Used)
{
return Enumerable.GetEnumerator();
}
Used = true;
return this;
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public T Current
{
get
{
// There are various schools of thought on what should
// happen if Current is read before the first MoveNext() or
// after a MoveNext() returns false. We follow the
// "return default(T)" school of thought before the
// first MoveNext() and the "whatever the Enumerator
// wants" school after a MoveNext() returns false.
if (!DoneMoveNext)
{
return default(T);
}
return Enumerator.Current;
}
}
public void Dispose()
{
Enumerator.Dispose();
}
object IEnumerator.Current
{
get
{
return Current;
}
}
public bool MoveNext()
{
if (!DoneMoveNext)
{
DoneMoveNext = true;
return FirstMoveNextSuccessful;
}
return Enumerator.MoveNext();
}
public void Reset()
{
// This will 99% throw :-) Not our problem.
Enumerator.Reset();
// So it is improbable we will arrive here
DoneMoveNext = true;
}
}
Use:
var enumerable = someCollection<T>;
var enumerator = enumerable.GetEnumerator();
bool res = enumerator.MoveNext();
// do whatever you want with res/enumerator.Current
var enumerable2 = new EnumerableFromStartedEnumerator<T>(enumerator, res, enumerable);
Now the first call to enumerable2.GetEnumerator() returns the proxy, which continues from the already-started enumerator. From the second call onward, enumerable.GetEnumerator() is used.
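When only a single pass over the "complete" sequence is needed, the same idea can be sketched much more briefly with an iterator method. This is an illustrative sketch rather than a replacement for the class above: StartedEnumeratorSketch and Resume are hypothetical names, and unlike the class it has no fallback to the original enumerable.

```csharp
using System;
using System.Collections.Generic;

public static class StartedEnumeratorSketch
{
    // Hypothetical helper. 'started' has already had MoveNext() called once;
    // 'firstMoveNextSuccessful' is the value that call returned.
    public static IEnumerable<T> Resume<T>(IEnumerator<T> started, bool firstMoveNextSuccessful)
    {
        using (started) // dispose the enumerator when the pass ends
        {
            if (!firstMoveNextSuccessful)
                yield break;
            do
            {
                // The first iteration re-emits the item the caller already looked at.
                yield return started.Current;
            } while (started.MoveNext());
        }
    }
}

public static class StartedEnumeratorDemo
{
    public static void Main()
    {
        IEnumerator<int> e = new List<int> { 1, 2, 3 }.GetEnumerator();
        bool res = e.MoveNext(); // inspect res / e.Current here
        Console.WriteLine(string.Join(",", StartedEnumeratorSketch.Resume(e, res))); // 1,2,3
    }
}
```

The result should be enumerated only once: a second pass would reuse the exhausted, disposed enumerator and would not reproduce the sequence.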
The other answers here are ... strange. IEnumerable<T> has just one method, GetEnumerator(). And an IEnumerable<T> must implement IEnumerable, which also has just one method, GetEnumerator() (the difference being that one is generic on T and the other is not). So it should be clear how to turn an IEnumerator<T> into an IEnumerable<T>:
// using modern expression-body syntax
public class IEnumeratorToIEnumerable<T> : IEnumerable<T>
{
private readonly IEnumerator<T> Enumerator;
public IEnumeratorToIEnumerable(IEnumerator<T> enumerator) =>
Enumerator = enumerator;
public IEnumerator<T> GetEnumerator() => Enumerator;
IEnumerator IEnumerable.GetEnumerator() => Enumerator;
}
foreach (var foo in new IEnumeratorToIEnumerable<Foo>(fooEnumerator))
DoSomethingWith(foo);
// and you can also do:
var fooEnumerable = new IEnumeratorToIEnumerable<Foo>(fooEnumerator);
foreach (var foo in fooEnumerable)
DoSomethingWith(foo);
// Some IEnumerators automatically repeat after MoveNext() returns false,
// in which case this is a no-op, but generally it's required.
fooEnumerator.Reset();
foreach (var foo in fooEnumerable)
DoSomethingElseWith(foo);
However, none of this should be needed because it's unusual to have an IEnumerator<T> that doesn't come with an IEnumerable<T> that returns an instance of it from its GetEnumerator method. If you're writing your own IEnumerator<T>, you should certainly provide the IEnumerable<T>. And really it's the other way around ... an IEnumerator<T> is intended to be a private class that iterates over instances of a public class that implements IEnumerable<T>.
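One practical reason to prefer a real IEnumerable<T> is that a wrapper like the one above hands out the same enumerator on every call. The sketch below illustrates that with a List<int> enumerator and with a compiler-generated iterator; SharedEnumeratorEnumerable and SharedEnumeratorDemo are hypothetical names chosen for the illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Same shape as the wrapper above: every GetEnumerator() call returns
// the one shared enumerator instance.
public sealed class SharedEnumeratorEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> _enumerator;
    public SharedEnumeratorEnumerable(IEnumerator<T> enumerator) => _enumerator = enumerator;
    public IEnumerator<T> GetEnumerator() => _enumerator;
    IEnumerator IEnumerable.GetEnumerator() => _enumerator;
}

public static class SharedEnumeratorDemo
{
    // Counts the items produced by one foreach pass.
    public static int CountPass<T>(IEnumerable<T> source)
    {
        int count = 0;
        foreach (T item in source) count++;
        return count;
    }

    private static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    // Returns true when Reset() is rejected, as it is for compiler-generated iterators.
    public static bool IteratorResetThrows()
    {
        using IEnumerator<int> e = Numbers().GetEnumerator();
        try { e.Reset(); return false; }
        catch (NotSupportedException) { return true; }
    }

    public static void Main()
    {
        IEnumerator<int> shared = new List<int> { 1, 2, 3 }.GetEnumerator();
        var wrapper = new SharedEnumeratorEnumerable<int>(shared);
        Console.WriteLine(CountPass(wrapper)); // first pass sees all items
        Console.WriteLine(CountPass(wrapper)); // second pass finds the enumerator exhausted
        shared.Reset();
        Console.WriteLine(CountPass(wrapper)); // after Reset() the items are visible again
        Console.WriteLine(IteratorResetThrows());
    }
}
```

With List<int> the three passes count 3, 0, and 3 items. Enumerators produced by yield return do not support Reset() at all, so for those the wrapper is effectively single-use.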
