Concurrent collection enumerator - c#

I'm currently programming my own implementation of priority queue / sorted list and I would like to have it concurrent.
In order to have it thread safe I'm using lock(someObject) and I would like to verify some behavior of mutexes in C#.
Inner representation of my sorted list is basically linked list with head and slots linked together.
Something like:
internal class Slot
{
internal T Value;
internal Slot Next;
public Slot(T value, Slot next = null)
{
Value = value;
Next = next;
}
}
Every time I'm manipulating with head I have to use lock(someObject)because of thread safety.
In order to implement ICollection interface I have to implement public IEnumerator<T> GetEnumerator(). In this method I have take my head and read from it so I should use mutex.
public IEnumerator<T> GetEnumerator()
{
lock (syncLock)
{
var curr = head;
while (curr != null)
{
yield return curr.Value;
curr = curr.Next;
}
}
}
My question is: Is syncLock locked for whole time in enumerator (so it will be unlocked after reaching end of the method) or it is automatically unlocked after yielding value?

Thank you guys from the comments, here's sum up.
Answer: yes, syncLock will be locked for the whole time → hence, it's a really bad idea
Possible solution:
make collection not thread safe
obtain lock, copy whole collection and return enumerator of this collection #Evk
use some kind of boolean flag, set it on true while enumerating over the collection and throw exception when Add, Clear or Remove methods are called -> this is default List behavior #ManfredRadlwimmer
make that collection immutable #InBetween

Related

Expensive IEnumerable: Any way to prevent multiple enumerations without forcing an immediate enumeration? [duplicate]

This question already has answers here:
Is there an IEnumerable implementation that only iterates over it's source (e.g. LINQ) once?
(4 answers)
Closed 9 months ago.
I have a very large enumeration and am preparing an expensive deferred operation on it (e.g. sorting it). I'm then passing this into a function which may or may not consume the IEnumerable, depending on some logic of its own.
Here's an illustration:
IEnumerable<Order> expensiveEnumerable = fullCatalog.OrderBy(c => Prioritize(c));
MaybeFullFillSomeOrders(expensiveEnumerable);
// Elsewhere... (example use-case for multiple enumerations, not real code)
void MaybeFullFillSomeOrders(IEnumerable<Order> nextUpOrders){
if(notAGoodTime())
return;
foreach(var order in nextUpOrders)
collectSomeInfo(order);
processInfo();
foreach(var order in nextUpOrders) {
maybeFulfill(order);
if(atCapacity())
break;
}
}
I'm would like to prepare my input to the other function such that:
If they do not consume the enumerable, the performance price of sorting is not paid.
This already precludes calling e.g. ToList() or ToArray() on it
If they choose to enumerate multiple times (perhaps not realizing how expensive it would be in this case) I want some defence in place to prevent the multiple enumeration.
Ideally, the result is still an IEnumerable<T>
The best solution I've come up with is to use Lazy<>
var expensive = new Lazy<List<Order>>>(
() => fullCatalog.OrderBy(c => Prioritize(c)).ToList());
This appears to satisfy criteria 1 and 2, but has a couple of drawbacks:
I have to change the interface to all downstream usages to expect a Lazy.
The full list (which in this case was built up from a SelectMany() on serveral smaller partitions) would need to be allocated as a new single contiguous list in memory. I'm not sure there's an easy way around this if I want to "cache" the sort result, but if you know of one I'm all ears.
One idea I had to solve the first problem was to wrap Lazy<> in some custom class that either implements or can implicitly be converted to an IEnumerable<T>, but I'm hoping someone knows of a more elegant approach.
You certainly could write your own IEnumerable<T> implementation that wraps another one, remembering all the elements it's already seen (and whether it's exhausted or not). If you need it to be thread-safe that becomes trickier, and you'd need to remember that at any time there may be multiple iterators working against the same IEnumerable<T>.
Fundamentally I think it would come down to working out what to do when asked for the next element (which is somewhat-annoyingly split into MoveNext() and Current, but that can probably be handled...):
If you've already read the next element within another iterator, you can yield it from your buffer
If you've already discovered that there is no next element, you can return that immediately
Otherwise, you need to ask the original iterator for the next element, and remember if for all the other wrapped iterators.
The other aspect that's tricky is knowing when to dispose of the underlying IEnumerator<T> - if you don't need to do that, it makes things simpler.
As a very sketchy attempt that I haven't even attempted to compile, and which is definitely not thread-safe, you could try something like this:
public class LazyEnumerable<T> : IEnumerable<T>
{
private readonly IEnumerator<T> iterator;
private List<T> buffer;
private bool completed = false;
public LazyEnumerable(IEnumerable<T> original)
{
// TODO: You could be even lazier, only calling
// GetEnumerator when you first need an element
iterator = original.GetEnumerator();
}
IEnumerator GetEnumerator() => GetEnumerator();
public IEnumerator<T> GetEnumerator()
{
int index = 0;
while (true)
{
// If we already have the element, yield it
if (index < buffer.Count)
{
yield return buffer[index];
}
// If we've yielded everything in the buffer and some
// other iterator has come to the end of the original,
// we're done.
else if (completed)
{
yield break;
}
// Otherwise, see if there's anything left in the original
// iterator.
else
{
bool hasNext = iterator.MoveNext();
if (hasNext)
{
var current = iterator.Current;
buffer.Add(current);
yield return current;
}
else
{
completed = true;
yield break;
}
}
index++;
}
}
}

Explicit IEnumerator<T> implementation VS yield return implementation

I have following problem:
I want to implement my own collection, which will also implement ICollection<T> interface. That means that I also need to implement IEnumerable<T> interface. Implementing IEnumerable<T> can be done two ways:
First approach: by having private struct implementing IEnumerator<T> and returning it from GetEnumerator() method
Second approach: I can just use iterator (using yield return) and let compiler generate IEnumerator class for me.
I also want my enumerators to throw InvalidOperationException on any MoveNext() (or Reset()) call after the collection was modified
as it is specified in MS documentation
If I will use first approach, then everything is ok (I just save version of my collection when creating new enumerator and then in MoveNext() I just check if it hasnt changed).
The same trick is used by MS Collections But here is THE PROBLEM - if I will use second approach I think I cannot enforce following behaviour. Consider following code.
class MyCollection : ICollcetion<T>
{
int _version = 0;
T[] _collectionData;
public void Modify()
{
... // do the modification
_version++; // change version, so enumerators can check that they are invalid
}
...
public IEnumerator<T> GetEnumerator()
{
int _enmVersion = _version;
int i = 0;
yield return _collectionData[i];
if( _enmVersion != _version ) // check if there was not modificaction
{
throw new InvalidOperationException();
}
}
...
}
...
var collection = new MyCollection();
var e = collection.GetEnumerator();
myCollection.Modfiy(); //collection modification, so e becomes irrecoverably invalidated
e.MoveNext(); // right now I want to throw exception
//but instead of that, code until first yield return executed
//and the version is saved... :(
This problem occurs only if the collection is modified after obataining enumerator object, but before first call to MoveNext(), but still it is problem...
My question is: is it possible somehow use second approach AND enforce right behaviour OR I need for this case to stick with first approach ?
My suspicion is that I have to use first approach, since the code of iterator method is exeuted only when I call MoveNext() , and not in constructor of compiler generated class, but I want to be sure...
Your problem is that an iterator method won't run any code at all until the first MoveNext() call.
You can work around this problem by wrapping the iterator in a non-iterator method that grabs the version immediately and passes it as a parameter:
public IEnumerator<T> GetEnumerator() {
return GetRealEnumerator(_version);
}
private IEnumerator<T> GetRealEnumerator(int baseVersion) {

Getting head and tail from IEnumerable that can only be iterated once

I have a sequence of elements. The sequence can only be iterated once and can be "infinite".
What is the best way get the head and the tail of such a sequence?
Update: A few clarifications that would have been nice if I included in the original question :)
Head is the first element of the sequence and tail is "the rest". That means the the tail is also "infinite".
When I say infinite, I mean "very large" and "I wouldn't want to store it all in memory at once". It could also have been actually infinite, like sensor data for example (but it wasn't in my case).
When I say that it can only be iterated once, I mean that generating the sequence is resource heavy, so I woundn't want to do it again. It could also have been volatile data, again like sensor data, that won't be the same on next read (but it wasn't in my case).
Decomposing IEnumerable<T> into head & tail isn't particularly good for recursive processing (unlike functional lists) because when you use the tail operation recursively, you'll create a number of indirections. However, you can write something like this:
I'm ignoring things like argument checking and exception handling, but it shows the idea...
Tuple<T, IEnumerable<T>> HeadAndTail<T>(IEnumerable<T> source) {
// Get first element of the 'source' (assuming it is there)
var en = source.GetEnumerator();
en.MoveNext();
// Return first element and Enumerable that iterates over the rest
return Tuple.Create(en.Current, EnumerateTail(en));
}
// Turn remaining (unconsumed) elements of enumerator into enumerable
IEnumerable<T> EnumerateTail<T>(IEnumerator en) {
while(en.MoveNext()) yield return en.Current;
}
The HeadAndTail method gets the first element and returns it as the first element of a tuple. The second element of a tuple is IEnumerable<T> that's generated from the remaining elements (by iterating over the rest of the enumerator that we already created).
Obviously, each call to HeadAndTail should enumerate the sequence again (unless there is some sort of caching used). For example, consider the following:
var a = HeadAndTail(sequence);
Console.WriteLine(HeadAndTail(a.Tail).Tail);
//Element #2; enumerator is at least at #2 now.
var b = HeadAndTail(sequence);
Console.WriteLine(b.Tail);
//Element #1; there is no way to get #1 unless we enumerate the sequence again.
For the same reason, HeadAndTail could not be implemented as separate Head and Tail methods (unless you want even the first call to Tail to enumerate the sequence again even if it was already enumerated by a call to Head).
Additionally, HeadAndTail should not return an instance of IEnumerable (as it could be enumerated multiple times).
This leaves us with the only option: HeadAndTail should return IEnumerator, and, to make things more obvious, it should accept IEnumerator as well (we're just moving an invocation of GetEnumerator from inside the HeadAndTail to the outside, to emphasize it is of one-time use only).
Now that we have worked out the requirements, the implementation is pretty straightforward:
class HeadAndTail<T> {
public readonly T Head;
public readonly IEnumerator<T> Tail;
public HeadAndTail(T head, IEnumerator<T> tail) {
Head = head;
Tail = tail;
}
}
static class IEnumeratorExtensions {
public static HeadAndTail<T> HeadAndTail<T>(this IEnumerator<T> enumerator) {
if (!enumerator.MoveNext()) return null;
return new HeadAndTail<T>(enumerator.Current, enumerator);
}
}
And now it can be used like this:
Console.WriteLine(sequence.GetEnumerator().HeadAndTail().Tail.HeadAndTail().Head);
//Element #2
Or in recursive functions like this:
TResult FoldR<TSource, TResult>(
IEnumerator<TSource> sequence,
TResult seed,
Func<TSource, TResult, TResult> f
) {
var headAndTail = sequence.HeadAndTail();
if (headAndTail == null) return seed;
return f(headAndTail.Head, FoldR(headAndTail.Tail, seed, f));
}
int Sum(IEnumerator<int> sequence) {
return FoldR(sequence, 0, (x, y) => x+y);
}
var array = Enumerable.Range(1, 5);
Console.WriteLine(Sum(array.GetEnumerator())); //1+(2+(3+(4+(5+0)))))
While other approaches here suggest using yield return for the tail enumerable, such an approach adds unnecessary nesting overhead. A better approach would be to convert the Enumerator<T> back into something that can be used with foreach:
public struct WrappedEnumerator<T>
{
T myEnumerator;
public T GetEnumerator() { return myEnumerator; }
public WrappedEnumerator(T theEnumerator) { myEnumerator = theEnumerator; }
}
public static class AsForEachHelper
{
static public WrappedEnumerator<IEnumerator<T>> AsForEach<T>(this IEnumerator<T> theEnumerator) {return new WrappedEnumerator<IEnumerator<T>>(theEnumerator);}
static public WrappedEnumerator<System.Collections.IEnumerator> AsForEach(this System.Collections.IEnumerator theEnumerator)
{ return new WrappedEnumerator<System.Collections.IEnumerator>(theEnumerator); }
}
If one used separate WrappedEnumerator structs for the generic IEnumerable<T> and non-generic IEnumerable, one could have them implement IEnumerable<T> and IEnumerable respectively; they wouldn't really obey the IEnumerable<T> contract, though, which specifies that it should be possible to possible to call GetEnumerator() multiple times, with each call returning an independent enumerator.
Another important caveat is that if one uses AsForEach on an IEnumerator<T>, the resulting WrappedEnumerator should be enumerated exactly once. If it is never enumerated, the underlying IEnumerator<T> will never have its Dispose method called.
Applying the above-supplied methods to the problem at hand, it would be easy to call GetEnumerator() on an IEnumerable<T>, read out the first few items, and then use AsForEach() to convert the remainder so it can be used with a ForEach loop (or perhaps, as noted above, to convert it into an implementation of IEnumerable<T>). It's important to note, however, that calling GetEnumerator() creates an obligation to Dispose the resulting IEnumerator<T>, and the class that performs the head/tail split would have no way to do that if nothing ever calls GetEnumerator() on the tail.
probably not the best way to do it but if you use the .ToList() method you can then get the elements in position [0] and [Count-1], if Count > 0.
But you should specify what do you mean by "can be iterated only once"
What exactly is wrong with .First() and .Last()? Though yeah, I have to agree with the people who asked "what does the tail of an infinite list mean"... the notion doesn't make sense, IMO.

Help me understand the code snippet in c#

I am reading this blog: Pipes and filters pattern
I am confused by this code snippet:
public class Pipeline<T>
{
private readonly List<IOperation<T>> operations = new List<IOperation<T>>();
public Pipeline<T> Register(IOperation<T> operation)
{
operations.Add(operation);
return this;
}
public void Execute()
{
IEnumerable<T> current = new List<T>();
foreach (IOperation<T> operation in operations)
{
current = operation.Execute(current);
}
IEnumerator<T> enumerator = current.GetEnumerator();
while (enumerator.MoveNext());
}
}
what is the purpose of this statement: while (enumerator.MoveNext());? seems this code is a noop.
First consider this:
IEnumerable<T> current = new List<T>();
foreach (IOperation<T> operation in operations)
{
current = operation.Execute(current);
}
This code appears to be creating nested enumerables, each of which takes elements from the previous, applies some operation to them, and passes the result to the next. But it only constructs the enumerables. Nothing actually happens yet. It's just ready to go, stored in the variable current. There are lots of ways to implement IOperation.Execute but it could be something like this.
IEnumerable<T> Execute(IEnumerable<T> ts)
{
foreach (T t in ts)
yield return this.operation(t); // Perform some operation on t.
}
Another option suggested in the article is a sort:
IEnumerable<T> Execute(IEnumerable<T> ts)
{
// Thank-you LINQ!
// This was 10 lines of non-LINQ code in the original article.
return ts.OrderBy(t => t.Foo);
}
Now look at this:
IEnumerator<T> enumerator = current.GetEnumerator();
while (enumerator.MoveNext());
This actually causes the chain of operations to be performed. When the elements are requested from the enumeration, it causes elements from the original enumerable to be passed through the chain of IOperations, each of which performs some operation on them. The end result is discarded so only the side-effect of the operation is interesting - such as writing to the console or logging to a file. This would have been a simpler way to write the last two lines:
foreach (T t in current) {}
Another thing to observe is that the initial list that starts the process is an empty list so for this to make sense some instances of T have to be created inside the first operation. In the article this is done by asking the user for input from the console.
In this case, the while (enumerator.MoveNext()); is simply evaluating all the items that are returned by the final IOperation<T>. It looks a little confusing, but the empty List<T> is only created in order to supply a value to the first IOperation<T>.
In many collections this would do exaclty nothing as you suggest, but given that we are talking about the pipes and filters pattern it is likely that the final value is some sort of iterator that will cause code to be executed. It could be something like this, for example (assuming that is an integer):
public class WriteToConsoleOperation : IOperation<int>
{
public IEnumerable<int> Execute(IEnumerable<int> ints)
{
foreach (var i in ints)
{
Console.WriteLine(i);
yield return i;
}
}
}
So calling MoveNext() for each item on the IEnumerator<int> returned by this iterator will return each of the values (which are ignored in the while loop) but also output each of the values to the console.
Does that make sense?
while (enumerator.MoveNext());
Inside the current block of code, there is no affect (it moves through all the items in the enumeration). The displayed code doesn't act on the current element in the enumeration. What might be happening is that the MoveNext() method is moving to the next element, and it is doing something to the objects in the collection (updating an internal value, pull the next from the database etc.). Since the type is List<T> this is probably not the case, but in other instances it could be.

Performance regarding cached thread-safe IEnumerable<T> implementation

I created the ThreadSafeCachedEnumerable<T> class intending to increase performance where long running queries where being reused. The idea was to get an enumerator from an IEnumerable<T> and add items to a cache on each call to MoveNext(). The following is my current implementation:
/// <summary>
/// Wraps an IEnumerable<T> and provides a thread-safe means of caching the values."/>
/// </summary>
/// <typeparam name="T"></typeparam>
class ThreadSafeCachedEnumerable<T> : IEnumerable<T>
{
// An enumerator from the original IEnumerable<T>
private IEnumerator<T> enumerator;
// The items we have already cached (from this.enumerator)
private IList<T> cachedItems = new List<T>();
public ThreadSafeCachedEnumerable(IEnumerable<T> enumerable)
{
this.enumerator = enumerable.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
// The index into the sequence
int currentIndex = 0;
// We will break with yield break
while (true)
{
// The currentIndex will never be decremented,
// so we can check without locking first
if (currentIndex < this.cachedItems.Count)
{
var current = this.cachedItems[currentIndex];
currentIndex += 1;
yield return current;
}
else
{
// If !(currentIndex < this.cachedItems.Count),
// we need to synchronize access to this.enumerator
lock (enumerator)
{
// See if we have more cached items ...
if (currentIndex < this.cachedItems.Count)
{
var current = this.cachedItems[currentIndex];
currentIndex += 1;
yield return current;
}
else
{
// ... otherwise, we'll need to get the next item from this.enumerator.MoveNext()
if (this.enumerator.MoveNext())
{
// capture the current item and cache it, then increment the currentIndex
var current = this.enumerator.Current;
this.cachedItems.Add(current);
currentIndex += 1;
yield return current;
}
else
{
// We reached the end of the enumerator - we're done
yield break;
}
}
}
}
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
I simply lock (this.enumerator) when the no more items appear to be in the cache, just in case another thread is just about to add another item (I assume that calling MoveNext() on this.enumerator from two threads is a bad idea).
The performance is great when retrieving previously cached items, but it starts to suffer when getting many items for the first time (due to the constant locking). Any suggestions for increasing the performance?
Edit: The new Reactive Framework solves the problem outlined above, using the System.Linq.EnumerableEx.MemoizeAll() extension method.
Internally, MemoizeAll() uses a System.Linq.EnumerableEx.MemoizeAllEnumerable<T> (found in the System.Interactive assembly), which is similar to my ThreadSafeCachedEnumerable<T> (sorta).
Here's an awfully contrived example that prints the contents of an Enumerable (numbers 1-10) very slowly, then quickly prints the contents a second time (because it cached the values):
// Create an Enumerable<int> containing numbers 1-10, using Thread.Sleep() to simulate work
var slowEnum = EnumerableEx.Generate(1, currentNum => (currentNum <= 10), currentNum => currentNum, previousNum => { Thread.Sleep(250); return previousNum + 1; });
// This decorates the slow enumerable with one that will cache each value.
var cachedEnum = slowEnum.MemoizeAll();
// Print the numbers
foreach (var num in cachedEnum.Repeat(2))
{
Console.WriteLine(num);
}
A couple of recommendations:
It is now generally accepted practice not to make container classes responsible for locking. Someone calling your cached enumerator, for instance, might also want to prevent new entries from being added to the container while enumerating, which means that locking would occur twice. Therefore, it's best to defer that responsibility to the caller.
Your caching depends on the enumerator always returning items in-order, which is not guaranteed. It's better to use a Dictionary or HashSet. Similarly, items may be removed inbetween calls, invalidating the cache.
It is generally not recommended to establish locks on publically accessible objects. That includes the wrapped enumerator. Exceptions are conceivable, for example when you're absolutely certain you're absolutely certain you're the only instance holding a reference to the container class you're enumerating over. This would also largely invalidate my objections under #2.
Locking in .NET is normally very quick (if there is no contention). Has profiling identified locking as the source of the performance problem? How long does it take to call MoveNext on the underlying enumerator?
Additionally, the code as it stands is not thread-safe. You cannot safely call this.cachedItems[currentIndex] on one thread (in if (currentIndex < this.cachedItems.Count)) while invoking this.cachedItems.Add(current) on another. From the List(T) documentation: "A List(T) can support multiple readers concurrently, as long as the collection is not modified." To be thread-safe, you would need to protect all access to this.cachedItems with a lock (if there's any chance that one or more threads could modify it).

Categories

Resources