C#: How can I make an IEnumerable<T> thread safe?

Say I have this simple method:
public IEnumerable<uint> GetNumbers()
{
uint n = 0;
while(n < 100)
yield return n++;
}
How would you make this thread safe? And by that I mean that you would get that enumerator once, and have multiple threads handle all the numbers without anyone getting duplicates.
I suppose a lock needs to be used somewhere, but where must that lock be for an iterator block to be thread safe? What, in general, do you need to remember if you want a thread safe IEnumerable<T>? Or rather I guess it would be a thread safe IEnumerator<T>...?

There's an inherent problem in doing so, because IEnumerator<T> has both MoveNext() and Current. You really want a single call such as:
bool TryMoveNext(out T value)
At that point you can atomically move to the next element and get its value. Implementing that and still being able to use yield could be tricky... I'll have a think about it though. I think you'd need to wrap the "non-threadsafe" iterator in a thread-safe one which atomically performs MoveNext() and Current to implement the interface shown above. I don't know how you'd then wrap this interface back into IEnumerator<T> so that you could use it in foreach though...
If you're using .NET 4.0, Parallel Extensions may be able to help you - you'd need to explain more about what you're trying to do though.
This is an interesting topic - I may have to blog about it...
EDIT: I've now blogged about it with two approaches.
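The wrapping idea described above can be sketched roughly as follows. This is not the code from the blog post; SynchronizedEnumerator is a hypothetical name, and the sketch assumes every consumer goes through TryMoveNext rather than touching the inner enumerator directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper (name and shape are illustrative only).
public sealed class SynchronizedEnumerator<T> : IDisposable
{
    private readonly IEnumerator<T> _inner;
    private readonly object _sync = new object();

    public SynchronizedEnumerator(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        _inner = source.GetEnumerator();
    }

    // MoveNext() and Current run under one lock, so each element
    // is handed to exactly one caller.
    public bool TryMoveNext(out T value)
    {
        lock (_sync)
        {
            if (_inner.MoveNext())
            {
                value = _inner.Current;
                return true;
            }
        }
        value = default(T);
        return false;
    }

    public void Dispose()
    {
        lock (_sync) { _inner.Dispose(); }
    }
}
```

Each worker thread would then loop with while (shared.TryMoveNext(out value)) { ... } on a single shared instance. The iterator body itself still runs on whichever thread happens to hold the lock, so it should not depend on thread-local state.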

I just tested this bit of code:
static IEnumerable<int> getNums()
{
Console.WriteLine("IENUM - ENTER");
for (int i = 0; i < 10; i++)
{
Console.WriteLine(i);
yield return i;
}
Console.WriteLine("IENUM - EXIT");
}
static IEnumerable<int> getNums2()
{
try
{
Console.WriteLine("IENUM - ENTER");
for (int i = 0; i < 10; i++)
{
Console.WriteLine(i);
yield return i;
}
}
finally
{
Console.WriteLine("IENUM - EXIT");
}
}
getNums2() always runs the finally part of the code, provided the enumerator is run to completion or disposed (which foreach does for you). If you want your IEnumerable to be thread safe, add whatever thread locks you want in place of the WriteLine calls, using ReaderWriterLockSlim, Semaphore, Monitor, etc.
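To make the finally behaviour concrete, here is a small sketch (FinallyDemo and its Log list are made-up names) showing that the finally block also runs when the caller breaks out of foreach early, because foreach disposes the enumerator.

```csharp
using System;
using System.Collections.Generic;

public static class FinallyDemo
{
    // Records what the iterator did, instead of writing to the console.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        try
        {
            Log.Add("enter");
            for (int i = 0; i < 10; i++)
                yield return i;
        }
        finally
        {
            // Runs on normal completion or when the enumerator is disposed.
            Log.Add("exit");
        }
    }

    public static void Run()
    {
        foreach (int n in Numbers())
        {
            if (n == 2) break; // foreach disposes the enumerator here
        }
    }
}
```

One caveat if the WriteLine calls are replaced with Monitor.Enter/Monitor.Exit: a Monitor lock is owned by the thread that acquired it, so if a different thread ends up disposing the enumerator, Monitor.Exit throws a SynchronizationLockException.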

Well, I'm not sure, but maybe with some locks in the caller?
Draft:
Monitor.Enter(syncRoot);
foreach (var item in enumerable)
{
Monitor.Exit(syncRoot);
//Do something with item
Monitor.Enter(syncRoot);
}
Monitor.Exit(syncRoot);

I was thinking that you can't make the yield keyword thread-safe, unless you make it depend on an already thread-safe source of values:
public interface IThreadSafeEnumerator<T>
{
void Reset();
bool TryMoveNext(out T value);
}
public class ThreadSafeUIntEnumerator : IThreadSafeEnumerator<uint>, IEnumerable<uint>
{
readonly object sync = new object();
uint n;
#region IThreadSafeEnumerator<uint> Members
public void Reset()
{
lock (sync)
{
n = 0;
}
}
public bool TryMoveNext(out uint value)
{
bool success = false;
lock (sync)
{
if (n < 100)
{
value = n++;
success = true;
}
else
{
value = uint.MaxValue;
}
}
return success;
}
#endregion
#region IEnumerable<uint> Members
public IEnumerator<uint> GetEnumerator()
{
//Reset(); // depends on what behaviour you want
uint value;
while (TryMoveNext(out value))
{
yield return value;
}
}
#endregion
#region IEnumerable Members
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
//Reset(); // depends on what behaviour you want
uint value;
while (TryMoveNext(out value))
{
yield return value;
}
}
#endregion
}
You will have to decide whether each typical initiation of an enumerator should reset the sequence, or if the client code must do that.

You could just return a complete sequence each time rather than use yield:
// Cast<uint>() on a sequence of int throws InvalidCastException, so convert explicitly.
return Enumerable.Range(0, 100).Select(i => (uint)i).ToArray();

Related

Release a lock before waiting, and re-acquire it after

In Java, you can associate multiple Condition objects to a single ReentrantLock. What would the C# equivalent be?
Real-world example: The example implementation in the Java Condition documentation uses two Condition objects, notFull and notEmpty, tied to the same lock. How could that example be translated to C#?
Background: I often find Java code using two Condition objects to signal various states, associated to the same Lock; in C#, it seems that you can either
call Monitor.Enter on an object, and then Monitor.Wait/Monitor.Pulse, but that's just one condition.
use multiple Auto/ManualResetEvent objects, but these cannot atomically reacquire a given lock after waiting.
Note: I can think of one way: using Monitor.Wait/Monitor.PulseAll on a single object, and checking for the condition after waking up; that's what you do in Java as well to protect against spurious wake-ups. That isn't really satisfactory, though, because it forces you to call PulseAll instead of Pulse, since Pulse might wake up a thread waiting on another condition. Unfortunately, using PulseAll instead of Pulse has performance implications (threads competing for the same lock).
I think if you are doing new development and can do .NET 4 or above, you'll be better served by the new concurrent collection classes, like ConcurrentQueue.
But if you can't make that move, and to strictly answer your question: in .NET this is somewhat simplified, IMHO. To implement a producer/consumer pattern you would just wait and then pulse, as below (note that I typed this in Notepad):
// max is 1000 items in queue
private int _count = 1000;
private Queue<string> _myQueue = new Queue<string>();
private static object _door = new object();
public void AddItem(string someItem)
{
lock (_door)
{
while (_myQueue.Count == _count)
{
// reached max item, let's wait 'till there is room
Monitor.Wait(_door);
}
_myQueue.Enqueue(someItem);
// signal so that threads waiting for items to be inserted are woken up
// one at a time, so they don't try to dequeue items that are not there
Monitor.Pulse(_door);
}
}
public string RemoveItem()
{
string item = null;
lock (_door)
{
while (_myQueue.Count == 0)
{
// no items in queue, wait 'till there are items
Monitor.Wait(_door);
}
item = _myQueue.Dequeue();
// signal that we've taken something out, so that waiting threads
// are woken up one at a time and we don't overfill our queue
Monitor.Pulse(_door);
}
return item;
}
Update: To clear up any confusion, note that Monitor.Wait releases the lock while waiting, so you won't get a deadlock.
@Jason If the queue is full and you wake only ONE thread, you are not guaranteed that thread is a consumer. It might be a producer and you get stuck.
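One way to address the objection in that comment is the single-monitor approach mentioned in the question: wake every waiter with PulseAll and have each one re-check its own condition in a while loop. A minimal sketch, assuming a fixed capacity passed in by the caller (MonitorBoundedQueue is a hypothetical name):

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical class; one lock object stands in for both "not full" and "not empty".
public sealed class MonitorBoundedQueue<T>
{
    private readonly Queue<T> _queue = new Queue<T>();
    private readonly object _door = new object();
    private readonly int _capacity;

    public MonitorBoundedQueue(int capacity) { _capacity = capacity; }

    public void Add(T item)
    {
        lock (_door)
        {
            while (_queue.Count == _capacity)
                Monitor.Wait(_door);   // releases _door while waiting, re-acquires before returning
            _queue.Enqueue(item);
            Monitor.PulseAll(_door);   // wake everyone; each waiter re-checks its own condition
        }
    }

    public T Remove()
    {
        lock (_door)
        {
            while (_queue.Count == 0)
                Monitor.Wait(_door);
            T item = _queue.Dequeue();
            Monitor.PulseAll(_door);
            return item;
        }
    }
}
```

This trades the stuck-producer risk for the extra lock contention the question already points out, since every waiter is woken on each change.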
I haven't come across much C# code that would want to share state within a lock. Without rolling your own you could use a SemaphoreSlim (but I recommend ConcurrentQueue<T> or BlockingCollection<T>).
public class BoundedBuffer<T>
{
private readonly SemaphoreSlim _locker = new SemaphoreSlim(1,1);
private readonly int _maxCount = 1000;
private readonly Queue<T> _items;
public int Count { get { return _items.Count; } }
public BoundedBuffer()
{
_items = new Queue<T>(_maxCount);
}
public BoundedBuffer(int maxCount)
{
_maxCount = maxCount;
_items = new Queue<T>(_maxCount);
}
public void Put(T item, CancellationToken token)
{
// If this first Wait throws, the semaphore was never acquired, so there is nothing to release.
_locker.Wait(token);
bool held = true;
try
{
while(_maxCount == _items.Count)
{
_locker.Release();
held = false;
Thread.SpinWait(1000);
_locker.Wait(token);
held = true;
}
_items.Enqueue(item);
}
finally
{
// Release exactly once, and only if this thread currently holds the semaphore.
if(held)
{
_locker.Release();
}
}
}
public T Take(CancellationToken token)
{
_locker.Wait(token);
bool held = true;
try
{
while(0 == _items.Count)
{
_locker.Release();
held = false;
Thread.SpinWait(1000);
_locker.Wait(token);
held = true;
}
return _items.Dequeue();
}
finally
{
if(held)
{
_locker.Release();
}
}
}
}
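For comparison, the recommended BlockingCollection<T> route could look something like this sketch. The capacity of 10 and the item count are arbitrary, and BlockingCollectionDemo is a made-up name.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BlockingCollectionDemo
{
    public static int Run()
    {
        // Bounded to 10 items: Add blocks while the collection is full.
        using (var buffer = new BlockingCollection<int>(boundedCapacity: 10))
        {
            Task producer = Task.Run(() =>
            {
                for (int i = 0; i < 100; i++) buffer.Add(i);
                buffer.CompleteAdding(); // lets the consumer's loop end
            });

            int sum = 0;
            // Blocks while the collection is empty; ends once adding is complete and it is drained.
            foreach (int item in buffer.GetConsumingEnumerable())
                sum += item;

            producer.Wait();
            return sum;
        }
    }
}
```

Both the "not full" and "not empty" waits are handled inside the collection, so no explicit lock or condition objects are needed.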

Proper locking in thread-safe self-generating list (C#)

I have a singleton IEnumerable that generates a sequence of numbers. The sequence is iterable (basically indefinitely) and I generate the next number in the sequence only when needed.
public class Generator:IEnumerable<long> {
private Generator() { }
private static volatile Generator instance=new Generator();
private static readonly object syncRoot=new object();
public static Generator Instance { get { return instance; } }
private static List<long> numsList=new List<long>();
private void GenerateNextNumber() {
long number;
//Code to generate next number
numsList.Add(number);
}
private long GenerateToNthNumber(int n) {
lock(syncRoot) {
while(numsList.Count<n)
GenerateNextNumber();
}
return numsList[n-1];
}
public static long GetNthNumber(int n) {
return Instance.GenerateToNthNumber(n);
}
private class GeneratorEnumerator:IEnumerator<long> {
private int index=0;
public long Current { get { return GetNthNumber(index); } }
public void Dispose() { }
object System.Collections.IEnumerator.Current { get { return GetNthNumber(index); } }
public bool MoveNext() {
index++;
return true;
}
public void Reset() {
index=0;
}
}
public IEnumerator<long> GetEnumerator() {
return new GeneratorEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
return GetEnumerator();
}
}
This code works when enumerating through and summing the numbers in concurrent threads. Is there a way to avoid locking every time GenerateToNthNumber is called? I tried this code:
private long GenerateToNthNumber(int n) {
if(numsList.Count<n) {
lock(syncRoot) {
while(numsList.Count<n)
GenerateNextNumber();
}
}
return numsList[n-1];
}
But when testing enumerating through and summing the numbers in multiple concurrent threads, not all the results end up with the same sum. My objective is to have non-blocking reads if the number being asked for is already generated, if that is even possible. Is there a better way to do this?
The way List is implemented, it cannot be safely read in one thread while it is being written in another. I would suggest that instead you use nested known-size arrays which, once allocated, are never abandoned (e.g. once an array is allocated that will hold theList[15691], the item will never be held by any other array). Such things may easily be used to implement an add-only list which requires locking when adding items, but is inherently thread-safe for reading without locking.
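A rough sketch of the nested-array idea follows. It assumes .NET 4.5 or later for the Volatile class; the AppendOnlyList name and the chunk size of 1024 are illustrative choices, not something from the answer.

```csharp
using System;
using System.Threading;

// Hypothetical add-only list: writers take a lock, readers do not.
public sealed class AppendOnlyList<T>
{
    private const int ChunkSize = 1024;
    private readonly object _writeLock = new object();
    private T[][] _chunks = new T[4][];
    private int _count;

    public int Count
    {
        get { return Volatile.Read(ref _count); }
    }

    public void Add(T item)
    {
        lock (_writeLock)
        {
            int index = _count;
            int chunk = index / ChunkSize;
            T[][] chunks = _chunks;
            if (chunk == chunks.Length)
            {
                // Grow only the table of chunk references; the chunks themselves never move.
                var bigger = new T[chunks.Length * 2][];
                Array.Copy(chunks, bigger, chunks.Length);
                Volatile.Write(ref _chunks, bigger);
                chunks = bigger;
            }
            if (chunks[chunk] == null)
                chunks[chunk] = new T[ChunkSize];
            chunks[chunk][index % ChunkSize] = item;
            // Publish the new count last, so a reader that sees it also sees the item.
            Volatile.Write(ref _count, index + 1);
        }
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= Volatile.Read(ref _count))
                throw new ArgumentOutOfRangeException("index");
            T[][] chunks = Volatile.Read(ref _chunks);
            return chunks[index / ChunkSize][index % ChunkSize];
        }
    }
}
```

In the generator from the question, GenerateToNthNumber could then check Count without the lock and only enter the lock when more numbers are needed, since an element never changes or moves once it has been published.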
Have you thought about using a thread safe collection?
http://msdn.microsoft.com/en-us/library/dd997305.aspx

How to Create a Thread-Safe Generic List?

I have a Generic List as below
public static readonly List<Customer> Customers = new List<Customer>();
I'm using the below methods for it:
.Add
.Find
.FirstOrDefault
The last 2 are LINQ extensions.
I'd need to make this thread-safe to be able to run multiple instances of the container class.
How to achieve that?
If those are the only functions you are using on List<T> then the easiest way is to write a quick wrapper that synchronizes access with a lock
class MyList<T> {
private List<T> _list = new List<T>();
private object _sync = new object();
public void Add(T value) {
lock (_sync) {
_list.Add(value);
}
}
public T Find(Predicate<T> predicate) {
lock (_sync) {
return _list.Find(predicate);
}
}
public T FirstOrDefault() {
lock (_sync) {
return _list.FirstOrDefault();
}
}
}
I highly recommend the approach of a new type + private lock object. It makes it much more obvious to the next guy who inherits your code what the actual intent was.
Also note that .Net 4.0 introduced a new set of collections specifically aimed at being used from multiple threads. If one of these meets your needs I'd highly recommend using it over rolling your own.
ConcurrentStack<T>
ConcurrentQueue<T>
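As a small illustration of these collections, here is a sketch in which several workers drain one ConcurrentQueue<T>. The worker count of four and the ConcurrentQueueDemo name are arbitrary.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentQueueDemo
{
    public static int[] Run()
    {
        var queue = new ConcurrentQueue<int>(Enumerable.Range(0, 100));
        var taken = new ConcurrentBag<int>();

        // Four workers drain the same queue; TryDequeue is atomic,
        // so no item is handed out twice.
        Parallel.For(0, 4, worker =>
        {
            int item;
            while (queue.TryDequeue(out item))
                taken.Add(item);
        });

        return taken.OrderBy(x => x).ToArray();
    }
}
```

Note that these types do not offer List<T>-style indexing or Find, so they fit best when the access pattern is add and take rather than search.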
To expand on @JaradPar's answer, here is a full implementation with a few extra features, as described in the summary:
/// <summary>
/// a thread-safe list with support for:
/// 1) negative indexes (read from end). "myList[-1]" gets the last value
/// 2) modification while enumerating: enumerates a copy of the collection.
/// </summary>
/// <typeparam name="TValue"></typeparam>
public class ConcurrentList<TValue> : IList<TValue>
{
private object _lock = new object();
private List<TValue> _storage = new List<TValue>();
/// <summary>
/// support for negative indexes (read from end). "myList[-1]" gets the last value
/// </summary>
/// <param name="index"></param>
/// <returns></returns>
public TValue this[int index]
{
get
{
lock (_lock)
{
if (index < 0)
{
index = this.Count + index; // index is negative here, so -1 maps to Count - 1
}
return _storage[index];
}
}
set
{
lock (_lock)
{
if (index < 0)
{
index = this.Count + index; // index is negative here, so -1 maps to Count - 1
}
_storage[index] = value;
}
}
}
public void Sort()
{
lock (_lock)
{
_storage.Sort();
}
}
public int Count
{
get
{
lock (_lock) return _storage.Count;
}
}
bool ICollection<TValue>.IsReadOnly
{
get
{
return ((IList<TValue>)_storage).IsReadOnly;
}
}
public void Add(TValue item)
{
lock (_lock)
{
_storage.Add(item);
}
}
public void Clear()
{
lock (_lock)
{
_storage.Clear();
}
}
public bool Contains(TValue item)
{
lock (_lock)
{
return _storage.Contains(item);
}
}
public void CopyTo(TValue[] array, int arrayIndex)
{
lock (_lock)
{
_storage.CopyTo(array, arrayIndex);
}
}
public int IndexOf(TValue item)
{
lock (_lock)
{
return _storage.IndexOf(item);
}
}
public void Insert(int index, TValue item)
{
lock (_lock)
{
_storage.Insert(index, item);
}
}
public bool Remove(TValue item)
{
lock (_lock)
{
return _storage.Remove(item);
}
}
public void RemoveAt(int index)
{
lock (_lock)
{
_storage.RemoveAt(index);
}
}
public IEnumerator<TValue> GetEnumerator()
{
lock (_lock)
{
return _storage.ToArray().AsEnumerable().GetEnumerator();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
If you're using version 4 or greater of the .NET framework you can use the thread-safe collections.
You can replace List<T> with ConcurrentBag<T>:
namespace Playground.Sandbox
{
using System.Collections.Concurrent;
using System.Threading.Tasks;
public static class Program
{
public static void Main()
{
var items = new[] { "Foo", "Bar", "Baz" };
var bag = new ConcurrentBag<string>();
Parallel.ForEach(items, bag.Add);
}
}
}
You will need to use locks in every place where the collection gets modified or iterated over.
Either that or use one of the new thread-safe data structures, like ConcurrentBag.
Use the lock keyword when you manipulate the collection, ie: your Add/Find:
lock(Customers) {
Customers.Add(new Customer());
}
Never use ConcurrentBag for ordered data. Use an array instead.
Make your action accessible to only one thread at a time by locking on a private object.
Refer to : Thread Safe Generic Queue Class
http://www.codeproject.com/Articles/38908/Thread-Safe-Generic-Queue-Class
OK, so I had to completely rewrite my answer. After 2 days of testing I have to say that JasonS's code has some defects, I guess because of the enumerators: while one thread uses foreach and another changes the list, it throws exceptions.
So I found this answer, and it has worked fine for me for the last 48 hours non-stop; I guess more than 100k threads were created in my application and used those lists.
The only thing I changed: I moved entering the locks outside the try-finally section. Read here about the possible exceptions. Also, if you read MSDN, they use the same approach.
But, as mentioned in the link below, List cannot be 100% thread safe, which is probably why there is no default ConcurrentList implementation in C#.

IEnumerator moving back to record

I have a requirement in which I have to move back and forth between records, so I am using IEnumerator for that. I can move forward with MoveNext, but there is no way to move back.
Here's one way you could wrap an IEnumerator<T>, by capturing its contents in a List<T> as it moves along:
public interface ITwoWayEnumerator<T> : IEnumerator<T>
{
bool MovePrevious();
}
public class TwoWayEnumerator<T> : ITwoWayEnumerator<T>
{
private IEnumerator<T> _enumerator;
private List<T> _buffer;
private int _index;
public TwoWayEnumerator(IEnumerator<T> enumerator)
{
if (enumerator == null)
throw new ArgumentNullException("enumerator");
_enumerator = enumerator;
_buffer = new List<T>();
_index = -1;
}
public bool MovePrevious()
{
if (_index <= 0)
{
return false;
}
--_index;
return true;
}
public bool MoveNext()
{
if (_index < _buffer.Count - 1)
{
++_index;
return true;
}
if (_enumerator.MoveNext())
{
_buffer.Add(_enumerator.Current);
++_index;
return true;
}
return false;
}
public T Current
{
get
{
if (_index < 0 || _index >= _buffer.Count)
throw new InvalidOperationException();
return _buffer[_index];
}
}
public void Reset()
{
_enumerator.Reset();
_buffer.Clear();
_index = -1;
}
public void Dispose()
{
_enumerator.Dispose();
}
object System.Collections.IEnumerator.Current
{
get { return Current; }
}
}
Then I would expose this kind of enumerator using an extension method:
public static class TwoWayEnumeratorHelper
{
public static ITwoWayEnumerator<T> GetTwoWayEnumerator<T>(this IEnumerable<T> source)
{
if (source == null)
throw new ArgumentNullException("source");
return new TwoWayEnumerator<T>(source.GetEnumerator());
}
}
Note that this is definitely overkill if the collection you're dealing with is already an indexed collection such as a T[] or a List<T>. It makes more sense for scenarios such as when you're enumerating over a sequence that isn't already in a conveniently indexed form and you want to be able to go backwards as well as forwards.
The IEnumerator (and IEnumerator<T>) interfaces only define a forward-only enumerator. You'll need to make your own class or interface if you want to allow bi-directional iteration through your collection.
You can't go backwards with IEnumerator. Either suck the entire set into a List or cache the current element on each pass through the loop, so it's available to the next pass.
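The "cache the current element" suggestion could be sketched as a helper that hands each element to the caller together with the one before it. PreviousElement and WithPrevious are made-up names; in each pair the Key is the previous element and the Value is the current one.

```csharp
using System.Collections.Generic;

public static class PreviousElement
{
    // Yields (previous, current) pairs; a one-element sequence yields nothing.
    public static IEnumerable<KeyValuePair<T, T>> WithPrevious<T>(IEnumerable<T> source)
    {
        bool hasPrevious = false;
        T previous = default(T);
        foreach (T current in source)
        {
            if (hasPrevious)
                yield return new KeyValuePair<T, T>(previous, current);
            previous = current; // remembered for the next pass through the loop
            hasPrevious = true;
        }
    }
}
```

This only looks one step back. For arbitrary backwards movement, buffering into a List<T> as in the wrapper above is the simpler route.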

Faster enumeration: Leveraging Array Enumeration

So, I have a class with an array inside. Currently, my strategy for enumerating over the class's items is to use the code foreach (item x in classInstance.InsideArray). I would much rather use foreach (item x in classInstance) and make the array private. My main concern is that I really need to avoid anything slow; the array gets hit a lot (and has a couple hundred items). It is vital that enumerating over this array is cheap. One thought was to just have the class implement IEnumerable<item>, but InsideArray.GetEnumerator() only gives me a non-generic enumerator. I also tried implementing the non-generic IEnumerable interface. This worked but was very slow, possibly due to boxing.
Is there a way to make the class itself enumerable without a performance hit?
Normal Code:
//Class
public class Foo {
//Stuff
public Item[,] InsideArray {get; private set;}
}
//Iteration. Shows up all over the place
foreach (Item x in classInstance.InsideArray)
{
//doStuff
}
Adjusted, much slower code:
//Class
public class Foo : IEnumerable {
//Stuff
private Item[,] InsideArray;
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return InsideArray.GetEnumerator();
}
}
//Iteration. Shows up all over the place
foreach (Item x in classInstance)
{
//doStuff
}
Note: Adding an implementation for the nongeneric iterator is possible and faster than my slow solution, but it is still a bit worse than just using the array directly. I was hoping there was a way to somehow tell C#, "hey, when I ask you to iterate over this object, iterate over its array, just as fast," but apparently that is not quite possible... at least from the answers suggested thus far.
A bespoke iterator might make it quicker (edited to return as known type):
Basic: 2468ms - -2049509440
Bespoke: 1087ms - -2049509440
(you would use the ArrayIterator directly as Foo's GetEnumerator - essentially copying the code from ArrayEnumerator.GetEnumerator; my point is to show that a typed iterator is faster than the interface)
With code:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
class Foo
{
public struct ArrayIterator<T> : IEnumerator<T>
{
private int x, y;
private readonly int width, height;
private T[,] data;
public ArrayIterator(T[,] data)
{
this.data = data;
this.width = data.GetLength(0);
this.height = data.GetLength(1);
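// Start one step before [0, 0] so the first MoveNext() lands on the first element.
x = -1;
y = 0;
}
public void Dispose() { data = null; }
public bool MoveNext()
{
if (++x >= width)
{
x = 0;
y++;
}
return width > 0 && y < height;
}
public void Reset() { x = -1; y = 0; }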
public T Current { get { return data[x, y]; } }
object IEnumerator.Current { get { return data[x, y]; } }
}
public sealed class ArrayEnumerator<T> : IEnumerable<T>
{
private readonly T[,] arr;
public ArrayEnumerator(T[,] arr) { this.arr = arr; }
public ArrayIterator<T> GetEnumerator()
{
return new ArrayIterator<T>(arr);
}
System.Collections.Generic.IEnumerator<T> System.Collections.Generic.IEnumerable<T>.GetEnumerator()
{
return GetEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public int[,] data;
public IEnumerable<int> Basic()
{
foreach (int i in data) yield return i;
}
public ArrayEnumerator<int> Bespoke()
{
return new ArrayEnumerator<int>(data);
}
public Foo()
{
data = new int[500, 500];
for (int x = 0; x < 500; x++)
for (int y = 0; y < 500; y++)
{
data[x, y] = x + y;
}
}
static void Main()
{
Test(1); // for JIT
Test(500); // for real
Console.ReadKey(); // pause
}
static void Test(int count)
{
Foo foo = new Foo();
int chk;
Stopwatch watch = Stopwatch.StartNew();
chk = 0;
for (int i = 0; i < count; i++)
{
foreach (int j in foo.Basic())
{
chk += j;
}
}
watch.Stop();
Console.WriteLine("Basic: " + watch.ElapsedMilliseconds + "ms - " + chk);
watch = Stopwatch.StartNew();
chk = 0;
for (int i = 0; i < count; i++)
{
foreach (int j in foo.Bespoke())
{
chk += j;
}
}
watch.Stop();
Console.WriteLine("Bespoke: " + watch.ElapsedMilliseconds + "ms - " + chk);
}
}
Cast your array to IEnumerable<item> before calling GetEnumerator() and you'll get the generic IEnumerator. For example:
string[] names = { "Jon", "Marc" };
IEnumerator<string> enumerator = ((IEnumerable<string>)names).GetEnumerator();
It may well still be a bit slower than enumerating the array directly with foreach (which the C# compiler does in a different way) but at least you won't have anything else in the way.
EDIT:
Okay, you said your other attempt used an indexer. You could try this approach, although I don't think it'll be any faster:
public IEnumerable<Item> Items
{
get
{
foreach (Item x in items)
{
yield return x;
}
}
}
An alternative would be to try to avoid using a two-dimensional array to start with. Is that an absolute requirement? How often are you iterating over a single array after creating it? It may be worth taking a slight hit at creation time to make iteration cheaper.
EDIT: Another suggestion, which is slightly off the wall... instead of passing the iterator back to the caller, why not get the caller to say what to do with each item, using a delegate?
public void ForEachItem(Action<Item> action)
{
foreach (Item item in items)
{
action(item);
}
}
Downsides:
You incur the penalty of a delegate call on each access.
It's hard to break out of the loop (other than by throwing an exception). There are different ways of approaching this, but let's cross that bridge when we come to it.
Developers who aren't familiar with delegates may get a bit confused.
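One possible way around the second downside is to let the callback decide whether to continue. In this sketch the Grid<T> class and the Func<T, bool> convention (return false to stop) are assumptions for illustration, not part of the original suggestion.

```csharp
using System;

// Hypothetical container standing in for the class that owns the array.
public class Grid<T>
{
    private readonly T[,] _items;

    public Grid(T[,] items) { _items = items; }

    // The visitor returns true to keep going and false to stop early.
    public void ForEachItem(Func<T, bool> visitor)
    {
        foreach (T item in _items)
        {
            if (!visitor(item)) return;
        }
    }
}
```

A caller can then write grid.ForEachItem(item => { Process(item); return item != sentinel; }) where Process and sentinel are whatever the caller needs. The per-item delegate call cost from the first downside still applies.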
How about adding an indexer to the class:
public MyInsideArrayType this[int index]
{
get { return this.insideArray[index]; }
}
And if you REALLY need foreach capabilities:
// foreach needs GetEnumerator() to return an enumerator, not an enumerable.
public IEnumerator<MyInsideArrayType> GetEnumerator()
{
for (int i = 0; i < this.insideArray.Length; i++)
{
yield return this[i];
}
}
All forms of iteration are cheap. If anyone in this day-and-age managed to somehow write and publish an expensive iterator they would be (rightly) burned at the stake.
Premature optimization is evil.
Cheers. Keith.
