C# List<T> indexer thread safety

Until recently, I had been under the assumption that setting an element of a List<T> via indexer is thread safe in the following context.
// Assumes destination.Count >= source.Count
static void Function<T, U>(List<T> source, Func<T, U> converter, List<U> destination)
{
    Parallel.ForEach(Partitioner.Create(0, source.Count), range =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            destination[i] = converter(source[i]);
        }
    });
}
Since List<T> stores its elements in an internal array and setting one by index shouldn't necessitate resizing, this seemed like a reasonable leap of faith. Looking at the implementation of List<T> in .NET Core, however, it appears that the indexer's setter modifies internal state beyond the array itself (see the _version field below).
// Sets or Gets the element at the given index.
public T this[int index]
{
    get
    {
        // Following trick can reduce the range check by one
        if ((uint)index >= (uint)_size)
        {
            ThrowHelper.ThrowArgumentOutOfRange_IndexException();
        }
        Contract.EndContractBlock();
        return _items[index];
    }
    set
    {
        if ((uint)index >= (uint)_size)
        {
            ThrowHelper.ThrowArgumentOutOfRange_IndexException();
        }
        Contract.EndContractBlock();
        _items[index] = value;
        _version++;
    }
}
So should I assume that List<T> is not thread-safe even when each thread is only getting/setting elements from its own portion of the collection?

Have a read here:
https://msdn.microsoft.com/en-us/library/6sh2ey19.aspx#Anchor_10
To answer your question, no - as per the documentation, it's not guaranteed to be thread safe.
Even if the current implementation looked thread safe (and the _version++ in the setter shows it doesn't), it would still be a bad idea to make that assumption. Since the documentation explicitly says it's not thread safe, future versions may legally change the underlying implementation to no longer be thread safe and break any assumption you previously relied on.
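If you do need to fill a destination in parallel by index, one safe alternative is to target a plain array instead: the .NET memory model allows different threads to write to distinct elements of an array without synchronization, and there is no extra bookkeeping like _version. A minimal sketch, same shape as the code in the question but returning a new U[] (ConvertParallel is just an illustrative name):
static U[] ConvertParallel<T, U>(IReadOnlyList<T> source, Func<T, U> converter)
{
    var destination = new U[source.Count];
    Parallel.ForEach(Partitioner.Create(0, source.Count), range =>
    {
        // Each partition writes only to its own slice of the array,
        // so no synchronization is needed.
        for (int i = range.Item1; i < range.Item2; i++)
        {
            destination[i] = converter(source[i]);
        }
    });
    return destination;
}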

Related

Expensive IEnumerable: Any way to prevent multiple enumerations without forcing an immediate enumeration? [duplicate]

I have a very large enumeration and am preparing an expensive deferred operation on it (e.g. sorting it). I'm then passing this into a function which may or may not consume the IEnumerable, depending on some logic of its own.
Here's an illustration:
IEnumerable<Order> expensiveEnumerable = fullCatalog.OrderBy(c => Prioritize(c));
MaybeFullFillSomeOrders(expensiveEnumerable);

// Elsewhere... (example use-case for multiple enumerations, not real code)
void MaybeFullFillSomeOrders(IEnumerable<Order> nextUpOrders)
{
    if (notAGoodTime())
        return;
    foreach (var order in nextUpOrders)
        collectSomeInfo(order);
    processInfo();
    foreach (var order in nextUpOrders)
    {
        maybeFulfill(order);
        if (atCapacity())
            break;
    }
}
I would like to prepare my input to the other function such that:
If they do not consume the enumerable, the performance price of sorting is not paid.
This already precludes calling e.g. ToList() or ToArray() on it
If they choose to enumerate multiple times (perhaps not realizing how expensive that would be in this case), I want some defence in place to prevent the source being enumerated more than once.
Ideally, the result is still an IEnumerable<T>
The best solution I've come up with is to use Lazy<>:
var expensive = new Lazy<List<Order>>(
    () => fullCatalog.OrderBy(c => Prioritize(c)).ToList());
This appears to satisfy criteria 1 and 2, but has a couple of drawbacks:
I have to change the interface of all downstream usages to expect a Lazy.
The full list (which in this case was built up from a SelectMany() on several smaller partitions) would need to be allocated as a new single contiguous list in memory. I'm not sure there's an easy way around this if I want to "cache" the sort result, but if you know of one I'm all ears.
One idea I had to solve the first problem was to wrap Lazy<> in some custom class that either implements or can implicitly be converted to an IEnumerable<T>, but I'm hoping someone knows of a more elegant approach.
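For what it's worth, the wrapper idea is only a few lines; a hypothetical sketch (DeferredList is an illustrative name, and it requires using System.Collections and System.Collections.Generic):
// Defers the expensive materialization until first enumeration,
// then reuses the cached list for every later enumeration.
public class DeferredList<T> : IEnumerable<T>
{
    private readonly Lazy<List<T>> lazy;

    public DeferredList(Func<List<T>> factory)
    {
        lazy = new Lazy<List<T>>(factory);
    }

    public IEnumerator<T> GetEnumerator() => lazy.Value.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
Note this still pays drawback 2: the whole list is materialized on first use.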
You certainly could write your own IEnumerable<T> implementation that wraps another one, remembering all the elements it's already seen (and whether it's exhausted or not). If you need it to be thread-safe that becomes trickier, and you'd need to remember that at any time there may be multiple iterators working against the same IEnumerable<T>.
Fundamentally I think it would come down to working out what to do when asked for the next element (which is somewhat-annoyingly split into MoveNext() and Current, but that can probably be handled...):
If you've already read the next element within another iterator, you can yield it from your buffer
If you've already discovered that there is no next element, you can return that immediately
Otherwise, you need to ask the original iterator for the next element, and remember it for all the other wrapped iterators.
The other aspect that's tricky is knowing when to dispose of the underlying IEnumerator<T> - if you don't need to do that, it makes things simpler.
As a very sketchy attempt that I haven't even attempted to compile, and which is definitely not thread-safe, you could try something like this:
// Requires: using System.Collections; using System.Collections.Generic;
public class LazyEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> iterator;
    private readonly List<T> buffer = new List<T>();
    private bool completed = false;

    public LazyEnumerable(IEnumerable<T> original)
    {
        // TODO: You could be even lazier, only calling
        // GetEnumerator when you first need an element
        iterator = original.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            // If we already have the element, yield it
            if (index < buffer.Count)
            {
                yield return buffer[index];
            }
            // If we've yielded everything in the buffer and some
            // other iterator has come to the end of the original,
            // we're done.
            else if (completed)
            {
                yield break;
            }
            // Otherwise, see if there's anything left in the original
            // iterator.
            else
            {
                bool hasNext = iterator.MoveNext();
                if (hasNext)
                {
                    var current = iterator.Current;
                    buffer.Add(current);
                    yield return current;
                }
                else
                {
                    completed = true;
                    yield break;
                }
            }
            index++;
        }
    }
}
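If it compiles as intended, usage against the question's example would look something like this (the sort inside OrderBy is still deferred until the first element is requested):
var expensive = new LazyEnumerable<Order>(fullCatalog.OrderBy(c => Prioritize(c)));
MaybeFullFillSomeOrders(expensive); // both foreach loops share one pass over the source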

Concurrent collection enumerator

I'm currently programming my own implementation of a priority queue / sorted list and I would like it to be concurrent.
In order to make it thread safe I'm using lock(someObject), and I would like to verify some behavior of locks in C#.
The inner representation of my sorted list is basically a linked list with a head and slots linked together.
Something like:
// Nested inside the generic collection class, so T is in scope.
internal class Slot
{
    internal T Value;
    internal Slot Next;

    public Slot(T value, Slot next = null)
    {
        Value = value;
        Next = next;
    }
}
Every time I manipulate the head I have to use lock(someObject) because of thread safety.
In order to implement the ICollection interface I have to implement public IEnumerator<T> GetEnumerator(). In this method I take the head and read from it, so I should use the lock.
public IEnumerator<T> GetEnumerator()
{
    lock (syncLock)
    {
        var curr = head;
        while (curr != null)
        {
            yield return curr.Value;
            curr = curr.Next;
        }
    }
}
My question is: is syncLock held for the whole time in the enumerator (so it is released only after reaching the end of the method), or is it automatically released after yielding a value?
Thank you guys from the comments, here's a sum-up.
Answer: yes, syncLock will be held for the whole enumeration, and released only when the enumerator completes or is disposed → hence, it's a really bad idea
Possible solutions:
make the collection not thread safe
obtain the lock, copy the whole collection and return an enumerator over the copy (see the sketch below) #Evk
use some kind of boolean flag: set it to true while enumerating over the collection and throw an exception when the Add, Clear or Remove methods are called -> this is the default List behavior #ManfredRadlwimmer
make the collection immutable #InBetween
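A minimal sketch of the snapshot approach (#Evk's suggestion), reusing the head/syncLock fields from the question; the lock is held only while copying, not while the caller iterates:
public IEnumerator<T> GetEnumerator()
{
    var snapshot = new List<T>();
    lock (syncLock)
    {
        var curr = head;
        while (curr != null)
        {
            snapshot.Add(curr.Value);
            curr = curr.Next;
        }
    }
    // Enumeration happens over the private copy, after the lock is released.
    return snapshot.GetEnumerator();
}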

Replace items in a list -- in place

I'm facing an issue with some simple C# code which I would easily fix in C/C++.
I guess I'm missing something.
I want to do the following (modifying items in a list -- in place):
//pseudocode
void modify<T>(List<T> a)
{
    foreach (var item in a)
    {
        if (condition(item))
        {
            item = somethingElse;
        }
    }
}
I understand that foreach treats the collection it loops over as immutable, so the code above can't work.
I therefore tried the following:
void modify<T>(List<T> a)
{
    using (var sequenceEnum = a.GetEnumerator())
    {
        while (sequenceEnum.MoveNext())
        {
            var m = sequenceEnum.Current;
            if (condition(m))
            {
                sequenceEnum.Current = somethingElse;
            }
        }
    }
}
I was naively thinking that the enumerator was some kind of pointer to my element. Apparently enumerators don't allow writes either: IEnumerator<T>.Current is read-only.
In C++ I would write something like this:
template<typename T>
struct Node {
    T* value;
    Node* next;
};
being then able to modify *value without touching anything in Node, and therefore in the parent collection:
Node<T>* current = a->head;
while (current != nullptr) {
    if (condition(current->value))
        current->value = ...;
    current = current->next;
}
Do I really have to resort to unsafe code?
Or am I stuck with the awfulness of calling the subscript operator inside the loop?
You could also use a simple for loop.
void modify<T>(List<T> a)
{
    for (int i = 0; i < a.Count; i++)
    {
        if (condition(a[i]))
        {
            a[i] = default(T);
        }
    }
}
In short - do not modify lists while iterating them. You can achieve the desired effect with
a = a.Select(x => <some logic> ? x : default(T)).ToList()
In general, lists in C# are immutable during iteration. You can however use .RemoveAll or similar methods.
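For example, RemoveAll takes a predicate and removes matching items in place:
var numbers = new List<int> { 1, 2, 3, 4, 5 };
numbers.RemoveAll(n => n % 2 == 0); // numbers is now { 1, 3, 5 }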
As described in the documentation here, System.Collections.Generic.List<T> is the generic equivalent of System.Collections.ArrayList, with O(1) indexed access. Much like C++'s std::vector<>, the cost of inserting/adding elements is not always predictable (an append may trigger a reallocation), but access is constant-time (complexity-wise; caching effects aside).
The equivalent of your C++ code snippet would be LinkedList<T>.
As for the immutability of your collection during iteration, it is clearly stated in the documentation of GetEnumerator method here. Indeed, during enumeration (within a foreach, or using IEnumerator.MoveNext directly):
Enumerators can be used to read the data in the collection, but they
cannot be used to modify the underlying collection.
Moreover, modifying the list will invalidate the enumerator and usually throw an exception:
An enumerator remains valid as long as the collection remains
unchanged. If changes are made to the collection, such as adding,
modifying, or deleting elements, the enumerator is irrecoverably
invalidated and its behavior is undefined.
I believe this consistency of the interface contract across the various collection types leads to your misunderstanding: it would be possible to implement a list that is mutable during enumeration, but it is not required by the interface contract.
Imagine you want to implement a list that is mutable during enumeration. Would the enumerator hold a reference to (or a way to retrieve) the entry, or the entry itself? The entry itself would make it immutable; a reference would be invalidated when inserting elements in a linked list, for example.
The simple for loop proposed by #Igor seems to be the best way to go if you want to use the standard Collections library. Otherwise, you may need to reimplement it yourself.
Use something like this:
List<T> GetModified<T>(List<T> list, Func<T, bool> condition, Func<T> replacement)
{
    return list.Select(m => condition(m) ? m : replacement()).ToList();
}
Usage:
originalList = GetModified(originalList, i => i.IsAwesome(), () => null);
But this can also get you into trouble with cross-thread operations. Try to use immutable instances where possible, especially with IEnumerable.
If you really, really want to modify the list instance in place:
// If you ever want to also remove items, iterating backwards is the trick
for (int i = list.Count - 1; i >= 0; i--)
{
    if (condition(list[i]))
    {
        list[i] = whateverYouWant;
    }
}

How to avoid object allocations? How to reuse objects instead of allocating them?

My financial software constantly processes almost the same objects. For example I have data like this coming in online:
HP 100 1
HP 100 2
HP 100.1 1
etc.
I have about 1000 updates every second.
Each update is stored in an object, but I do not want to allocate these objects on the fly, in order to improve latency.
I use objects only for a short period of time: I receive them, apply them and free them. Once an object is freed it can be reused for another pack of data.
So I need some storage (likely a ring buffer) that allocates the required number of objects once and then allows me to "obtain" and "free" them. What is the best way to do that in C#?
Each object has an id, and I assign ids sequentially and free them sequentially too.
For example I receive ids 1, 2 and 3, then I free 1, 2, 3. So any FIFO collection would work, but I'm looking for some library class that covers the required functionality.
I.e. I need a FIFO collection that does not allocate objects but reuses them and allows them to be reconfigured.
Update:
I've added my implementation of what I want. This is untested code and probably has bugs.
The idea is simple: the writer should call the Obtain/Commit methods, the reader should call the TryGet method. Reader and writer can access this structure from different threads:
public sealed class ArrayPool<T> where T : class
{
    readonly T[] array;
    private readonly uint MASK;
    private volatile uint curWriteNum;
    private volatile uint curReadNum;

    public ArrayPool(uint length = 1024) // length must be a power of 2
    {
        if (length == 0 || (length & (length - 1)) != 0)
            throw new ArgumentOutOfRangeException("length");
        array = new T[length];
        MASK = length - 1;
    }

    /// <summary>
    /// TryGet() itself is not thread safe and should be called from one thread.
    /// However TryGet() and Obtain/Commit can be called from different threads.
    /// </summary>
    public T TryGet()
    {
        if (curReadNum == curWriteNum)
        {
            return null;
        }
        T result = array[curReadNum & MASK];
        curReadNum++;
        return result;
    }

    public T Obtain()
    {
        return array[curWriteNum & MASK];
    }

    public void Commit()
    {
        curWriteNum++;
    }
}
Comments about my implementation are welcome - and perhaps some library class could replace this simple one?
I don't think you should leap at this, as per my comments on the question - however, a simple approach would be something like:
public sealed class MicroPool<T> where T : class
{
    readonly T[] array;

    public MicroPool(int length = 10)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException("length");
        array = new T[length];
    }

    public T TryGet()
    {
        T item;
        for (int i = 0; i < array.Length; i++)
        {
            if ((item = Interlocked.Exchange(ref array[i], null)) != null)
                return item;
        }
        return null;
    }

    public void Recycle(T item)
    {
        if (item == null) return;
        for (int i = 0; i < array.Length; i++)
        {
            if (Interlocked.CompareExchange(ref array[i], item, null) == null)
                return;
        }
        using (item as IDisposable) { } // cleanup if needed
    }
}
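Usage would be along these lines (Update standing in for whatever type you pool):
var pool = new MicroPool<Update>(32);

var update = pool.TryGet() ?? new Update(); // fall back to allocating when the pool is empty
// ... populate the update, apply it ...
pool.Recycle(update); // hand it back for reuse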
If the loads come in bursts, you may be able to use the GC's latency modes to offset the overhead by delaying collections. This is not a silver bullet, but in some cases it can be very helpful.
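For example, System.Runtime.GCSettings lets you request SustainedLowLatency around a burst (available from .NET 4.5; ProcessBurst is a hypothetical hot path):
using System.Runtime;

var oldMode = GCSettings.LatencyMode;
GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
try
{
    ProcessBurst(); // the GC avoids blocking collections where possible in this mode
}
finally
{
    GCSettings.LatencyMode = oldMode;
}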
I am not sure if this is what you need, but you could always make a pool of objects that are going to be used. Initialize a List of the object type; when you need an object, remove it from the list, and add it back when you are done with it.
http://www.codeproject.com/Articles/20848/C-Object-Pooling is a good start.
Hope I've helped, even if only a little :)
If you are just worried about the time taken for the GC to run, then don't be - it can't be beaten by anything you can do yourself.
However, if your objects' constructors do some work it might be quicker to cache them.
A fairly straightforward way to do this is to use a ConcurrentBag.
Essentially you pre-populate it with a set of objects using ConcurrentBag.Add() (that is, if you want to - you can also start with it empty and let it grow).
Then when you need a new object you use ConcurrentBag.TryTake() to grab one.
If TryTake() fails, you just create a new object and use that instead.
Regardless of whether you grabbed an object from the bag or created a new one, once you're done with it you just put it back into the bag using ConcurrentBag.Add().
Generally your bag will grow to a certain size but no larger (though you might want to instrument things just to check).
In any case, I would always do some timings to see if changes like this actually make any difference. Unless the object constructors are doing a fair bit of work, chances are it won't make much difference.
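A minimal sketch of the bag-based pool described above (ConcurrentBag<T> lives in System.Collections.Concurrent; BagPool is an illustrative name):
public class BagPool<T> where T : class, new()
{
    private readonly ConcurrentBag<T> bag = new ConcurrentBag<T>();

    public T Rent()
    {
        // Reuse a pooled instance if one is available; otherwise allocate.
        T item;
        return bag.TryTake(out item) ? item : new T();
    }

    public void Return(T item)
    {
        bag.Add(item);
    }
}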

Performance regarding cached thread-safe IEnumerable<T> implementation

I created the ThreadSafeCachedEnumerable<T> class intending to increase performance where long-running queries were being reused. The idea was to get an enumerator from an IEnumerable<T> and add items to a cache on each call to MoveNext(). The following is my current implementation:
/// <summary>
/// Wraps an IEnumerable<T> and provides a thread-safe means of caching the values.
/// </summary>
/// <typeparam name="T"></typeparam>
class ThreadSafeCachedEnumerable<T> : IEnumerable<T>
{
    // An enumerator from the original IEnumerable<T>
    private IEnumerator<T> enumerator;

    // The items we have already cached (from this.enumerator)
    private IList<T> cachedItems = new List<T>();

    public ThreadSafeCachedEnumerable(IEnumerable<T> enumerable)
    {
        this.enumerator = enumerable.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        // The index into the sequence
        int currentIndex = 0;
        // We will break with yield break
        while (true)
        {
            // The currentIndex will never be decremented,
            // so we can check without locking first
            if (currentIndex < this.cachedItems.Count)
            {
                var current = this.cachedItems[currentIndex];
                currentIndex += 1;
                yield return current;
            }
            else
            {
                // If !(currentIndex < this.cachedItems.Count),
                // we need to synchronize access to this.enumerator
                lock (enumerator)
                {
                    // See if we have more cached items ...
                    if (currentIndex < this.cachedItems.Count)
                    {
                        var current = this.cachedItems[currentIndex];
                        currentIndex += 1;
                        yield return current;
                    }
                    else
                    {
                        // ... otherwise, we'll need to get the next item from this.enumerator.MoveNext()
                        if (this.enumerator.MoveNext())
                        {
                            // Capture the current item and cache it, then increment currentIndex
                            var current = this.enumerator.Current;
                            this.cachedItems.Add(current);
                            currentIndex += 1;
                            yield return current;
                        }
                        else
                        {
                            // We reached the end of the enumerator - we're done
                            yield break;
                        }
                    }
                }
            }
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}
I simply lock (this.enumerator) when no more items appear to be in the cache, just in case another thread is about to add another item (I assume that calling MoveNext() on this.enumerator from two threads is a bad idea).
The performance is great when retrieving previously cached items, but it starts to suffer when getting many items for the first time (due to the constant locking). Any suggestions for increasing the performance?
Edit: The new Reactive Framework solves the problem outlined above, using the System.Linq.EnumerableEx.MemoizeAll() extension method.
Internally, MemoizeAll() uses a System.Linq.EnumerableEx.MemoizeAllEnumerable<T> (found in the System.Interactive assembly), which is similar to my ThreadSafeCachedEnumerable<T> (sorta).
Here's an awfully contrived example that prints the contents of an Enumerable (numbers 1-10) very slowly, then quickly prints the contents a second time (because it cached the values):
// Create an Enumerable<int> containing numbers 1-10, using Thread.Sleep() to simulate work
var slowEnum = EnumerableEx.Generate(
    1,
    currentNum => (currentNum <= 10),
    currentNum => currentNum,
    previousNum => { Thread.Sleep(250); return previousNum + 1; });

// This decorates the slow enumerable with one that will cache each value.
var cachedEnum = slowEnum.MemoizeAll();

// Print the numbers
foreach (var num in cachedEnum.Repeat(2))
{
    Console.WriteLine(num);
}
A couple of recommendations:
It is now generally accepted practice not to make container classes responsible for locking. Someone calling your cached enumerator, for instance, might also want to prevent new entries from being added to the container while enumerating, which means that locking would occur twice. Therefore, it's best to defer that responsibility to the caller.
Your caching depends on the enumerator always returning items in order, which is not guaranteed. It's better to use a Dictionary or HashSet. Similarly, items may be removed in between calls, invalidating the cache.
It is generally not recommended to establish locks on publicly accessible objects. That includes the wrapped enumerator. Exceptions are conceivable, for example when you're absolutely certain you're the only instance holding a reference to the container class you're enumerating over. This would also largely invalidate my objections under #2.
Locking in .NET is normally very quick (if there is no contention). Has profiling identified locking as the source of the performance problem? How long does it take to call MoveNext on the underlying enumerator?
Additionally, the code as it stands is not thread-safe. You cannot safely call this.cachedItems[currentIndex] on one thread (in if (currentIndex < this.cachedItems.Count)) while invoking this.cachedItems.Add(current) on another. From the List(T) documentation: "A List(T) can support multiple readers concurrently, as long as the collection is not modified." To be thread-safe, you would need to protect all access to this.cachedItems with a lock (if there's any chance that one or more threads could modify it).
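One way to repair that (at the cost of some of the fast path's speed) would be to take the same lock around the cached-item read as well; a rough sketch of how the first branch in GetEnumerator could change:
// Replaces the unsynchronized "if (currentIndex < this.cachedItems.Count)" branch:
T current = default(T);
bool foundInCache = false;
lock (this.enumerator)
{
    if (currentIndex < this.cachedItems.Count)
    {
        current = this.cachedItems[currentIndex];
        foundInCache = true;
    }
}
if (foundInCache)
{
    currentIndex += 1;
    yield return current;
}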
