Iterate and do operations on a generic collection concurrently - c#

I've created a game emulation program using C# async sockets. I need to remove/add and iterate over a collection (a list that holds clients) concurrently. I am currently using "lock", however, it's a huge performance drop. I also do not want to use "local lists/copies" to keep the list up-to-date. I've heard about "ConcurrentBag", however, I am not sure how thread-safe it is for iteration (for instance, if one thread removes an element from the list while another thread is iterating over it!?).
What do you suggest?
Edit: here is a situation
this is when a packet is sent to all the users in a room
lock (parent.gameClientList)
{
for (int i = 0; i < parent.gameClientList.Count; i++)
{
    if (parent.gameClientList[i].zoneId == zoneId)
        parent.gameClientList[i].SendXt(packetElements); // if room matches - SendXt sends a packet
}
}
When a new client connects
Client connectedClient = new Client(socket, this);
lock (gameClientList)
{
gameClientList.Add(connectedClient);
}
Same case when a client disconnects.
I am asking for a better alternative (performance-wise) because the locks slow down everything.

It sounds like the problem is that you're doing all the work within your foreach loop, and it's locking out the add/remove methods for too long. The way around this is to quickly make a copy of the collection while it's locked, and then you can close the lock and iterate on the copy.
Thing[] copy;
lock(myLock) {
copy = _collection.ToArray();
}
foreach(var thing in copy) {...}
The drawback is that by the time you get around to operating on some object of that copy, it may have been removed from the original collection, and so maybe you don't want to operate on it anymore. That's something you'll have to figure out from your requirements. If that's a problem, a simple option would be to lock each iteration of the loop, which of course will slow things down, but at least the lock won't be held for the entire duration of the loop:
foreach (var thing in copy)
{
    lock (myLock)
    {
        if (_collection.Contains(thing)) // check that it's still in the original collection
            DoWork(thing); // If you can move this outside the lock it'd make your app snappier, but it could also cause problems if you're doing something "dangerous" in DoWork.
    }
}
If this is what you meant by "local copies", then you can disregard this option, but I figured I'd offer it in case you meant something else.

Every time you do something concurrently you are going to have losses due to task management (i.e. locks). I suggest you look at what the bottleneck in your process actually is. You seem to have a shared-memory model, as opposed to a message-passing model. If you know you need to modify the entire collection at once, there may not be a good solution. But if you are making changes in a particular order you can leverage that order to prevent delays. Locks are an implementation of pessimistic concurrency. You could switch to an optimistic concurrency model: in one the cost is waiting, in the other the cost is retrying. Again, the actual solution depends on your use case.
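As a rough illustration of the optimistic model, here is a minimal sketch (the ClientList and Snapshot names are mine, not from the question) that keeps the clients in an immutable array and retries writes on conflict, so readers never wait:
using System.Threading;

class ClientList
{
    private Client[] _clients = new Client[0];

    public void Add(Client client)
    {
        while (true)
        {
            Client[] current = _clients;
            var next = new Client[current.Length + 1];
            current.CopyTo(next, 0);
            next[current.Length] = client;
            // Publish the new array only if no other writer got in first;
            // otherwise loop and retry against the fresh state.
            if (Interlocked.CompareExchange(ref _clients, next, current) == current)
                return;
        }
    }

    // Readers iterate a stable snapshot without any lock.
    public Client[] Snapshot() { return _clients; }
}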

One problem with ConcurrentBag is that it is unordered, so you cannot pull items out by index the way you are doing it currently. However, you can iterate it via foreach to get the same effect. This iteration will be thread-safe. It will not go berserk if an item is added or removed while the iteration is happening.
There is another problem with ConcurrentBag though. It actually copies the contents to a new List internally to make the enumerator work correctly. So even if you just wanted to pick off a single item via the enumerator, it would still be an O(n) operation because of the way the enumerator works. You can verify this by disassembling it.
However, based on context clues from your update, I assume that this collection is going to be small. It appears that there is only one entry per game client, which means it is probably going to store a small number of items, right? If that is correct then the performance of the GetEnumerator method will be mostly insignificant.
You should also consider ConcurrentDictionary as well. I noticed that you are trying to match items from the collection based on zoneId. If you store the items in the ConcurrentDictionary keyed by zoneId then you would not need to iterate the collection at all. Of course, this assumes that there is only one entry per zoneId which may not be the case.
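If several clients can share a zoneId, a hedged variation (the dictionary-of-sets layout below is my illustration, not from the question) is to key a dictionary of per-zone collections by zoneId, so a room broadcast only touches that room:
using System.Collections.Concurrent;

// zoneId -> set of clients in that zone (byte is just a dummy value type).
ConcurrentDictionary<int, ConcurrentDictionary<Client, byte>> clientsByZone =
    new ConcurrentDictionary<int, ConcurrentDictionary<Client, byte>>();

// When a client connects:
var zone = clientsByZone.GetOrAdd(connectedClient.zoneId,
    _ => new ConcurrentDictionary<Client, byte>());
zone.TryAdd(connectedClient, 0);

// Broadcasting to one room:
ConcurrentDictionary<Client, byte> room;
if (clientsByZone.TryGetValue(zoneId, out room))
    foreach (var entry in room)
        entry.Key.SendXt(packetElements);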
You mentioned that you did not want to use "local lists/copies", but you never said why. I think you should reconsider this for the following reasons.
Iterations could be lock-free.
Adding and removing items appears to be infrequent based on context clues from your code.
There are a couple of patterns you can use to make the list copying strategy work really well. I talk about them in my answers here and here.
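One such pattern, sketched below with my own names (and assuming a string[] packetElements parameter, which the question never specifies), is copy-on-write: writers take a short lock to build and swap in a new list, while readers iterate whatever snapshot is current without locking at all.
private readonly object _writeLock = new object();
private volatile List<Client> _snapshot = new List<Client>();

public void Add(Client c)
{
    lock (_writeLock)
    {
        var next = new List<Client>(_snapshot) { c };
        _snapshot = next; // atomic reference swap publishes the new list
    }
}

public void Broadcast(int zoneId, string[] packetElements)
{
    // Lock-free read: a published snapshot is never mutated afterwards.
    foreach (var client in _snapshot)
        if (client.zoneId == zoneId)
            client.SendXt(packetElements);
}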

Related

C# Safely using LINQ across threads

I have a program that is constantly reading and parsing a large stream of data from a WebSocket. All of the parsing happens on one thread within the client, and the data is organized into a SortedSet<T> tree for fast operation.
All of the data is added, updated, and removed without a hitch.
The problem comes when I try to access the data from another thread. It will run fine, but somewhere along the line is a race condition that will be hit within a minute or two.
Consider this code (running on its own thread) to update the UI in near real-time:
private async Task RenderOrderBook()
{
var book = _client.OrderBook;
while (true)
{
try
{
var asks = book.Asks.OrderBy(i => i.Price).Take(5).OrderByDescending(i => i.Price);
var bids = book.Bids.OrderByDescending(i => i.Price).Take(5);
orderBookView.BeginInvoke(new MethodInvoker(() =>
{
...omitted due to irrelevance
}));
await Task.Delay(500);
}
catch (Exception ex)
{
ex.ToString();
}
}
}
The race condition lies within the LINQ operations on book. The common error is that i.Price (a decimal), or perhaps the object i refers to, is null. Additionally, my shoddy attempt to just swallow the exception does not actually work.
Regardless, my guess is that the data is being parsed and manipulated so fast that eventually, when using the LINQ OrderBy operation, it will hit a case where a node has been removed by the client, attempt to read from it, and throw an exception.
The book.Asks and book.Bids properties were initially of type SortedSet<T> and pointed directly to the data member itself. In an attempt to mitigate this race condition, I changed them to arrays of the node type, using a _asks.ToArray() call to essentially make a copy to read from. This made the problem occur a bit less frequently, but it still happens nonetheless.
How can I make this thread-safe?
Additional Code Snippets
public PriceNode[] Asks
{
get { return _asks.ToArray(); }
}
public PriceNode[] Bids
{
get { return _bids.ToArray(); }
}
My first rule of UI development is that you never perform I/O on the UI thread. Sounds like you've got that one covered.
My second rule is that once something is visible to the UI thread, you can't touch it from any other thread. There is exactly one exception to this rule, and that is for immutable data: if an object will not change, then any thread can touch it. Mutable data? No touch. Keep in mind that "mutable data" includes most collections.
Your life will be so much easier if you can follow these two rules. Following one without breaking the other can be tricky, but there are ways to do it, and once you have a decent grip of them, you'll be in a better place. The path to enlightenment begins here:
Your read thread (the thread reading off the socket) is allowed to create all the new objects it wants, but it can't update existing objects. It also can't modify any collections that the UI thread is using. If you're only adding new objects, this isn't so bad: your read thread can pull data off the socket and use it to cook up new objects. When those objects are ready, it has to hand them over to the UI thread, and the UI thread can add them to the relevant collections. The bulk of the work (and all of the I/O) happens on the read thread, which is what we want, per Strobel's Rule #1. The act of "committing" the already-populated objects should be trivial by comparison. Per Rule #2, once any mutable objects get handed off to the UI thread, your read thread can't touch them again. Ever.
Updating existing objects is trickier. There's a couple ways you can approach this. One is to have the read thread use the latest data to create new objects, which it then hands off to the UI thread. If you have very simple object graphs, the easiest option might be to simply replace the old objects with their newer versions, keeping in mind that any UI code referencing an old object will need to know that it's been replaced. Alternatively, the UI thread can use the data from the new object to update the existing object. If you're following Rule #2, this will be totally thread-safe, and any UI code that pointed to the old object automatically sees the new data without any torn reads or other race-related nastiness. This approach is probably your best bet.
If, after trying out the approaches in the previous paragraph, you find that you are generating unacceptable amounts of garbage, there is a third option. The read thread can copy the raw data for each object into a temporary buffer, then hand the buffers over to the UI thread, which can use the data in the buffers to update the existing objects. This means more work occurring on the UI thread, but at least the data is already in memory (the socket I/O is already done). Since the point of this approach is to create less garbage, it only makes sense if you reuse the buffers. That means you need a thread-safe buffer pool. The read thread acquires a temporary buffer, fills it from the socket, hands it to the UI thread, which returns it to the pool when it's done. Astute readers will note that passing mutable buffers between threads bumps up against Rule #2, so take care that once a thread hands over a buffer, it immediately forgets about it. Because this approach requires a stronger grasp of thread safety to make the pool work, I recommend it only as a last resort. If you can get away with one of the options in the previous paragraph, please do so.
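For what it's worth, the thread-safe pool itself can be quite small. Here is a minimal sketch built on ConcurrentBag (the class and its Rent/Return API are my illustration, not a prescribed design):
using System.Collections.Concurrent;

class BufferPool
{
    private readonly ConcurrentBag<byte[]> _pool = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize) { _bufferSize = bufferSize; }

    // Reuse a returned buffer if one is available, otherwise allocate.
    public byte[] Rent()
    {
        byte[] buffer;
        return _pool.TryTake(out buffer) ? buffer : new byte[_bufferSize];
    }

    // Per Rule #2, the caller must forget the buffer once it's returned.
    public void Return(byte[] buffer) { _pool.Add(buffer); }
}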
Regardless of which approach you use for updating existing objects, you'll need a way to match up the new objects/data with the old objects. If each object has a unique identifier, you can use a Dictionary<,> as an efficient lookup mechanism. Replacing old objects with their newer copies is a bit more involved, because the old versions may be scattered across multiple collections, some of which may not support efficient replacement.
One last thing: when you hand over new/updated objects to the UI thread, it is vastly preferable to do it in batches. For example, you're better off posting a single operation to your UI thread to update 100 objects than posting 100 separate operations that each update one object.
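A minimal sketch of that batching, assuming a WinForms-style UI like the one in the question (the pending queue and ApplyUpdates are illustrative names): the read thread enqueues freely, and a UI-thread timer drains everything queued since the last tick in one go.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

ConcurrentQueue<PriceNode> pending = new ConcurrentQueue<PriceNode>();

// Read thread: hand off each new/updated node without blocking.
pending.Enqueue(node);

// UI thread, e.g. a Timer ticking every 100 ms:
void OnUiTimerTick(object sender, EventArgs e)
{
    var batch = new List<PriceNode>();
    PriceNode item;
    while (pending.TryDequeue(out item))
        batch.Add(item);
    if (batch.Count > 0)
        ApplyUpdates(batch); // one UI update for the whole batch
}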

C# Threading without locking Producer or Consumer

TLDR; version of the main questions:
While working with threads, is it safe to read a list's contents with one thread while another writes to it, as long as you do not delete list contents (reorganize order) and only read a new object after it has been fully added?
While an int is being updated from "Old Value" to "New Value" by one thread, is there a risk, if another thread reads this int, that the value returned is neither "Old Value" nor "New Value"?
Is it possible for a thread to "skip" a critical region if it's busy, instead of just going to sleep and waiting for the region's release?
I have 2 pieces of code running in separate threads and I want to have the one act as a producer for the other. I do not want either thread "sleeping" while waiting for access, but instead skipping forward in its internal code if the other thread is accessing this.
My original plan was to share the data via this approach (and once the counter got high enough, switch to a secondary list to avoid overflows).
Pseudocode of the flow as I originally intended it:
Producer
{
Int counterProducer;
bufferedObject newlyProducedObject;
List <buffered_Object> objectsProducer;
while(true)
{
<Do stuff until a new product is created and added to newlyProducedObject>;
objectsProducer.add(newlyProducedObject);
counterProducer++
}
}
Consumer
{
Int counterConsumer;
Producer objectProducer; (contains reference to Producer class)
List <buffered_Object> personalQueue
while(true)
<Do useful work, such as working on personal queue, and polish nails if no personal queue>
//get all outstanding requests and move to personal queue
while (counterConsumer < objectProducer.GetcounterProducer())
{
personalQueue.add(objectProducer.GetItem(counterConsumer + 1));
counterConsumer++;
}
}
Looking at this, everything looked fine at first glance. I knew I would not be retrieving a half-constructed product from the queue, so the status of the list should not be a problem even if a thread switch occurs while the Producer is adding a new object. Is this assumption correct, or can there be problems here? (My guess is that since the consumer asks for a specific location in the list, new objects are added to the end, and objects are never deleted, this will not be a problem.)
But what caught my eye was: could a similar problem occur while "counterProducer" is being incremented via "counterProducer++", leaving it at an unknown value? Could this result in the value temporarily being "null" or some unknown value? Will this be a potential issue?
My goal is to have neither of the two threads lock while waiting for a mutex but instead continue their loops, which is why I made the above first, as there is no locking.
If the usage of the list will cause problems, my workaround will be to make a linked-list implementation and share it between the two classes, still using the counters to see if new work has been added and keeping the last location while the personalQueue moves new items over. So the producer adds new links, and the consumer reads them and deletes the previous ones. (No counter on the list, just external counters to know how much has been added and removed.)
Alternative pseudocode to avoid the counterConsumer++ risk (need help with this):
Producer
{
Int publicCounterProducer;
Int privateCounterProducer;
bufferedObject newlyProducedObject;
List <buffered_Object> objectsProducer;
while(true)
{
<Do stuff until a new product is created and added to newlyProducedObject>;
objectsProducer.add(newlyProducedObject);
privateCounterProducer++
<Need Help: Some code that updates the publicCounterProducer to the privateCounterProducer if that variable is not
locked, else skips ahead, and the counter will get updated at next pass, at some point the consumer must be done reading stuff, and
new stuff is prepared already>
}
}
Consumer
{
Int counterConsumer;
Producer objectProducer; (contains reference to Producer class)
List <buffered_Object> personalQueue
while(true)
<Do useful work, such as working on personal queue, and polish nails if no personal queue>
//get all outstanding requests and move to personal queue
<Need Help: tries to read the publicProducerCounter and set readProducerCounter to this, else skips this code>
while (counterConsumer < readProducerCounter)
{
personalQueue.add(objectProducer.GetItem(counterConsumer + 1));
counterConsumer++;
}
}
So the goal in the second piece of code, which I have not been able to figure out how to write, is to make both classes avoid waiting for each other in case the other is in the "critical region" of updating publicCounterProducer. If I read the lock functionality correctly, the threads will go to sleep waiting for the release, which is not what I want. I might end up having to use it though, in which case the first pseudocode would do it, and I'd just set a "lock" around the reading of the value.
Hope you can help me out with my many questions.
No it is not safe. A context switch can occur within .Add after List has added the object, but before List has updated the internal data structure.
If it is an Int32, or if it is an Int64 and you are running in an x64 process, then there is no risk. But if you have any doubts, use the Interlocked class.
Yes, you can use a Semaphore: when it is time to enter the critical region, use the WaitOne overload that takes a timeout, and pass a timeout of 0. If WaitOne returns true, you successfully acquired the lock and can enter; if it returns false, you did not acquire the lock and should not enter.
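A minimal sketch of that "skip if busy" pattern (the gate name is illustrative):
using System.Threading;

Semaphore gate = new Semaphore(1, 1);

if (gate.WaitOne(0)) // 0 ms timeout: returns immediately instead of blocking
{
    try
    {
        // critical region: update publicCounterProducer here
    }
    finally { gate.Release(); }
}
else
{
    // Region was busy: skip and try again on the next pass of the loop.
}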
You should really look at the System.Collections.Concurrent namespace. In particular, look at the BlockingCollection. It has a bunch of Try* operators you can use to add/remove items from the collection without blocking.
While working with threads, is it safe to read a list's contents with 1 thread, while another write to it, as long you do not delete list contents (reoganize order) and only reads new object after the new object is added fully
No, it is not. A side-effect of adding an item to a list may be to reallocate its underlying array. Current implementations of List<T> update the internal reference before copying the old data to it, so multiple threads may observe a list of the correct size but containing no data.
While an Int is being updated from "Old Value" to "New Value" by one thread, is there is a risk, if another thread reads this Int that the value returned is neither "Old Value" or "New Value"
Nope, int updates are atomic. But if two threads both increment counterProducer at once, an increment can be lost. You should use Interlocked.Increment() to increment it.
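For illustration, the counter handling then looks like this (a sketch; the Volatile.Read on the consumer side is my addition, available from .NET 4.5, to guarantee a fresh read):
using System.Threading;

int counterProducer = 0;

// Producer thread: atomic increment, no lock needed.
Interlocked.Increment(ref counterProducer);

// Consumer thread: a plain int read is atomic, but Volatile.Read also
// ensures the thread sees the most recently published value.
int observed = Volatile.Read(ref counterProducer);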
Is it possible for a thread to "skip" a critical region if its busy, instead of just going to sleep and wait for the regions release?
No, but you can use (for example) WaitHandle.WaitOne(int) to see if a wait succeeded, and branch accordingly. WaitHandle is implemented by several synchronization classes, such as ManualResetEvent.
Incidentally, is there a reason you are not using the built-in Producer/Consumer classes such as BlockingCollection<T>? BlockingCollection is easy to use (after you read the documentation!) and I'd recommend using it instead.
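As a sketch of how the pseudocode above maps onto it (buffered_Object is the type from the question; the other names are mine):
using System.Collections.Concurrent;

BlockingCollection<buffered_Object> queue = new BlockingCollection<buffered_Object>();

// Producer thread:
queue.Add(newlyProducedObject);

// Consumer thread: TryTake with a 0 ms timeout never blocks, which
// matches the "skip if nothing is there" requirement.
buffered_Object item;
while (queue.TryTake(out item, 0))
    personalQueue.Add(item);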

List<T>: Moving from immutable to changeable structure

I am currently making quite heavy use of List<T> and I am looping very frequently via foreach over these lists.
Originally the List was immutable after startup. Now I have a requirement to amend the List during runtime from one thread only (a kind of listener). I need to remove from the List in object A and add to the List in object B. A and B are instances of the same class.
Unfortunately there is no synchronized List. What would you suggest I do in this case? In my case speed is more important than synchronisation, so I am currently working with copies of the lists for add/remove to avoid having the enumerators fail.
Do you have any other recommended way to deal with this?
class X {
List<T> Related {get; set;}
}
In several places and in different threads I am then using
foreach (var x in X.Related)
Now I need to basically perform in yet another thread
a.Related.Remove(t);
b.Related.Add(t);
To avoid potential exceptions, I am currently doing this:
List<T> aNew = new List<T>(a.Related);
aNew.Remove(t);
a.Related = aNew;
List<T> bNew = new List<T>(b.Related) { t };
b.Related = bNew;
Is this correct to avoid exceptions?
From this MSDN post: http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx
"...the only way to ensure thread safety is to lock the collection during the entire enumeration. "
Consider using for loops and iterating over your collection in reverse. This way you avoid the failing enumerators, and because you are going backwards over the collection, removals do not shift the indices of the items you have yet to visit.
It's hard to discuss the threading aspects as there is limited detail.
Update
If your collections are small, and you only have 3-4 "potential" concurrent users, I would suggest using a plain locking strategy as suggested by Jalal, although you would need to iterate backwards, e.g.
private readonly object _syncObj = new object();
lock (_syncObj)
{
for (int i = list.Count - 1; i >= 0; i--)
{
//remove from the list and add to the second one.
}
}
You need to protect all accesses to your lists with these lock blocks.
Your current implementation uses the COW (Copy-On-Write) strategy, which can be effective in some scenarios, but your particular implementation suffers from the fact that two or more threads take a copy, make their changes, but then could potentially overwrite the results of other threads.
Update
Further to your question comment, if you are guaranteed to only have one thread updating the collections, then your use of COW is valid, as there is no chance of multiple threads making updates and updates being lost by overwriting by multiple threads. It's a good use of the COW strategy to achieve lock free synchronization.
If you bring other threads in to update the collections, my previous locking comments stand.
My only concern would be that the other "reader" threads may have cached values for the addresses of the original lists, and may not see the new addresses when they are updated. In this case make the list variables volatile.
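For example (a sketch; volatile is only valid on a field, so the auto-property becomes a backing field):
using System.Collections.Generic;

class X<T>
{
    // volatile ensures readers always see the most recently swapped-in list
    private volatile List<T> _related = new List<T>();

    public List<T> Related
    {
        get { return _related; }
        set { _related = value; }
    }
}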
Update
If you do go for the lock-free strategy there is still one more pitfall: there will still be a gap between setting a.Related and b.Related, in which case your reader threads could be iterating over out-of-date collections, e.g. item a could have been removed from list1 but not yet added to list2 - item a will be in neither list. You could also swap the issue around and add to list2 before removing from list1, in which case item a would be in both lists - duplicates.
If consistency is important you should use locking.
You should lock before you handle the lists since you are in multithreading mode. The lock operation itself does not really hurt speed here; acquiring a lock takes on the order of 10 nanoseconds, depending on the machine. So:
private readonly object _listLocker = new object();
lock (_listLocker)
{
for (int itemIndex = 0; itemIndex < list.Count; itemIndex++)
{
//remove from the first list and add to the second one.
}
}
If you are using framework 4.0 I encourage you to use ConcurrentBag instead of List.
Edit: code snippet:
List<T> aNew = new List<T>(a.Related);
This will work only if all interaction with the collection (including adding, removing, and replacing items) is managed this way. You also have to use the System.Threading.Interlocked.CompareExchange and System.Threading.Interlocked.Exchange methods to replace the existing collection with the modified one; if that is not the case, then the copying achieves nothing.
Otherwise it will not work: consider a thread trying to get an item from the collection at the same time another thread replaces the collection; the retrieved item could be inconsistent. Also consider another thread wanting to insert an item into the collection at the same time while you are copying it.
That will throw an exception indicating that the collection was modified.
Another thing: you are copying the whole collection to a new list every time you handle it. This will certainly harm performance, and I think a synchronization mechanism such as lock carries a smaller performance penalty and is the more appropriate way to handle multithreading scenarios.
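To make the Interlocked suggestion concrete, here is a minimal sketch (field and method names are mine) of replacing the list reference atomically with a retry loop, so a concurrent writer's swap is never silently overwritten:
using System.Collections.Generic;
using System.Threading;

private List<T> _related = new List<T>();

public void RemoveItem(T t)
{
    while (true)
    {
        List<T> original = _related;
        var copy = new List<T>(original);
        copy.Remove(t);
        // Swap in the copy only if no other writer replaced the list
        // in the meantime; otherwise retry against the fresh list.
        if (Interlocked.CompareExchange(ref _related, copy, original) == original)
            return;
    }
}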

How to speed up routines making use of collections in multithreading scenario

I've an application that makes use of parallelization for processing data.
The main program is in C#, while one of the routines for analyzing data is in an external C++ dll. This library scans data and calls a callback every time a certain signal is found within the data. Data should be collected, sorted and then stored to disk.
Here is my first simple implementation of the method invoked by the callback and of the method for sorting and storing data:
// collection where saving found signals
List<MySignal> mySignalList = new List<MySignal>();
// method invoked by the callback
private void Collect(int type, long time)
{
lock(locker) { mySignalList.Add(new MySignal(type, time)); }
}
// store signals to disk
private void Store()
{
// sort the signals
mySignalList.Sort();
// file is a object that manages the writing of data to a FileStream
file.Write(mySignalList.ToArray());
}
Data is made up of a bidimensional array (short[][] data) of size 10000 x n, with n variable. I use parallelization in this way:
Parallel.For(0, 10000, (int i) =>
{
    // wrapper for the external c++ dll
    ProcessData(data[i]);
});
Now for each of the 10000 arrays I estimate that 0 to 4 callbacks could be fired. I'm facing a bottleneck and, given that my CPU resources are not over-utilized, I suppose that the lock (together with thousands of callbacks) is the problem (am I right, or could there be something else?). I've tried the ConcurrentBag collection, but performance is still worse (in line with other users' findings).
I thought that a possible solution for using lock-free code would be to have multiple collections. Then a strategy would be needed to make each thread of the parallel process work on a single collection. Collections could for instance be kept inside a dictionary with the thread ID as the key, but I do not know of any .NET facility for this (I would need to know the thread IDs to initialize the dictionary before launching the parallelization). Is this idea feasible and, if so, does some .NET tool exist for this? Or alternatively, any other idea to speed up the process?
[EDIT]
I've followed Reed Copsey's suggestion and used the following solution (according to the VS2010 profiler, the burden of locking and adding to the list previously consumed 15% of resources, whereas now it is only 1%):
// master collection where saving found signals
List<MySignal> mySignalList = new List<MySignal>();
// thread-local storage of data (each thread is working on its List<MySignal>)
ThreadLocal<List<MySignal>> threadLocal;
// analyze data
private void AnalizeData()
{
using(threadLocal = new ThreadLocal<List<MySignal>>(() =>
{ return new List<MySignal>(); }))
{
Parallel.For<int>(0, 10000,
() =>
{ return 0;},
(i, loopState, localState) =>
{
// wrapper for the external c++ dll
ProcessData(data[i]);
return 0;
},
(localState) =>
{
lock(this)
{
// add thread-local lists to the master collection
mySignalList.AddRange(threadLocal.Value);
threadLocal.Value.Clear();
}
});
}
}
// method invoked by the callback
private void Collect(int type, long time)
{
threadLocal.Value.Add(new MySignal(type, time));
}
I thought that a possible solution for using lock-free code would be to have multiple collections. Then a strategy would be needed to make each thread of the parallel process work on a single collection. Collections could for instance be kept inside a dictionary with the thread ID as the key, but I do not know of any .NET facility for this (I would need to know the thread IDs to initialize the dictionary before launching the parallelization). Is this idea feasible and, if so, does some .NET tool exist for this? Or alternatively, any other idea to speed up the process?
You might want to look at using ThreadLocal<T> to hold your collections. This automatically allocates a separate collection per thread.
That being said, there are overloads of Parallel.For which work with local state and have a collection pass at the end. This, potentially, would allow you to spawn your ProcessData wrapper where each loop body works on its own collection, and then recombine at the end. This would potentially eliminate the need for locking (since each thread works on its own data set) until the recombination phase, which happens once per thread (instead of once per task, i.e. 10000 times). This could reduce the number of locks you're taking from ~25000 (0-4 x 10000) down to a few (system- and algorithm-dependent, but on a quad-core system, probably around 10 in my experience).
For details, see my blog post on aggregating data with Parallel.For/ForEach. It demonstrates the overloads and explains how they work in more detail.
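A condensed sketch of those overloads in action (illustrative only: the MySignal constructor call stands in for real work, and in the question's actual code the C++ callback cannot easily reach the loop-local list, which is why the asker's edit routes through ThreadLocal instead):
using System.Collections.Generic;
using System.Threading.Tasks;

var master = new List<MySignal>();
object mergeLock = new object();

Parallel.For(0, 10000,
    () => new List<MySignal>(),            // one local list per partition
    (i, loopState, local) =>
    {
        local.Add(new MySignal(0, i));     // stand-in for the callback's work
        return local;                      // pass the list to the next iteration
    },
    local =>
    {
        lock (mergeLock)                   // one lock per thread, not per item
            master.AddRange(local);
    });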
You don't say how much of a "bottleneck" you're encountering. But let's look at the locks.
On my machine (quad core, 2.4 GHz), a lock costs about 70 nanoseconds if it's not contended. I don't know how long it takes to add an item to a list, but I can't imagine that it takes more than a few microseconds. But let's say it takes 100 microseconds (I would be very surprised to find that it's even 10 microseconds) to add an item to the list, taking lock contention into account. If you're adding 40,000 items to the list, that's 4,000,000 microseconds, or 4 seconds. And I would expect one core to be pegged if that were the case.
I haven't used ConcurrentBag, but I've found the performance of BlockingCollection to be very good.
I suspect, though, that your bottleneck is somewhere else. Have you done any profiling?
The basic collections in C# aren't thread safe.
The problem you're having is due to the fact that you're locking the entire collection just to call an add() method.
You could create a thread-safe collection that only locks single elements inside the collection, instead of the whole collection.
Lets look at a linked list for example.
Implement an add(item or list) method that does the following:
Lock the collection.
A = get last item.
Set the last-item reference to the new item (or to the last item of the new list).
Lock the last item (A).
Unlock the collection.
Add the new items/list to the end of A.
Unlock the locked item.
This will lock the whole collection for just 3 simple tasks when adding.
Then when iterating over the list, just do a try-lock on each object. If it's locked, wait for the lock to be free (that way you're sure that the add() finished).
In C# you can use an empty lock() block on the object to wait for such a lock to be released, or Monitor.TryEnter to test it without blocking.
So now you can add safely and still iterate over the list at the same time.
Similar solutions can be implemented for the other commands if needed.
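A small sketch of both variants (hedged; node stands for a list element):
using System.Threading;

// Wait until any in-flight add() on this node has finished:
lock (node) { }

// Or test the node's lock without blocking at all:
if (Monitor.TryEnter(node))
{
    try
    {
        // safe to read the node here
    }
    finally { Monitor.Exit(node); }
}
else
{
    // a writer still holds the node; skip it or retry later
}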
Any built-in solution for a collection is going to involve some locking. There may be ways to avoid it, perhaps by segregating the actual data constructs being read/written, but you're going to have to lock SOMEWHERE.
Also, understand that Parallel.For() will use the thread pool. While simple to implement, you lose fine-grained control over creation/destruction of threads, and the thread pool involves some serious overhead when starting up a big parallel task.
From a conceptual standpoint, I would try two things in tandem to speed up this algorithm:
Create threads yourself, using the Thread class. This frees you from the scheduling slowdowns of the thread pool; a thread starts processing (or waiting for CPU time) when you tell it to start, instead of the thread pool feeding requests for threads into its internal workings at its own pace. You should be aware of the number of threads you have going at once; the rule of thumb is that the benefits of multithreading are overcome by the overhead when you have more than twice the number of active threads as "execution units" available to execute threads. However, you should be able to architect a system that takes this into account relatively simply.
Segregate the collection of results, by creating a dictionary of collections of results. Each results collection is keyed to some token carried by the thread doing the processing and passed to the callback. The dictionary can have multiple elements READ at one time without locking, and as each thread is WRITING to a different collection within the Dictionary there shouldn't be a need to lock those lists (and even if you did lock them you wouldn't be blocking other threads). The result is that the only collection that has to be locked such that it would block threads is the main dictionary, when a new collection for a new thread is added to it. That shouldn't have to happen often if you're smart about recycling tokens.
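One way to get that per-thread segregation without pre-registering thread IDs is a hedged sketch like the following, where ConcurrentDictionary handles the only contended step (the first insert for each thread):
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

ConcurrentDictionary<int, List<MySignal>> resultsByThread =
    new ConcurrentDictionary<int, List<MySignal>>();

// Inside the callback, on whatever worker thread fired it:
void Collect(int type, long time)
{
    var local = resultsByThread.GetOrAdd(
        Thread.CurrentThread.ManagedThreadId,
        _ => new List<MySignal>());
    local.Add(new MySignal(type, time)); // no lock: this thread owns the list
}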

Safe to get Count value from generic collection without locking the collection?

I have two threads, a producer thread that places objects into a generic List collection and a consumer thread that pulls those objects out of the same generic List. I've got the reads and writes to the collection properly synchronized using the lock keyword, and everything is working fine.
What I want to know is if it is ok to access the Count property without first locking the collection.
JaredPar refers to the Count property in his blog as a decision procedure that can lead to race conditions, like this:
if (list.Count > 0)
{
return list[0];
}
If the list has one item and that item is removed after the Count property is accessed but before the indexer, an exception will occur. I get that.
But would it be ok to use the Count property to, say, determine the initial size a completely different collection? The MSDN documentation says that instance members are not guaranteed to be thread safe, so should I just lock the collection before accessing the Count property?
I suspect it's "safe" in terms of "it's not going to cause anything to go catastrophically wrong" - but that you may get stale data. That's because I suspect it's just held in a simple variable, and that that's likely to be the case in the future. That's not the same as a guarantee though.
Personally I'd keep it simple: if you're accessing shared mutable data, only do so in a lock (using the same lock for the same data). Lock-free programming is all very well if you've got appropriate isolation in place (so you know you've got appropriate memory barriers, and you know that you'll never be modifying it in one thread while you're reading from it in another) but it sounds like that isn't the case here.
The good news is that acquiring an uncontested lock is incredibly cheap - so I'd go for the safe route if I were you. Threading is hard enough without introducing race conditions which are likely to give no significant performance benefit but at the cost of rare and unreproducible bugs.
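In practice the safe route is just one more line inside the lock you already have (a sketch; the names are illustrative):
int count;
lock (collectionLock)       // the same lock used for all reads and writes
{
    count = sharedList.Count;
}
// Use the value only as a hint, e.g. to pre-size another collection:
var snapshotTargets = new List<string>(count);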
