How to do proper Parallel.ForEach, locking and progress reporting


I'm trying to implement the Parallel.ForEach pattern and track progress, but I'm missing something regarding locking. The following example counts to 10,000 when threadCount = 1, but not when threadCount > 1. What is the correct way to do this?
using System;
using System.Linq;
using System.Threading.Tasks;
class Program
{
    static void Main()
    {
        var progress = new Progress();
        var ids = Enumerable.Range(1, 10000);
        var threadCount = 2;
        Parallel.ForEach(ids, new ParallelOptions { MaxDegreeOfParallelism = threadCount }, id => { progress.CurrentCount++; });
        Console.WriteLine("Threads: {0}, Count: {1}", threadCount, progress.CurrentCount);
        Console.ReadKey();
    }
}
internal class Progress
{
    private Object _lock = new Object();
    private int _currentCount;
    public int CurrentCount
    {
        get
        {
            lock (_lock)
            {
                return _currentCount;
            }
        }
        set
        {
            lock (_lock)
            {
                _currentCount = value;
            }
        }
    }
}

The usual problem with calling something like count++ from multiple threads (which share the count variable) is that this sequence of events can happen:
1) Thread A reads the value of count.
2) Thread B reads the value of count.
3) Thread A increments its local copy.
4) Thread B increments its local copy.
5) Thread A writes the incremented value back to count.
6) Thread B writes the incremented value back to count.
This way, the value written by thread A is overwritten by thread B, so the value is actually incremented only once.
Your code adds locks around operations 1, 2 (get) and 5, 6 (set), but that does nothing to prevent the problematic sequence of events.
What you need to do is to lock the whole operation, so that while thread A is incrementing the value, thread B can't access it at all:
lock (progressLock)
{
    progress.CurrentCount++;
}
If you know that you will only need incrementing, you could create a method on Progress that encapsulates this.
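A minimal sketch of that idea, written as a top-level C# program so it is self-contained: the local function `Increment` stands in for a hypothetical `Progress.Increment()` method, and `progressLock` and `currentCount` are illustrative names. What matters is that the read, the addition and the write all happen while the same lock is held.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical names: the shared counter and the object that guards it.
object progressLock = new object();
int currentCount = 0;

// Stands in for a Progress.Increment() method: the whole
// read-add-write sequence runs while the lock is held.
void Increment()
{
    lock (progressLock)
    {
        currentCount++;
    }
}

var ids = Enumerable.Range(1, 10000);
var threadCount = 2;
Parallel.ForEach(ids, new ParallelOptions { MaxDegreeOfParallelism = threadCount }, id => Increment());

// Parallel.ForEach returns only after every iteration has completed,
// so the final value can be read here without the lock.
Console.WriteLine("Threads: {0}, Count: {1}", threadCount, currentCount); // Threads: 2, Count: 10000
```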

Old question, but I think there is a better answer.
You can report progress using Interlocked.Increment(ref progress), where progress is a plain int field or variable rather than a property; that way you do not have to worry about locking the write operation to progress.
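A sketch of this approach, assuming the count lives in a plain `int` variable (a `ref` argument cannot refer to a property). Because `Interlocked.Increment` returns the new value and no two iterations ever receive the same one, that return value can also drive the progress report without a lock. The names and the 1,000-item reporting interval are illustrative assumptions.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int total = 10000;
int reportEvery = 1000;   // illustrative reporting interval
int completed = 0;        // shared counter, only updated through Interlocked
int reportsWritten = 0;   // how many progress lines were printed

Parallel.ForEach(Enumerable.Range(1, total), new ParallelOptions { MaxDegreeOfParallelism = 4 }, id =>
{
    // ... per-item work on id would go here ...

    // Atomically adds 1 and returns the new value; every iteration
    // gets a different value, so each milestone is hit exactly once.
    int done = Interlocked.Increment(ref completed);
    if (done % reportEvery == 0)
    {
        Interlocked.Increment(ref reportsWritten);
        Console.WriteLine($"Progress: {done}/{total}");
    }
});

Console.WriteLine($"Finished: {completed}/{total}"); // Finished: 10000/10000
```

The progress lines may print out of order, since they come from different worker threads, but each milestone is printed exactly once.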

The easiest solution would actually have been to replace the property with a field, and
lock (someLock) { ++progress.CurrentCount; }
(I personally prefer the look of the preincrement over the postincrement, as the "++." thing clashes in my mind! But the postincrement would of course work the same.)
This would have the additional benefit of decreasing overhead and contention, since updating a field directly avoids calling a method that updates a field (although the JIT usually inlines trivial property accessors, so the difference is often small).
Of course, encapsulating it as a property can have advantages too. IMO, since field and property syntax is identical, the ONLY advantage of using a property over a field, when the property is auto-implemented or equivalent, is when you may want to deploy one assembly without having to rebuild and redeploy dependent assemblies. Otherwise, you may as well use faster fields! If the need arises to check a value or add a side effect, you simply convert the field to a property and build again. Therefore, in many practical cases, there is no penalty to using a field.
However, we are living in a time where many development teams operate dogmatically, and use tools like StyleCop to enforce their dogmatism. Such tools, unlike coders, are not smart enough to judge when using a field is acceptable, so invariably the "rule that is simple enough for even StyleCop to check" becomes "encapsulate fields as properties", "don't use public fields" et cetera...

Remove lock statements from properties and modify Main body:
object sync = new object();
Parallel.ForEach(ids, new ParallelOptions { MaxDegreeOfParallelism = threadCount },
    id =>
    {
        lock (sync)
            progress.CurrentCount++;
    });

The issue here is that ++ is not atomic: one thread can read and increment the value between another thread reading the value and storing its (now incorrect) incremented value. This is probably compounded by the fact that there's a property wrapping your int.
e.g.
Thread 1      Thread 2
reads 5       .
.             reads 5
.             writes 6
writes 6!     .
The locks around the setter and getter don't help this, as there's nothing to stop the lock blocks themselves from being called out of order.
Ordinarily, I'd suggest using Interlocked.Increment, but you can't use this with a property.
Instead, you could expose _lock and have the lock block be around the progress.CurrentCount++; call.

It is better to collect the results of any database or file-system operation in a local buffer variable and combine them afterwards, instead of locking around every operation; locking reduces performance.
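One reading of this advice, sketched with the thread-local overload of `Parallel.ForEach`: each worker accumulates into its own subtotal with no locking, and the lock is taken only once per worker, in the `localFinally` delegate, to merge the results. Summing integers is an assumed stand-in for whatever per-item work produces the buffered result.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

object mergeLock = new object();
long grandTotal = 0;

Parallel.ForEach(
    Enumerable.Range(1, 10000),
    // localInit: each worker starts its own subtotal at zero.
    () => 0L,
    // body: no locking; only the current worker sees 'subtotal'.
    (item, loopState, subtotal) => subtotal + item,
    // localFinally: runs once per worker, so the lock is taken rarely.
    subtotal =>
    {
        lock (mergeLock)
        {
            grandTotal += subtotal;
        }
    });

Console.WriteLine(grandTotal); // 50005000
```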

Related

Thread Safety With Parallel Operations

Before I start, I should mention that I feel like I've got the wrong end of the stick here. But here we go anyway:
Imagine we have the following class:
public class SomeObject {
    public int SomeInt;
    private SomeObject anotherObject;
    public void DoStuff() {
        if (SomeCondition()) anotherObject.SomeInt += 1;
    }
}
Now, imagine that we have a collection of these SomeObjects:
IList<SomeObject> allObjects = new List<SomeObject>(1000);
// ... Pretend the list is populated with 1000 SomeObjects here
Let's say I call DoStuff() on each one, like so:
foreach (var @object in allObjects) @object.DoStuff();
All is good so far.
Now, let's assume that the order in which the objects have their DoStuff() called is not important. Assume that SomeCondition() is computationally expensive, perhaps. I could utilize all four cores on my machine (and potentially get a performance gain) with:
Parallel.For(0, 1000, i => allObjects[i].DoStuff());
Now, ignoring any issues with atomicity of variable access, I don't care whilst I am in the loop whether or not any given SomeObject sees an outdated version of anotherObject or SomeInt.* However, once the loop is done, I want to make sure that my main worker thread (i.e. the one that called Parallel.For) DOES see everything up-to-date.
Is there a guarantee of this (e.g. some sort of memory barrier?) with using Parallel.For? Or do I need to make some sort of guarantee myself? Or is there no way to make this guarantee?
Finally, if I call Parallel.For(...) again in the same way just after, will all worker threads be working with the new, up-to-date values for everything?
(*) The implementers of DoStuff() would be wrong to make assumptions about the order of processing anyway, right?

var locker = new object();
var total = 0.0;
Parallel.For(1, 10000000,
    i => { lock (locker) total += (i + 1); });
Console.WriteLine("WithLocker " + total);
var total2 = 0.0;
Parallel.For(1, 10000000,
    i => total2 += (i + 1));
Console.WriteLine("WithoutLocker " + total2);
Console.ReadKey();
// WithLocker 50000004999999
// WithoutLocker 28861729333278 (varies from run to run)
I have made two examples for you, one with a lock and one without. Look at the results!

There are two issues here.
However, once the loop is done, I want to make sure that my main worker thread (i.e. the one that called Parallel.For) DOES see everything up-to-date.
To answer your question: yes. Once your Parallel.For has completed, all the calls to DoStuff will have completed, and your array will not see any more updates.
Now, ignoring any issues with atomicity of variable access, I don't care whilst I am in the loop whether or not any given SomeObject sees an outdated version of anotherObject or SomeInt.*
I really doubt that you don't care about this if you want a correct answer. Bassam's answer addresses the potential data races in your code. If one thread is running DoStuff and this writes to another index in the array which is simultaneously being read by another thread, then you will see nondeterministic results. Locking can solve this (as shown above), but at the expense of performance. Locking on every thread for every update effectively serializes your work. I suspect that Bassam's lock example actually runs no faster, and possibly slower, than the non-locking one, although it does produce the correct answer.
If SomeObject.anotherObject refers to anything other than this, you have a potential race condition. Consider the case where anotherObject refers to the element in the array adjacent to the current object. What happens when these run concurrently? One thread's code will be trying to read an instance of SomeObject while another thread writes to it. The write is not guaranteed to happen atomically, so your read may return an object in a half-written state.
This depends a bit on what is being updated in SomeObject and how it's being updated. For example, if all you are doing is incrementing a single integer value, you could use Interlocked operations to increment the value in a thread-safe way, or use critical sections or locks to ensure that your SomeObject is actually thread-safe. Adding synchronization operations usually impacts performance, so if possible I would recommend looking for an approach that does not require adding synchronization.
You can fix this in one of two ways.
1) If each instance of anotherObject in the array is guaranteed to be only updated once by one call to allObjects[i].DoStuff() then you can modify your code to have an input and output array. This prevents any race conditions as reads and writes no longer conflict. It means you need two copies of your array and they both need to be initialized.
2) If you are updating array items multiple times, or having two arrays of SomeObject is not an option and SomeCondition() is the only computationally expensive part of your method then you could parallelize this and then update the array sequentially.
var allConditions = new bool[1000]; // an array, so every index exists up front
Parallel.For(0, 1000, i => allConditions[i] = SomeCondition(i)); // Write allConditions, not allObjects
for (int i = 0; i < 1000; ++i) { allObjects[i].DoStuff(allConditions[i]); }
So your observation:
This is interesting. It means that Parallel.For is basically only useful for code that's already thread-safe... Damn
Is not entirely correct. The code within your Parallel.For must either be thread safe or not access data and resources in a non-thread safe way. In other words it doesn't have to lock if you can rearrange your code to guarantee that there are no race conditions (or deadlocks) because none of the threads write the same data or will read data that another thread may be writing to. Note that concurrent reads are OK.
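Approach 1 (separate input and output arrays) might look like this. Plain `int` arrays and the neighbour-sum rule are assumptions standing in for `SomeObject` and `DoStuff()`: every iteration reads only from `input`, which nothing writes during the loop, and writes only its own slot of `output`, so no lock is needed.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

int n = 1000;
int[] input = Enumerable.Range(0, n).ToArray();
int[] output = new int[n];

Parallel.For(0, n, i =>
{
    // Reading neighbours is safe: 'input' is never written inside the loop.
    int left = i > 0 ? input[i - 1] : 0;
    int right = i < n - 1 ? input[i + 1] : 0;
    // Each iteration writes only its own element of 'output'.
    output[i] = left + input[i] + right;
});

// After the loop, 'output' can become the next round's input.
Console.WriteLine(output[1]); // 3 (0 + 1 + 2)
```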

Parallel.For not handling lock properly

I've done the following test:
private static object threadLocker = new object();
private static long threadStaticVar;
public static long ThreadStaticVar
{
    get
    {
        lock (threadLocker)
        {
            return threadStaticVar;
        }
    }
    set
    {
        lock (threadLocker)
        {
            threadStaticVar = value;
        }
    }
}
Parallel.For(0, 20000, (x) =>
{
    //lock (threadLocker) // works with this lock
    //{
    ThreadStaticVar++;
    //}
});
This Parallel.For invokes the method passing the values from 0 to 19999, so it executes 20k times.
If I don't wrap ThreadStaticVar++; with a lock, even though it has a lock on its get and set, the result will not be 20000. If I remove the comment slashes and lock it inside the For, it gets the right value.
My question is: how does it work? Why doesn't the lock on the get and set work? Why does it work only inside my For?

The ++ operator isn't an atomic increment. There will be a call to get followed by a call to set, and those calls can be interleaved among different threads since the lock is only on each individual operation. Think of it like this:
lock (threadLocker) { tmp = threadStaticVar; }
lock (threadLocker) { threadStaticVar = tmp + 1; }
Those locks don't look so effective now, do they?

In your example, ThreadStaticVar++ is not an atomic operation.
More accurately, ++ is not atomic here because it locks your getter to read the value, then increments it, and then locks your setter to write it back. Between those two locks anything can happen :)
To do it properly, I would recommend using object-oriented programming instead of this procedural code. Just implement an Increment() method in your object and make it responsible for locking and doing ++ inside that method. In your parallel loop you just tell your object what to do; it is then the object's responsibility to make it happen and figure out how to do it.
So you just implement your lock within the Increment() method and have no problems anywhere outside (really, consumers shouldn't know and shouldn't even think about such issues).

You can rename threadStaticVar and make it public. Then use Interlocked.Increment on the field directly (it takes its argument by ref, which a property cannot provide).
However, also consider whether a parallel for is appropriate. Even if the real code is more complex, running in parallel with locking may not be your best option.

How to easy make this counter property thread safe?

I have a property definition in a class where I have only counters. This must be thread-safe, and it isn't, because get and set are not in the same lock. How do I do that?
private int _DoneCounter;
public int DoneCounter
{
    get
    {
        return _DoneCounter;
    }
    set
    {
        lock (sync)
        {
            _DoneCounter = value;
        }
    }
}

If you're looking to implement the property in such a way that DoneCounter = DoneCounter + 1 is guaranteed not to be subject to race conditions, it can't be done in the property's implementation. That operation is not atomic; it is actually three distinct steps:
1) Retrieve the value of DoneCounter.
2) Add 1.
3) Store the result in DoneCounter.
You have to guard against the possibility that a context switch could happen in between any of those steps. Locking inside the getter or setter won't help, because that lock's scope exists entirely within one of the steps (either 1 or 3). If you want to make sure all three steps happen together without being interrupted, then your synchronization has to cover all three steps. Which means it has to happen in a context that contains all three of them. That's probably going to end up being code that does not belong to whatever class contains the DoneCounter property.
It is the responsibility of the person using your object to take care of thread safety. In general, no class that has read/write fields or properties can be made "thread-safe" in this manner. However, if you can change the class's interface so that setters aren't necessary, then it is possible to make it more thread-safe. For example, if you know that DoneCounter only increments and decrements, then you could re-implement it like so:
private int _doneCounter;
public int DoneCounter { get { return _doneCounter; } }
public int IncrementDoneCounter() { return Interlocked.Increment(ref _doneCounter); }
public int DecrementDoneCounter() { return Interlocked.Decrement(ref _doneCounter); }

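Using the Interlocked class provides atomic operations, i.e. each individual read and write is inherently thread-safe, as in this LinqPad example (a compound statement such as counters.DoneCounter += 34 still performs a separate read and write, though):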
void Main()
{
    var counters = new Counters();
    counters.DoneCounter += 34;
    var val = counters.DoneCounter;
    val.Dump(); // 34
}
public class Counters
{
    int doneCounter = 0;
    public int DoneCounter
    {
        get { return Interlocked.CompareExchange(ref doneCounter, 0, 0); }
        set { Interlocked.Exchange(ref doneCounter, value); }
    }
}

If you're expecting not just that some threads will occasionally write to the counter at the same time, but that lots of threads will keep doing so, then you want to have several counters, at least one cache-line apart from each other, and have different threads write to different counters, summing them when you need the tally.
This keeps most threads out of each other's way, which stops them from flushing each other's values out of the cores and slowing each other down. (You still need Interlocked unless you can guarantee each thread will stay separate.)
For the vast majority of cases, you just need to make sure the occasional bit of contention doesn't mess up the values, in which case Sean U's answer is better in every way (striped counters like this are slower for uncontested use).
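A rough sketch of such a striped counter. The 16-`long` (128-byte) spacing between stripes assumes cache lines of at most 128 bytes, and choosing a stripe from `Environment.CurrentManagedThreadId` is just one simple mapping; neither is a tuned value. Writers still use `Interlocked`, as the answer says, because two threads can map to the same stripe.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int stripes = Environment.ProcessorCount;
int stride = 16;                            // 16 longs = 128 bytes between stripes (assumed spacing)
long[] cells = new long[stripes * stride];  // only every 16th element is used

// Each thread increments "its" stripe; if two threads share a
// stripe, the increment is still atomic.
void Increment()
{
    int stripe = Environment.CurrentManagedThreadId % stripes;
    Interlocked.Increment(ref cells[stripe * stride]);
}

// Reading the tally means summing every stripe.
long Sum()
{
    long sum = 0;
    for (int s = 0; s < stripes; s++)
        sum += Interlocked.Read(ref cells[s * stride]);
    return sum;
}

Parallel.For(0, 100000, _ => Increment());
Console.WriteLine(Sum()); // 100000
```

`Sum()` is exact only once the writers have stopped; while they are running, it returns a snapshot.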

What exactly are you trying to do with the counters? Locks don't really do much with integer properties, since reads and writes of 32-bit integers are atomic with or without locking. The only benefit one can get from locks is the addition of memory barriers; one can achieve the same effect by using System.Threading.Thread.MemoryBarrier() before and after you read or write a shared variable.
I suspect your real problem is that you are trying to do something like "DoneCounter+=1", which--even with locking--would perform the following sequence of events:
Acquire lock
Get _DoneCounter
Release lock
Add one to value that was read
Acquire lock
Set _DoneCounter to computed value
Release lock
Not very helpful, since the value might change between the get and set. What would be needed would be a method that would perform the get, computation, and set without any intervening operations. There are three ways this can be accomplished:
Acquire and keep a lock during the whole operation
Use Interlocked.Increment (or Interlocked.Add for other amounts) to add to _DoneCounter
Use an Interlocked.CompareExchange loop to update _DoneCounter
Using any of these approaches, it's possible to compute a new value of _DoneCounter based on the old value, in such a fashion that the value written is guaranteed to be based upon the value _DoneCounter had at the time of the write.
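The third option, a `CompareExchange` loop, is the most general, because the new value can be any function of the old one. In this sketch, the helper name `AtomicUpdate` and the capped-increment rule are illustrative assumptions, not a library API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: applies 'transform' to the value at 'location'
// atomically and returns the value it wrote.
int AtomicUpdate(ref int location, Func<int, int> transform)
{
    while (true)
    {
        int observed = Volatile.Read(ref location);
        int desired = transform(observed);
        // Store 'desired' only if 'location' still holds 'observed';
        // otherwise another thread got there first, so retry.
        if (Interlocked.CompareExchange(ref location, desired, observed) == observed)
            return desired;
    }
}

int limit = 500;  // illustrative cap
int counter = 0;

// 10,000 attempts, but the transform never takes the value past 'limit'.
Parallel.For(0, 10000, _ => AtomicUpdate(ref counter, v => Math.Min(v + 1, limit)));
Console.WriteLine(counter); // 500
```

If another thread changes the value between the read and the `CompareExchange`, the exchange fails and the loop recomputes from the fresh value, so no update is lost and the cap is never exceeded.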

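You could declare the _DoneCounter variable as "volatile" to make it thread-safe. Note, however, that volatile only affects how individual reads and writes are optimized and ordered; it does not make a read-modify-write such as _DoneCounter++ atomic.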
http://msdn.microsoft.com/en-us/library/x13ttww7%28v=vs.71%29.aspx

Simple threading question, locking non local changes

OK, first I must preface this question with a disclaimer: I'm really new to threading, so this may be a 'newbie' question, but I searched Google and couldn't find an answer. As I understand it, a critical section is code that can be accessed by two or more threads, the danger being that one thread will overwrite a value before the other is finished, and vice versa. What can you do about changes made outside of your class? For example, I have a line-monitoring program:
int currentNumber = provider.GetCurrentNumber();
if (provider.CanPassNumber(false, currentNumber))
{
    currentNumber++;
    provider.SetNumber(currentNumber);
}
and on another thread I have something like this:
if (condition)
    provider.SetNumber(numberToSet);
Now I'm afraid that in the first function I get currentNumber, which is 5; right after that, on another thread, the number is set to 7, and then the first function rewrites the 7 to 6, ignoring the change made by the thread that set it to 7.
Is there any way to lock provider.SetNumber until the first function finishes? The critical section is basically the currentNumber, which can be changed by many places in the program.
I hope I made myself clear, if not let me know and I will try to explain myself better.
EDIT:
Also I made the functions really short for the example. In reality the function is much longer and makes changes to currentNumber many times so I don't really want to put a lock around the entire function. If I lock every call to provider.SetNumber and release it after I finish it can change during the time it is released before I lock it again to call provider.SetNumber. Honestly I'm also worried about locking the entire function because of performance and deadlock.

Rather than using the lock() keyword, I'd suggest seeing if you can use the Interlocked class, which is designed for small operations. It has much less overhead than lock; in fact, it can be down to a single CPU instruction on some CPUs.
There are a couple of methods of interest for you, Exchange and Read (Read only has overloads for 64-bit integers), both of which are thread-safe.

You want to look into the lock keyword. You might also want to read this tutorial, Threading in C#.

As Filip said, lock is useful here.
Not only should you lock on provider.SetNumber(currentNumber), you also need to lock on any conditional that the setter depends on.
lock (someObject)
{
    if (provider.CanPassNumber(false, currentNumber))
    {
        currentNumber++;
        provider.SetNumber(currentNumber);
    }
}
as well as
if (condition)
{
    lock (someObject)
    {
        provider.SetNumber(numberToSet);
    }
}
If condition is reliant on numberToSet, you should take the lock statement around the whole block. Also note that someObject must be the same object.

You can use the lock statement to enter a critical section with mutual exclusion. The lock uses the object's reference to distinguish one critical section from another, so you must use the same reference for every lock that accesses the same elements.
// Define an object which can be locked in your class.
object locker = new object();
// Add the following around your critical sections:
lock (locker) { /* ... */ }
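That will change your code to the following (the read of the current number is inside the lock too, so another thread cannot change it between the read and the write):
lock (locker)
{
    int currentNumber = provider.GetCurrentNumber();
    if (provider.CanPassNumber(false, currentNumber))
    {
        currentNumber++;
        provider.SetNumber(currentNumber);
    }
}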
And:
if (condition)
{
    lock (locker)
    {
        provider.SetNumber(numberToSet);
    }
}
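A self-contained sketch of the same pattern, with local functions standing in for the hypothetical `provider` methods (the rule in `CanPassNumber`, allowing values below 10, is an assumption). The read, the check and the write happen inside one lock, and the plain setter takes the same `locker`, so a `SetNumber` from another thread cannot land between the read and the write.

```csharp
using System;
using System.Threading.Tasks;

object locker = new object();
int number = 0;  // stands in for the provider's stored number

bool CanPassNumber(int n) => n < 10;  // hypothetical rule

// Read, check and write as one critical section.
void AdvanceIfAllowed()
{
    lock (locker)
    {
        int currentNumber = number;
        if (CanPassNumber(currentNumber))
        {
            currentNumber++;
            number = currentNumber;
        }
    }
}

// The other thread's setter takes the same lock object.
void SetNumber(int numberToSet)
{
    lock (locker)
    {
        number = numberToSet;
    }
}

Parallel.For(0, 1000, _ => AdvanceIfAllowed());
int afterAdvances = number;
Console.WriteLine(afterAdvances); // 10
SetNumber(3);
Console.WriteLine(number); // 3
```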

In your SetNumber method you can simply use a lock statement:
public class MyProvider {
    object numberLock = new object();
    ...
    public void SetNumber(int num) {
        lock (numberLock) {
            // Do Stuff
        }
    }
}
Also, note that in your example currentNumber is a value type (int), which means that the variable's value won't be overwritten should your provider's actual data member's value change.

Well, first off, I'm not so good with threading, but a critical section is a part of your code that can only be accessed by one thread at a time, not the other way around.
To create a critical section is easy:
lock (this)
{
    // Only one thread can run this at a time
}
Note: this should be replaced with some private internal object, because code outside your class can also lock on this.

How can I have many threads that need to know the next ID to process and then increment that number safely?

I'm working a program that will have a bunch of threads processing data.
Each thread needs to grab the next available ID, increment that ID by 1 for the next thread and do this in a thread-safe way.
Is this an instance where I would use a mutex? Should I use a Queue.Synchronized instead and fill it up with all 300,000 IDs, or is this unnecessary?
Should I just have a single integer and somehow lock the retrieval and updating of that number so thread1 comes in, gets "20" as the next ID and then increments it to "21" while another thread is waiting?
What is the best-practice for this use-case?

You can do this without locking via Interlocked.Increment.
Just do this like so:
private static int value;
public static int Value
{
    get { return Interlocked.Increment(ref value); }
}
This will always return an incrementing integer, without locks, and with perfect threadsafety.

This is probably the perfect candidate for Interlocked.Increment.
int id = 0;
int nextId = Interlocked.Increment(ref id); // returns the incremented value
The Increment is performed as an atomic operation and is thread-safe.
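A quick check of this approach, assuming a shared variable named `lastId` (an illustrative name) and using a `ConcurrentBag` only to collect results for verification: 300,000 parallel callers each take an ID from `Interlocked.Increment`, and the collected IDs come out as exactly 1 through 300,000, with no duplicates or gaps.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int lastId = 0;  // shared; only ever updated through Interlocked

// Atomically increments lastId and returns the new value,
// so no two callers can receive the same ID.
int NextId() => Interlocked.Increment(ref lastId);

var handedOut = new ConcurrentBag<int>();
Parallel.For(0, 300000, _ => handedOut.Add(NextId()));

int[] sorted = handedOut.OrderBy(x => x).ToArray();
Console.WriteLine(sorted.Length);                                // 300000
Console.WriteLine(sorted.Distinct().Count());                    // 300000
Console.WriteLine($"{sorted[0]}..{sorted[sorted.Length - 1]}");  // 1..300000
```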

Nope.
Best way to do this is Interlocked.Increment().
Basically you can perform the increment in a threadsafe way without locking based on guarantees provided by the CPU architecture.

You should look into the Interlocked class. It provides functionality for atomically incrementing the value of an integer.
