This question already has answers here:
Different summation results with Parallel.ForEach
(4 answers)
Closed 4 years ago.
I've written 3 different ways of computing the sum of an array of integers; however, I'm getting a different result for the third method.
Initialization:
int n = 100;
int[] mArray = new int[n];
for (int i = 0; i < mArray.Length; i++)
mArray[i] = 1;
First:
int sum1 = mArray.Sum();
Console.WriteLine("sum1 " + sum1);
Second:
int sum2 = 0;
for (int i = 0; i < mArray.Length; i++)
sum2 += mArray[i];
Console.WriteLine("sum2 " + sum2);
Third:
int sum3 = 0;
Parallel.ForEach(mArray, item =>
{
sum3 += item;
});
Console.WriteLine("sum3 " + sum3);
Obviously, the 3 approaches give the same output.
However, when n is increased (e.g., n = 30000), the third approach gives surprisingly incorrect results.
NB: I've also tested the approaches using ConcurrentBag, which is a thread-safe collection. I suppose there is no overflow issue. The code was tested on a Windows 10 x64 computer (Intel Core i7 @ 3.30 GHz).
It would be interesting to understand why Parallel.ForEach behaves differently.
The problem is that sum3 is accessed by multiple threads when you use Parallel.ForEach. sum3 += item; normally involves three operations:
1. Read the value of sum3 into temporary storage.
2. Increment the value of that storage by item.
3. Store the result back to sum3.
When multiple threads do this concurrently, it is very likely that the operations will interleave. For example, if you have two threads, A and B, both may read the same value from sum3, then each adds its item and stores its result back, so one of the updates is lost.
To overcome this issue, you need to protect your access to sum3. The code should look like this:
object objLock = new object();
int sum3 = 0;
Parallel.ForEach(mArray, item =>
{
lock (objLock) { sum3 += item; }
});
Console.WriteLine("sum3 " + sum3);
But, that will totally negate the effect of the parallel execution.
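If the accumulator is an int or a long, a lighter-weight alternative (a sketch, not from the original question) is Interlocked.Add, which performs the whole read-add-store as a single atomic operation:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        int[] mArray = Enumerable.Repeat(1, 30000).ToArray();
        int sum3 = 0;
        Parallel.ForEach(mArray, item =>
        {
            // The read-add-store happens atomically, so no update is lost.
            Interlocked.Add(ref sum3, item);
        });
        Console.WriteLine("sum3 " + sum3); // always prints sum3 30000
    }
}
```

This still contends on every element, so the local-total aggregation overload shown later in the thread generally scales better.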
I've tried Nick's solution and it fixed the problem; however, there was a performance issue when using
lock (objLock) { sum3 += item; } directly in the Parallel.ForEach.
Fortunately, using Parallel Aggregation Operations, as described in the .NET documentation, solved the issue. Here is the code:
object locker = new object();
double sum4 = 0;
Parallel.ForEach(mArray,
    () => 0.0,                                  // Initialize the local value.
    (i, state, localResult) => localResult + i, // Body delegate: return the new local total.
    localTotal =>
    {
        lock (locker) { sum4 += localTotal; }   // Add the local total to the master value.
    });
Related
I am trying to read multidimensional arrays and, with multithreading, get the indices and values of the elements that fit a condition.
I divided the multidimensional array into smaller subcubes, named jobs.
If the condition fits, I save the value to the samples array and the index to the ValidCubeIndexList array, so "samples" and "ValidCubeIndexList" are shared between threads.
I am not sure if this is a correct approach or not, but with parallelism, I couldn't find a way to lock iSample locally.
Parallel.For(0, jobs.Count, new ParallelOptions {
MaxDegreeOfParallelism = NumOfProcessor
},
delegate(int i, ParallelLoopState state) {
var job = jobs[i];
Index3 min = job.output.MinIJK;
Index3 max = job.output.MaxIJK;
var bulk = job.output.ToArray();
int x = bulk.GetLength(0);
int y = bulk.GetLength(1);
int z = bulk.GetLength(2);
for (int n = 0; n < x; n++) {
for (int m = 0; m < y; m++) {
for (int b = 0; b < z; b++) {
int activeIndex = Get3DIndex(min.I + n, min.J + m, min.K + b, cubeIndex.I,
cubeIndex.J, cubeIndex.K);
if (SelectionMaskIsActive) {
if (invFlags[activeIndex]) {
samples[iSample] = bulk[n, m, b];
ValidCubeIndexList[iSample] = activeIndex;
Interlocked.Increment(ref iSample);
}
} else {
samples[iSample] = bulk[n, m, b];
ValidCubeIndexList[iSample] = activeIndex;
Interlocked.Increment(ref iSample);
}
}
}
}
});
However the Interlocked.Increment(ref iSample) is not working as I expected.
How can I share and use the iSample parameter between threads?
As far as I can tell the usage and value of iSample is shared over multiple threads.
Incrementing it using Interlocked.Increment(ref iSample); will not cause the value to be locked locally in any way; the increment changes the value seen by all the threads, which often leads to unexpected results.
If your timing relies on this behavior, I recommend using an alternative approach, or marking the variable as volatile, but even then it's considered bad practice, because, whilst:
The volatile keyword indicates that a field might be modified by multiple threads that are executing at the same time.
it does not guarantee immediate visibility of the variable across all threads:
On a multiprocessor system, a volatile read operation does not guarantee to obtain the latest value written to that memory location by any processor. Similarly, a volatile write operation does not guarantee that the value written would be immediately visible to other processors.
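One way to avoid sharing iSample at all is to give each worker thread its own local buffers and merge them under a lock once per thread at the end. This is only a sketch under simplified assumptions; the scanning logic and types from the question are elided and replaced with stand-ins:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Sketch
{
    static void Main()
    {
        var jobs = new List<int[]> { new[] { 1, -2, 3 }, new[] { -4, 5 } };
        var samples = new List<int>();
        var validIndices = new List<int>();
        object mergeLock = new object();

        Parallel.For(0, jobs.Count,
            // One pair of buffers per worker thread; nothing is shared while scanning.
            () => (Samples: new List<int>(), Indices: new List<int>()),
            (i, state, local) =>
            {
                for (int n = 0; n < jobs[i].Length; n++)
                {
                    if (jobs[i][n] > 0) // stand-in for the question's condition
                    {
                        local.Samples.Add(jobs[i][n]);
                        local.Indices.Add(n);
                    }
                }
                return local;
            },
            local =>
            {
                lock (mergeLock) // one merge per thread, not per element
                {
                    samples.AddRange(local.Samples);
                    validIndices.AddRange(local.Indices);
                }
            });

        Console.WriteLine(samples.Count); // 3 positive values found
    }
}
```

Note that, unlike the original sequential fill, the order of the merged results is nondeterministic.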
This is my first attempt at parallel programming.
I'm writing a test console app before using this in my real app and I can't seem to get it right. When I run this, the parallel search is always faster than the sequential one, but the parallel search never finds the correct value. What am I doing wrong?
I tried it without using a partitioner (just Parallel.For); it was slower than the sequential loop and gave the wrong number. I saw a Microsoft doc that said for simple computations, using Partitioner.Create can speed things up. So I tried that but still got the wrong values. Then I saw Interlocked, but I think I'm using it wrong.
Any help would be greatly appreciated
Random r = new Random();
Stopwatch timer = new Stopwatch();
do {
// Make and populate a list
List<short> test = new List<short>();
for (int x = 0; x <= 10000000; x++)
{
test.Add((short)(r.Next(short.MaxValue) * r.NextDouble()));
}
// Initialize result variables
short rMin = short.MaxValue;
short rMax = 0;
// Do min/max normal search
timer.Start();
foreach (var amp in test)
{
rMin = Math.Min(rMin, amp);
rMax = Math.Max(rMax, amp);
}
timer.Stop();
// Display results
Console.WriteLine($"rMin: {rMin} rMax: {rMax} Time: {timer.ElapsedMilliseconds}");
// Initialize parallel result variables
short pMin = short.MaxValue;
short pMax = 0;
// Create list partitioner
var rangePartitioner = Partitioner.Create(0, test.Count);
// Do min/max parallel search
timer.Restart();
Parallel.ForEach(rangePartitioner, (range, loop) =>
{
short min = short.MaxValue;
short max = 0;
for (int i = range.Item1; i < range.Item2; i++)
{
min = Math.Min(min, test[i]);
max = Math.Max(max, test[i]);
}
_ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMin), Math.Min(pMin, min));
_ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMax), Math.Max(pMax, max));
});
timer.Stop();
// Display results
Console.WriteLine($"pMin: {pMin} pMax: {pMax} Time: {timer.ElapsedMilliseconds}");
Console.WriteLine("Press enter to run again; any other key to quit");
} while (Console.ReadKey().Key == ConsoleKey.Enter);
Sample output:
rMin: 0 rMax: 32746 Time: 106
pMin: 0 pMax: 32679 Time: 66
Press enter to run again; any other key to quit
The correct way to do a parallel search like this is to compute local values for each thread used, and then merge the values at the end. This ensures that synchronization is only needed at the final phase:
var items = Enumerable.Range(0, 10000).ToList();
int globalMin = int.MaxValue;
int globalMax = int.MinValue;
Parallel.ForEach<int, (int Min, int Max)>(
items,
() => (int.MaxValue, int.MinValue), // Create new min/max values for each thread used
(item, state, localMinMax) =>
{
var localMin = Math.Min(item, localMinMax.Min);
var localMax = Math.Max(item, localMinMax.Max);
return (localMin, localMax); // return the new min/max values for this thread
},
localMinMax => // called one last time for each thread used
{
lock(items) // Since this may run concurrently, synchronization is needed
{
globalMin = Math.Min(globalMin, localMinMax.Min);
globalMax = Math.Max(globalMax, localMinMax.Max);
}
});
As you can see, this is quite a bit more complex than a regular loop, and it is not even doing anything fancy like partitioning. An optimized solution would work over larger blocks to reduce overhead, but this is omitted for simplicity, and it looks like the OP is aware of such issues already.
Be aware that multithreaded programming is difficult. While it is a great idea to try out such techniques in a playground rather than a real program, I would still suggest that you start by studying the potential dangers of thread safety; it is fairly easy to find good resources about this.
Not all problems will be as obviously wrong as this one, and it is quite easy to cause issues that break once in a million runs, or only when the CPU load is high, or only on single-CPU systems, or issues that are only detected long after the code is put into production. It is good practice to be paranoid whenever multiple threads may read and write the same memory concurrently.
I would also recommend learning about immutable data types, and pure functions, since these are much safer and easier to reason about once multiple threads are involved.
Interlocked.Exchange makes only the exchange itself atomic; the surrounding Math.Min and Math.Max calls can still race. You should compute the min/max for every batch separately and then join the results.
Using low-lock techniques like the Interlocked class is tricky and advanced. Taking into consideration that your experience in multithreading is not extensive, I would say go with a simple and trusty lock:
object locker = new object();
//...
lock (locker)
{
pMin = Math.Min(pMin, min);
pMax = Math.Max(pMax, max);
}
I have 4 threads that each have their own loop and access a List that contains a delta time for each thread. Since there are 4 threads in this example, there are 4 items in the List. Can these threads change the value of their assigned index (example: thread 0 writes index 0 of the List) without locks, since I know no other thread will write to that index, or do I need to use locks for this?
I have already implemented this and it does not seem to affect the other values or corrupt the list, but I want to make sure.
int threadCount = 4;
bool run = true;
List<double> lastDeltaTime = new List<double>();
private List<Thread> threadList = new List<Thread>();
void InitializeThreads()
{
for (int i = 0; i < threadCount; i++)
{
int tempName = i;
Thread tempThread = new Thread(() => ThreadLoop(tempName));
threadList.Add(tempThread);
lastDeltaTime.Add(0);
}
for (int i = 0; i < threadCount; i++)
{
threadList[i].Start();
}
}
void ThreadLoop(int threadName)
{
double lastTime = DateTime.UtcNow.Ticks;
while(run)
{
double currentTime = DateTime.UtcNow.Ticks;
double deltaTime = ((currentTime - lastTime) / 10000000) * timescale;
lastDeltaTime[threadName] = deltaTime; //line setting deltaTime
//do work
//end work
lastTime = currentTime;
}
}
Information Learned:
It is okay to write to an index location of a list with a Thread without a lock ONLY if you can ensure no other Thread will also try to write to the same index, and that the list will remain the same size.
Use an array if you can ensure that no other Thread will try to write to the same index location
Use a Class that contains the Thread and the variable, then make a list of the Class (Not Tested by me as of yet)
Otherwise ... use a lock
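The untested class-based idea from the list above could be sketched like this (the names are hypothetical, not from the original code):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class Worker
{
    public Thread Thread;
    public double LastDeltaTime; // written only by this worker's own thread
}

class Program
{
    static void Main()
    {
        var workers = new List<Worker>();
        for (int i = 0; i < 4; i++)
        {
            var w = new Worker();
            w.Thread = new Thread(() =>
            {
                // Each thread touches only its own Worker instance,
                // so there is no write-write conflict and no list-resize risk.
                w.LastDeltaTime = 0.016;
            });
            workers.Add(w);
        }
        workers.ForEach(w => w.Thread.Start());
        workers.ForEach(w => w.Thread.Join());
        Console.WriteLine(workers.Count); // 4
    }
}
```

Reading LastDeltaTime from another thread still has visibility concerns; use Volatile.Read (or a lock) if the reader needs the latest value.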
That's not safe to do at all. You're adding items to the list while accessing other items, which is a problem if that addition results in resizing the backing array. You can end up accessing items while the list is in the middle of a resize, which can break in any number of different ways.
This question already has answers here:
Parallel.For(): Update variable outside of loop
(7 answers)
Closed 6 years ago.
This is almost my first attempt at parallel code (my first attempt worked fine and sped up some code), but the code below is causing strange issues and I can't see why. Both for loops below give the same result most of the time, but not always, i.e. res != res1. The function IdealGasEnthalpy is just calculating a number and not changing anything else. I can't figure out what the problem is, or even where to begin to look; has anyone any suggestions?
double res = 0;
object lockObject = new object();
for (int I = 0; I < cc.Count; I++)
{
res += IdealGasEnthalpy(T, cc[I], enumMassOrMolar.Molar) * x[I];
}
double res1 = 0;
Parallel.For(0, cc.Count, I =>
{
res1 += IdealGasEnthalpy(T, cc[I], enumMassOrMolar.Molar) * x[I];
});
I tried the following code, but its very slow and doubled the execution time for the whole program compared to serial code.
double res = 0.0d;
Parallel.For(0, cc.Count,
() => 0.0d,
(x, loopState, partialResult) =>
{
return partialResult += IdealGasEnthalpy(T, cc[x], enumMassOrMolar.Molar) * X[x];
},
(localPartialSum) =>
{
lock (lockObject)
{
res += localPartialSum;
}
});
Also tried the version below. I'm going to stick to non-parallel code for this routine, as the parallel versions are all a lot slower...
double res = 0.0d;
double[] partialresult = new double[cc.Count];
Parallel.For(0, cc.Count, i =>
{
partialresult[i] = IdealGasEnthalpy(T, cc[i], enumMassOrMolar.Molar) * X[i];
});
for (int i = 0; i < cc.Count; i++)
{
res += partialresult[i];
}
Your second operation needs to do an interlocked add, because += is not atomic. Remember this is shorthand for read the variable, add to it, and store the result. There is a race condition where two reads of the same old value could occur before either has stored the new result. You need to synchronize access.
Note that, depending on how computationally expensive your function is, interlocking with the Parallel.For approach might be slower than just doing a serial approach. It comes down to how much time is spent calculating the value versus how much time is spent synchronizing and doing the summation.
Alternately you could store the results in an array which you allocate in advance, then do the summation after all parallel operations are done. That way no two operations modify the same variable. The array trades memory for speed, since you eliminate overhead from synchronization.
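One caveat: Interlocked.Add has no overload for double, so a lock-free add of a double is usually written as an Interlocked.CompareExchange retry loop. A sketch (a hypothetical helper, not the OP's code):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static double total;

    static void AtomicAdd(ref double target, double value)
    {
        double current, updated;
        do
        {
            current = Volatile.Read(ref target);
            updated = current + value;
            // Retry if another thread changed target between the read and the write.
        } while (Interlocked.CompareExchange(ref target, updated, current) != current);
    }

    static void Main()
    {
        Parallel.For(0, 100000, i => AtomicAdd(ref total, 1.0));
        Console.WriteLine(total); // 100000
    }
}
```

This contends on every iteration, so for cheap loop bodies the local-partial-sum approach above will usually still be faster.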
Can anyone explain, why this program is returning the correct value for sqrt_min?
int n = 1000000;
double[] myArr = new double[n];
for(int i = n-1 ; i>= 0; i--){ myArr[i] = (double)i;}
// sqrt_min contains minimal sqrt-value
double sqrt_min = double.MaxValue;
Parallel.ForEach(myArr, num =>
{
double sqrt = Math.Sqrt(num); // some time consuming calculation that should be parallized
if(sqrt < sqrt_min){ sqrt_min = sqrt;}
});
Console.WriteLine("minimum: "+sqrt_min);
It works by sheer luck. Sometimes when you run it you are lucky that the non-atomic reads and writes to the double are not resulting in "torn" values. Sometimes you are lucky that the non-atomic tests and sets just happen to be setting the correct value when that race happens. There is no guarantee that this program produces any particular result.
Your code is not safe; it only works by coincidence.
If two threads run the if simultaneously, one of the minimums will be overwritten:
sqrt_min = 6
Thread A: sqrt = 5
Thread B: sqrt = 4
Thread A enters the if
Thread B enters the if
Thread B assigns sqrt_min = 4
Thread A assigns sqrt_min = 5
On 32-bit systems, you're also vulnerable to read/write tearing.
It would be possible to make this safe using Interlocked.CompareExchange in a loop.
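That Interlocked.CompareExchange loop could look like this (a sketch with a hypothetical helper, not from the question):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void InterlockedMin(ref double target, double value)
    {
        double current = Volatile.Read(ref target);
        while (value < current)
        {
            double previous = Interlocked.CompareExchange(ref target, value, current);
            if (previous == current) break; // we won the race; the min is updated
            current = previous;             // lost the race; re-check against the new value
        }
    }

    static void Main()
    {
        double sqrtMin = double.MaxValue;
        Parallel.For(0, 1000000, i =>
        {
            double sqrt = Math.Sqrt(i);
            InterlockedMin(ref sqrtMin, sqrt);
        });
        Console.WriteLine("minimum: " + sqrtMin); // minimum: 0
    }
}
```

CompareExchange on a double is atomic even on 32-bit systems, which also removes the tearing concern mentioned above.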
For why your original code is broken check the other answers, I won't repeat that.
Multithreading is easiest when there is no write access to shared state. Luckily, your code can be written that way. Parallel LINQ can be nice in such situations, but sometimes the overhead is too large.
You can rewrite your code to:
double sqrt_min = myArr.AsParallel().Select(x=>Math.Sqrt(x)).Min();
In your specific problem it's faster to swap around the Min and the Sqrt operation, which is possible because Sqrt is monotonically increasing.
double sqrt_min = Math.Sqrt(myArr.AsParallel().Min());
Your code does not really work: I ran it in a loop 100,000 times, and it failed once on my 8-core computer, producing this output:
minimum: 1
I shortened the runs to make the error appear faster.
Here are my modifications:
static void Run() {
int n = 10;
double[] myArr = new double[n];
for (int i = n - 1; i >= 0; i--) { myArr[i] = (double)i*i; }
// sqrt_min contains minimal sqrt-value
double sqrt_min = double.MaxValue;
Parallel.ForEach(myArr, num => {
double sqrt = Math.Sqrt(num); // some time consuming calculation that should be parallized
if (sqrt < sqrt_min) { sqrt_min = sqrt; }
});
if (sqrt_min > 0) {
Console.WriteLine("minimum: " + sqrt_min);
}
}
static void Main() {
for (int i = 0; i != 100000; i++ ) {
Run();
}
}
This is not a coincidence, considering the lack of synchronization around the reading and writing of a shared variable.
As others have said, this only works by sheer luck. Both the OP and other posters have had trouble actually reproducing the race condition, though. That is fairly easily explained: the code generates lots of race conditions, but the vast majority of them are irrelevant. All that matters at the end of the day is that 0 should be the minimum result. If your code thinks that root 5 is greater than root 6, or that root 234 is greater than root 235, it still won't break. There needs to be a race condition specifically involving the iteration that produces 0. The odds that some iteration has a race condition with another are very, very high. The odds that that particular iteration has a race condition are really quite low.