Parallel code causing strange results [duplicate] - C#

This question already has answers here:
Parallel.For(): Update variable outside of loop
(7 answers)
Closed 6 years ago.
This is almost my first attempt at parallel code (my first attempt worked fine and sped up some code), but the code below is causing strange issues and I can't see why. Both for loops below give the same result most of the time, but not always, i.e. res != res1. The function IdealGasEnthalpy just calculates a number and doesn't change anything else. I can't figure out what the problem is, or even where to begin to look. Has anyone any suggestions?
double res = 0;
object lockObject = new object();

for (int I = 0; I < cc.Count; I++)
{
    res += IdealGasEnthalpy(T, cc[I], enumMassOrMolar.Molar) * x[I];
}

double res1 = 0;
Parallel.For(0, cc.Count, I =>
{
    res1 += IdealGasEnthalpy(T, cc[I], enumMassOrMolar.Molar) * x[I];
});
I tried the following code, but it's very slow and doubled the execution time of the whole program compared to the serial code.
double res = 0.0d;

Parallel.For(0, cc.Count,
    () => 0.0d,
    (x, loopState, partialResult) =>
    {
        return partialResult += IdealGasEnthalpy(T, cc[x], enumMassOrMolar.Molar) * X[x];
    },
    (localPartialSum) =>
    {
        lock (lockObject)
        {
            res += localPartialSum;
        }
    });
I also tried the code below, but I'm going to stick with the non-parallel version for this routine, as the parallel versions are all a lot slower...
double res = 0.0d;
double[] partialresult = new double[cc.Count];

Parallel.For(0, cc.Count, i =>
{
    partialresult[i] = IdealGasEnthalpy(T, cc[i], enumMassOrMolar.Molar) * X[i];
});

for (int i = 0; i < cc.Count; i++)
{
    res += partialresult[i];
}

Your second operation needs to do an interlocked add, because += is not atomic. Remember that this is shorthand for three steps: read the variable, add to it, and store the result. There is a race condition where two reads of the same old value can occur before either thread has stored its new result. You need to synchronize access.
Note that, depending on how computationally expensive your function is, interlocking with the Parallel.For approach might be slower than just doing a serial approach. It comes down to how much time is spent calculating the value versus how much time is spent synchronizing and doing the summation.
Alternately you could store the results in an array which you allocate in advance, then do the summation after all parallel operations are done. That way no two operations modify the same variable. The array trades memory for speed, since you eliminate overhead from synchronization.
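For reference, here is what an "interlocked add" can look like for doubles. Interlocked.Add has no double overload, so the usual pattern is a retry loop around Interlocked.CompareExchange. This is a minimal sketch; the helper name is my own:

using System.Threading;

static void AtomicAddDouble(ref double target, double value)
{
    double current, updated;
    do
    {
        current = Volatile.Read(ref target); // snapshot the current total
        updated = current + value;           // compute the new total
    }
    // If another thread changed target between the read and the exchange,
    // CompareExchange returns that newer value and we retry.
    while (Interlocked.CompareExchange(ref target, updated, current) != current);
}

The parallel body would then call AtomicAddDouble(ref res1, IdealGasEnthalpy(T, cc[I], enumMassOrMolar.Molar) * x[I]); though, as noted above, for a cheap loop body the synchronization cost can easily exceed the gain.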

Related

Parallel.ForEach search doesn't find the correct value

This is my first attempt at parallel programming.
I'm writing a test console app before using this in my real app and I can't seem to get it right. When I run this, the parallel search is always faster than the sequential one, but the parallel search never finds the correct value. What am I doing wrong?
I tried it without using a partitioner (just Parallel.For); it was slower than the sequential loop and gave the wrong number. I saw a Microsoft doc that said for simple computations, using Partitioner.Create can speed things up. So I tried that but still got the wrong values. Then I saw Interlocked, but I think I'm using it wrong.
Any help would be greatly appreciated
Random r = new Random();
Stopwatch timer = new Stopwatch();

do {
    // Make and populate a list
    List<short> test = new List<short>();
    for (int x = 0; x <= 10000000; x++)
    {
        test.Add((short)(r.Next(short.MaxValue) * r.NextDouble()));
    }

    // Initialize result variables
    short rMin = short.MaxValue;
    short rMax = 0;

    // Do min/max normal search
    timer.Start();
    foreach (var amp in test)
    {
        rMin = Math.Min(rMin, amp);
        rMax = Math.Max(rMax, amp);
    }
    timer.Stop();

    // Display results
    Console.WriteLine($"rMin: {rMin} rMax: {rMax} Time: {timer.ElapsedMilliseconds}");

    // Initialize parallel result variables
    short pMin = short.MaxValue;
    short pMax = 0;

    // Create list partitioner
    var rangePortioner = Partitioner.Create(0, test.Count);

    // Do min/max parallel search
    timer.Restart();
    Parallel.ForEach(rangePortioner, (range, loop) =>
    {
        short min = short.MaxValue;
        short max = 0;
        for (int i = range.Item1; i < range.Item2; i++)
        {
            min = Math.Min(min, test[i]);
            max = Math.Max(max, test[i]);
        }
        _ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMin), Math.Min(pMin, min));
        _ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMax), Math.Max(pMax, max));
    });
    timer.Stop();

    // Display results
    Console.WriteLine($"pMin: {pMin} pMax: {pMax} Time: {timer.ElapsedMilliseconds}");
    Console.WriteLine("Press enter to run again; any other key to quit");
} while (Console.ReadKey().Key == ConsoleKey.Enter);
Sample output:
rMin: 0 rMax: 32746 Time: 106
pMin: 0 pMax: 32679 Time: 66
Press enter to run again; any other key to quit
The correct way to do a parallel search like this is to compute local values for each thread used, and then merge the values at the end. This ensures that synchronization is only needed at the final phase:
var items = Enumerable.Range(0, 10000).ToList();
int globalMin = int.MaxValue;
int globalMax = int.MinValue;

Parallel.ForEach<int, (int Min, int Max)>(
    items,
    () => (int.MaxValue, int.MinValue), // Create new min/max values for each thread used
    (item, state, localMinMax) =>
    {
        var localMin = Math.Min(item, localMinMax.Min);
        var localMax = Math.Max(item, localMinMax.Max);
        return (localMin, localMax); // Return the new min/max values for this thread
    },
    localMinMax => // Called one last time for each thread used
    {
        lock (items) // Since this may run concurrently, synchronization is needed
        {
            globalMin = Math.Min(globalMin, localMinMax.Min);
            globalMax = Math.Max(globalMax, localMinMax.Max);
        }
    });
As you can see, this is quite a bit more complex than a regular loop, and it's not even doing anything fancy like partitioning. An optimized solution would work over larger blocks to reduce overhead, but that is omitted for simplicity, and it looks like the OP is already aware of such issues.
Be aware that multithreaded programming is difficult. While it is a great idea to try out such techniques in a playground rather than a real program, I would still suggest that you start by studying the potential dangers of thread safety; good resources on this are fairly easy to find.
Not all problems will be as obviously wrong as this one, and it is quite easy to cause issues that break once in a million runs, or only when the CPU load is high, or only on single-CPU systems, or that are only detected long after the code is put into production. It is good practice to be paranoid whenever multiple threads may read and write the same memory concurrently.
I would also recommend learning about immutable data types and pure functions, since these are much safer and easier to reason about once multiple threads are involved.
Interlocked.Exchange makes only the exchange itself atomic; the surrounding Math.Min and Math.Max calls can still race. You should compute the min/max for each batch separately and then join the results.
Using low-lock techniques like the Interlocked class is tricky and advanced. Given that your multithreading experience is limited, I would go with a simple and trusty lock:
object locker = new object();
//...
lock (locker)
{
    pMin = Math.Min(pMin, min);
    pMax = Math.Max(pMax, max);
}

Parallel for loop with BigIntegers

So I am working on a program to factor a given number: you write a number on the command line and it gives you all the factors. Doing this sequentially is VERY slow, because it uses only one thread, if I'm not mistaken. I thought about doing it with Parallel.For and it worked, but only with integers, so I wanted to try it with BigIntegers.
Here's the normal code:
public static void Factor(BigInteger f)
{
    for (BigInteger i = 1; i <= f; i++)
    {
        if (f % i == 0)
        {
            Console.WriteLine("{0} is a factor of {1}", i, f);
        }
    }
}
The above code should be pretty easy to understand, but as I said, it's VERY slow for big numbers (above one billion it starts to get impractical). Here's my parallel code:
public static void ParallelFactor(BigInteger d)
{
    Parallel.For<BigInteger>(0, d, new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount }, i =>
    {
        try
        {
            if (d % i == 0)
            {
                Console.WriteLine("{0} is a factor of {1}", i, d);
            }
        }
        catch (DivideByZeroException)
        {
            Console.WriteLine("Done!");
            Console.ReadKey();
            Launcher.Main();
        }
    });
}
The above (parallel) code works just fine with integers (int), and it's VERY fast: it factored 1,000,000,000 in just 2 seconds. So I thought, why not try it with bigger numbers? I also thought that putting <BigInteger> after Parallel.For would do it, but it doesn't. I already tried a regular parallel loop with a BigInteger as the argument, but it gives an error saying that it cannot convert from BigInteger to int. So how do you work with BigIntegers in a parallel for loop?
Improve your algorithm efficiency first.
While it is possible to use BigInteger, the CPU's ALU cannot handle arbitrarily large numbers in hardware, so BigInteger arithmetic is noticeably slower. Unless you need numbers bigger than 9 quintillion (exactly 9,223,372,036,854,775,807), you can use the type long.
A second thing to note is that you do not need to loop over all candidates: factors come in pairs, so it is enough to check up to the square root. You can reduce
for (BigInteger i = 1; i <= f; i++)
to
for (long i = 1; i <= Math.Sqrt(f); i++)
That would mean that instead of needing to iterate over 1000000000 items you iterate over 31623.
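Putting both suggestions together, here is a sketch of the improved sequential version (my own example, not the OP's code; each divisor i at or below the square root is paired with its cofactor f / i):

public static void Factor(long f)
{
    long limit = (long)Math.Sqrt(f);
    for (long i = 1; i <= limit; i++)
    {
        if (f % i == 0)
        {
            Console.WriteLine("{0} is a factor of {1}", i, f);
            if (i != f / i) // avoid printing a perfect-square divisor twice
            {
                Console.WriteLine("{0} is a factor of {1}", f / i, f);
            }
        }
    }
}

With the iteration count reduced to the square root, the sequential version alone is usually fast enough that parallelizing it becomes unnecessary for numbers of this size.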
Additionally, if you still plan on using BigInt then check the parameters:
It should be something along the lines of:
Parallel.For(
    0,
    (int)d,
    () => BigInteger.Zero,                             // per-thread initial value
    (x, state, subTotal) => subTotal + BigInteger.One, // per-iteration body
    subTotal => { /* combine each thread's subtotal here */ });
Just for trivia: some programming languages are more efficient at solving certain problems than others, and in this case there is the Wolfram Language (previously Mathematica), in which solving such problems is simpler, granted that you know what you are doing.
They also have a Google-style alternative, Wolfram|Alpha, that answers you directly; it has a decent AI that processes your natural language to give you as exact an answer as it can.
So finding the factors of a number is as easy as:
Factor[9223372036854775809]
or you can use the web API: https://www.wolframalpha.com/input/?i=factor+9223372036854775809
You can also call the Wolfram kernel from C#, but terms and conditions apply.

A parallelism issue [duplicate]

This question already has answers here:
Different summation results with Parallel.ForEach
(4 answers)
Closed 4 years ago.
I've written 3 different ways of computing the sum of an array of integers; however, I'm getting a different result for the third method.
Initialization:
int n = 100;
int[] mArray = new int[n];
for (int i = 0; i < mArray.Length; i++)
    mArray[i] = 1;
First:
int sum1 = mArray.Sum();
Console.WriteLine("sum1 " + sum1);
Second:
int sum2 = 0;
for (int i = 0; i < mArray.Length; i++)
    sum2 += mArray[i];
Console.WriteLine("sum2 " + sum2);
Third:
int sum3 = 0;
Parallel.ForEach(mArray, item =>
{
    sum3 += item;
});
Console.WriteLine("sum3 " + sum3);
Obviously, for small n the three approaches give the same output.
However, when n is increased (e.g., n = 30000), the third approach gives surprisingly incorrect results.
NB: I've tested the approaches using ConcurrentBag, which is a thread-safe collection, so I suppose there is no overflow issue. The code was tested on a Windows 10 x64 computer (Intel Core i7 @ 3.30 GHz).
It would be interesting to understand why Parallel.ForEach behaves differently.
The problem is that sum3 is accessed by multiple threads when you use Parallel.ForEach. sum3 += item; normally involves three operations:
1. Read the value of sum3 into temporary storage.
2. Add item to that temporary value.
3. Store the result back to sum3.
When multiple threads do this concurrently, it is very likely that the operations will interleave. For example, with two threads A and B, both may read the same value from sum3, then each do their addition and store their new value back, so one of the additions is lost.
To overcome this issue, you need to protect your access to sum3. The code should look like this:
object objLock = new object();
int sum3 = 0;
Parallel.ForEach(mArray, item =>
{
    lock (objLock) { sum3 += item; }
});
Console.WriteLine("sum3 " + sum3);
But that will totally negate the effect of the parallel execution.
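A side note that is not in the original answer: for int sums specifically, Interlocked.Add keeps the loop correct without a lock object, though contention on the single shared variable still serializes the additions:

int sum3 = 0;
Parallel.ForEach(mArray, item =>
{
    Interlocked.Add(ref sum3, item); // atomic read-modify-write on sum3
});
Console.WriteLine("sum3 " + sum3);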
I tried Nick's solution and it fixed the problem; however, there was a performance issue when using
lock (objLock) { sum3 += item; } directly in the Parallel.ForEach body.
Fortunately, using the Parallel Aggregation pattern, as defined in .NET, solved the issue. Here is the code:
object locker = new object();
double sum4 = 0;

Parallel.ForEach(mArray,
    () => 0.0,                                  // Initialize the local value.
    (i, state, localResult) => localResult + i, // Body delegate: return the new local total.
    localTotal =>                               // Final delegate, called once per thread:
    {
        lock (locker) sum4 += localTotal;       // add the local value to the master value.
    });
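Also my addition, not part of the original answer: PLINQ wraps the same partial-sums-then-merge idea in a single line.

int sum5 = mArray.AsParallel().Sum(); // PLINQ aggregates per-thread partial sums internally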

What do these methods do?

I've found two different methods to get the max value from an array, but I'm not really familiar with parallel programming, so I don't really understand them.
I was wondering: do these methods do the same thing, or am I missing something?
I really don't have much information about them. Not even comments...
The first method:
int[] vec = ... (I guess the content doesn't matter)
static int naiveMax()
{
    int max = vec[0];
    object obj = new object();

    Parallel.For(0, vec.Length, i =>
    {
        lock (obj)
        {
            if (vec[i] > max) max = vec[i];
        }
    });
    return max;
}
And the second one:
static int Max()
{
    int max = vec[0];
    object obj = new object();

    Parallel.For(0, vec.Length, // could be Parallel.For<int>
        () => vec[0],
        (i, loopState, partial) =>
        {
            if (vec[i] > partial) partial = vec[i];
            return partial;
        },
        partial =>
        {
            lock (obj)
            {
                if (partial > max) max = partial;
            }
        });
    return max;
}
Do these do the same thing or something different, and what? Thanks ;)
Both find the maximum value in an array of integers. In an attempt to find the maximum value faster, they do it "in parallel" using the Parallel.For method. Both methods fail at this, though.
To see this, we first need a sufficiently large array of integers. For small arrays, parallel processing doesn't give us a speed-up anyway.
int[] values = new int[100000000];
Random random = new Random();
for (int i = 0; i < values.Length; i++)
{
    values[i] = random.Next();
}
Now we can run the two methods and see how long they take. Using an appropriate performance measurement setup (Stopwatch, array of 100,000,000 integers, 100 iterations, Release build, no debugger attached, JIT warm-up) I get the following results on my machine:
naiveMax 00:06:03.3737078
Max 00:00:15.2453303
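For context, the timings above come from a harness along these lines; this is a sketch with names of my own choosing, not the answer's exact benchmark code:

using System.Diagnostics;

static TimeSpan Measure(Func<int> candidate, int iterations)
{
    candidate(); // warm-up run so JIT compilation isn't timed
    Stopwatch timer = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        candidate();
    }
    timer.Stop();
    return timer.Elapsed;
}

Called as, e.g., Measure(naiveMax, 100) and Measure(Max, 100).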
So Max is much, much better than naiveMax (6 minutes! cough).
But how does it compare to, say, PLINQ?
static int MaxPlinq(int[] values)
{
    return values.AsParallel().Max();
}
MaxPlinq 00:00:11.2335842
Not bad, saved a few seconds. Now, what about a plain, old, sequential for loop for comparison?
static int Simple(int[] values)
{
    int result = values[0];
    for (int i = 0; i < values.Length; i++)
    {
        if (result < values[i]) result = values[i];
    }
    return result;
}
Simple 00:00:05.7837002
I think we have a winner.
Lesson learned: Parallel.For is not pixie dust that you can sprinkle over your code to make it magically run faster. If performance matters, use the right tools and measure, measure, measure ...
They appear to do the same thing; however, they are very inefficient. The point of parallelization is to speed up code whose parts can be executed independently. Due to race conditions, finding the maximum (as implemented here) requires a semaphore/lock around the actual comparison logic, which means you're spinning up many threads and related resources simply to run the code sequentially anyway, defeating the purpose of parallelization entirely.

Parallel.ForEach works, but why?

Can anyone explain why this program is returning the correct value for sqrt_min?
int n = 1000000;
double[] myArr = new double[n];
for (int i = n - 1; i >= 0; i--) { myArr[i] = (double)i; }

// sqrt_min contains minimal sqrt-value
double sqrt_min = double.MaxValue;

Parallel.ForEach(myArr, num =>
{
    double sqrt = Math.Sqrt(num); // some time consuming calculation that should be parallelized
    if (sqrt < sqrt_min) { sqrt_min = sqrt; }
});
Console.WriteLine("minimum: " + sqrt_min);
It works by sheer luck. Sometimes when you run it you are lucky that the non-atomic reads and writes to the double are not resulting in "torn" values. Sometimes you are lucky that the non-atomic tests and sets just happen to be setting the correct value when that race happens. There is no guarantee that this program produces any particular result.
Your code is not safe; it only works by coincidence.
If two threads run the if simultaneously, one of the minimums will be overwritten:
sqrt_min = 6
Thread A: sqrt = 5
Thread B: sqrt = 4
Thread A enters the if
Thread B enters the if
Thread B assigns sqrt_min = 4
Thread A assigns sqrt_min = 5
On 32-bit systems, you're also vulnerable to read/write tearing.
It would be possible to make this safe using Interlocked.CompareExchange in a loop.
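A sketch of that idea (my own code, not from the original answers): publish the candidate minimum with CompareExchange and retry while another thread's write intervenes.

static void InterlockedMin(ref double target, double value)
{
    double current = Volatile.Read(ref target);
    while (value < current)
    {
        // CompareExchange returns the value that was actually in target;
        // if it matches what we read, our smaller value was stored.
        double witnessed = Interlocked.CompareExchange(ref target, value, current);
        if (witnessed == current) return; // we won the race
        current = witnessed;              // someone else wrote first; re-check
    }
}

The loop body would then call InterlockedMin(ref sqrt_min, sqrt); instead of the unsynchronized compare-and-assign.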
For why your original code is broken, check the other answers; I won't repeat that here.
Multithreading is easiest when there is no write access to shared state. Luckily your code can be written that way. Parallel LINQ can be nice in such situations, though sometimes the overhead is too large.
You can rewrite your code to:
double sqrt_min = myArr.AsParallel().Select(x => Math.Sqrt(x)).Min();
In your specific problem it's faster to swap around the Min and the Sqrt operation, which is possible because Sqrt is monotonically increasing.
double sqrt_min = Math.Sqrt(myArr.AsParallel().Min());
Your code does not really work: I ran it in a loop 100,000 times, and it failed once on my 8-core computer, producing this output:
minimum: 1
I shortened the runs to make the error appear faster.
Here are my modifications:
static void Run() {
int n = 10;
double[] myArr = new double[n];
for (int i = n - 1; i >= 0; i--) { myArr[i] = (double)i*i; }
// sqrt_min contains minimal sqrt-value
double sqrt_min = double.MaxValue;
Parallel.ForEach(myArr, num => {
double sqrt = Math.Sqrt(num); // some time consuming calculation that should be parallized
if (sqrt < sqrt_min) { sqrt_min = sqrt; }
});
if (sqrt_min > 0) {
Console.WriteLine("minimum: " + sqrt_min);
}
}
static void Main() {
for (int i = 0; i != 100000; i++ ) {
Run();
}
}
This is not a coincidence, considering the lack of synchronization around the reading and writing of a shared variable.
As others have said, this only works by sheer luck. Both the OP and other posters have had trouble actually reproducing the race condition, though. That is fairly easily explained: the code generates lots of race conditions, but the vast majority of them (99.9999%, to be exact) are irrelevant. All that matters at the end of the day is that 0 should be the minimum result. If your code thinks that root 5 is greater than root 6, or that root 234 is greater than root 235, it still won't break. There needs to be a race condition specifically on the iteration that produces 0. The odds that some iteration races with another are very, very high. The odds that the iteration processing that last item has a race condition are really quite low.
