I am trying to read multidimensional arrays and, using multithreading, get the index and value of each element that matches a condition.
I divided the multidimensional array into smaller subcubes, named jobs.
When the condition matches, I save the value into the samples array and the index into the ValidCubeIndexList array, so "samples" and "ValidCubeIndexList" are shared and used between threads.
I am not sure whether this is the correct approach, but with parallelism I couldn't find a way to lock iSample locally.
Parallel.For(0, jobs.Count, new ParallelOptions {
    MaxDegreeOfParallelism = NumOfProcessor
},
delegate(int i, ParallelLoopState state) {
    var job = jobs[i];
    Index3 min = job.output.MinIJK;
    Index3 max = job.output.MaxIJK;
    var bulk = job.output.ToArray();
    int x = bulk.GetLength(0);
    int y = bulk.GetLength(1);
    int z = bulk.GetLength(2);
    for (int n = 0; n < x; n++) {
        for (int m = 0; m < y; m++) {
            for (int b = 0; b < z; b++) {
                int activeIndex = Get3DIndex(min.I + n, min.J + m, min.K + b,
                                             cubeIndex.I, cubeIndex.J, cubeIndex.K);
                if (SelectionMaskIsActive) {
                    if (invFlags[activeIndex]) {
                        samples[iSample] = bulk[n, m, b];
                        ValidCubeIndexList[iSample] = activeIndex;
                        Interlocked.Increment(ref iSample);
                    }
                } else {
                    samples[iSample] = bulk[n, m, b];
                    ValidCubeIndexList[iSample] = activeIndex;
                    Interlocked.Increment(ref iSample);
                }
            }
        }
    }
});
However, Interlocked.Increment(ref iSample) is not working as I expected.
How can I share and use the iSample variable between threads?
As far as I can tell, iSample is indeed shared across the threads.
Interlocked.Increment(ref iSample) makes the increment itself atomic, but it does not lock iSample around the writes that precede it: several threads can read the same value of iSample, write to samples[iSample] and ValidCubeIndexList[iSample], and only then increment it, so slots get overwritten or skipped, which often leads to unexpected results.
If your timing relies on this behavior, I recommend an alternative approach, or marking the variable as volatile; but even then it's considered bad practice, because, whilst:
The volatile keyword indicates that a field might be modified by multiple threads that are executing at the same time.
it does not guarantee that the latest value is immediately visible to all threads:
On a multiprocessor system, a volatile read operation does not guarantee to obtain the latest value written to that memory location by any processor. Similarly, a volatile write operation does not guarantee that the value written would be immediately visible to other processors.
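One way out (a sketch, not the asker's exact code; the condition and names are illustrative) is to reserve a unique slot per write by using the return value of Interlocked.Increment, so reading the counter and incrementing it become a single atomic step:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SlotReservation
{
    // Collects the even values of 'source' into 'samples' in parallel.
    // Returns how many values were collected.
    public static int CollectEven(int[] source, int[] samples)
    {
        int iSample = -1; // shared counter; the first reserved slot will be 0

        Parallel.For(0, source.Length, i =>
        {
            if (source[i] % 2 == 0) // stand-in for the real condition
            {
                // Interlocked.Increment returns the incremented value atomically,
                // so each thread gets a slot index no other thread can get.
                int slot = Interlocked.Increment(ref iSample);
                samples[slot] = source[i];
            }
        });

        return iSample + 1;
    }

    public static void Main()
    {
        int[] source = new int[1000];
        for (int i = 0; i < source.Length; i++) source[i] = i;

        int count = CollectEven(source, new int[source.Length]);
        Console.WriteLine(count); // 500
    }
}
```

Note that with this pattern the order of entries in samples is nondeterministic; only the set of collected values is stable.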
I made some performance tests, and I would like to know how the CPU cache works in this kind of situation.
Here is a classic loop:
private static readonly short[] _values;
static MyClass()
{
var random = new Random();
_values = Enumerable.Range(0, 100)
.Select(x => (short)random.Next(5000))
.ToArray();
}
public static void Run()
{
short max = 0;
for (var index = 0; index < _values.Length; index++)
{
max = Math.Max(max, _values[index]);
}
}
Here is a variation that computes the same thing, but is much more performant:
private static readonly short[] _values;
static MyClass()
{
var random = new Random();
_values = Enumerable.Range(0, 100)
.Select(x => (short)random.Next(5000))
.ToArray();
}
public static void Run()
{
short max1 = 0;
short max2 = 0;
for (var index = 0; index < _values.Length; index+=2)
{
max1 = Math.Max(max1, _values[index]);
max2 = Math.Max(max2, _values[index + 1]);
}
short max = Math.Max(max1, max2);
}
So I am interested to know why the second version is more efficient than the first one.
I understand it's a story of CPU caching, but I don't really get how it happens (it's not as if values are read twice between loops).
EDIT:
.NET Core 4.6.27617.04
2.1.11
Intel Core i7-7850HQ 2.90GHz 64-bit
Calling 50 million times:
MyClass1:
=> 00:00:06.0702028
MyClass2:
=> 00:00:03.8563776 (-36 %)
The last metrics are the ones for the version with loop unrolling.
The difference in performance in this case is not related to caching: you have just 100 values, and they fit entirely in the L2 cache from the moment you generated them.
The difference is due to out-of-order execution.
A modern CPU has multiple execution units and can perform more than one operation at the same time even in a single-threaded application.
But your loop is problematic for a modern CPU because it has a dependency:
short max = 0;
for (var index = 0; index < _values.Length; index++)
{
max = Math.Max(max, _values[index]);
}
Here each iteration depends on the value of max from the previous one, so the CPU is forced to compute the iterations sequentially.
Your revised loop adds a degree of freedom for the CPU; since max1 and max2 are independent, they can be computed in parallel.
So per iteration, the revised loop can run essentially as fast as the first one:
short max1 = 0;
short max2 = 0;
for (var index = 0; index < _values.Length; index+=2)
{
max1 = Math.Max(max1, _values[index]);
max2 = Math.Max(max2, _values[index + 1]);
}
But it has half the iterations, so in the end you get a significant speedup (not 2x because out-of-order execution is not perfect).
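The same idea extends further. A sketch with four independent accumulators (my illustration, assuming the array length is a multiple of 4) gives the out-of-order engine even more independent work per iteration:

```csharp
using System;
using System.Linq;

public static class UnrollDemo
{
    // Max over 'values' using four independent dependency chains.
    public static short Max4Way(short[] values)
    {
        short m1 = 0, m2 = 0, m3 = 0, m4 = 0;
        for (int i = 0; i < values.Length; i += 4)
        {
            // Each accumulator depends only on itself, so the CPU can
            // progress on all four chains out of order.
            m1 = Math.Max(m1, values[i]);
            m2 = Math.Max(m2, values[i + 1]);
            m3 = Math.Max(m3, values[i + 2]);
            m4 = Math.Max(m4, values[i + 3]);
        }
        return Math.Max(Math.Max(m1, m2), Math.Max(m3, m4));
    }

    public static void Main()
    {
        short[] values = Enumerable.Range(0, 100)
            .Select(x => (short)(x * 7 % 5000)) // deterministic stand-in data
            .ToArray();
        Console.WriteLine(Max4Way(values)); // 693
    }
}
```

As with the two-accumulator version, the gain eventually plateaus: the number of useful chains is bounded by the CPU's execution resources, so measure rather than unroll blindly.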
Caching
The CPU cache works by pre-loading upcoming cache lines from memory into the CPU cache; this may be instructions, pointers, variable values, etc.
Code Blocks
Between your two blocks of code, the difference may not appear in the C# syntax. Try converting your code to IL (the intermediate language for C#, which is executed by the JIT, the just-in-time compiler); see the refs below for tools and resources.
Or just decompile your built/compiled code and check how the compiler "optimized it" when producing the dll/exe files, using the decompiler linked below.
Other performance optimizations:
Loop Unrolling
CPU Caching
Refs:
C# Decompiler
JIT
I have 4 threads that each have their own loop and access a List containing a delta time for each thread. Since there are 4 threads in this example, the List has 4 items. Can each thread change the value of its assigned index (for example, thread 0 writes only index 0 of the List) without locks, given that I know no other thread will write to that index, or do I need locks for this?
I have already implemented this and it does not seem to affect the other values or corrupt the list, but I want to make sure.
int threadCount = 4;
bool run = true;
List<double> lastDeltaTime = new List<double>();
private List<Thread> threadList = new List<Thread>();

void InitializeThreads()
{
    for (int i = 0; i < threadCount; i++)
    {
        int tempName = i;
        Thread tempThread = new Thread(() => ThreadLoop(tempName));
        threadList.Add(tempThread);
        lastDeltaTime.Add(0);
    }
    for (int i = 0; i < threadCount; i++)
    {
        threadList[i].Start();
    }
}

void ThreadLoop(int threadName)
{
    double lastTime = DateTime.UtcNow.Ticks;
    while (run)
    {
        double currentTime = DateTime.UtcNow.Ticks;
        double deltaTime = ((currentTime - lastTime) / 10000000) * timescale;
        lastDeltaTime[threadName] = deltaTime; // line setting deltaTime
        // do work
        // end work
        lastTime = currentTime;
    }
}
Information Learned:
It is okay to write to an index of a List from a thread without a lock ONLY if you can ensure that no other thread will write to the same index and that the list will remain the same size.
Use an array if you can ensure that no other thread will write to the same index location.
Use a class that contains the thread and the variable, then make a list of that class (not yet tested by me).
Otherwise ... use a lock.
That's not safe to do at all. You're adding items to the list while accessing other items, which is a problem if that addition results in resizing the backing array. You can end up accessing items while the list is in the middle of a resize, which can break in any number of different ways.
This question already has answers here:
Different summation results with Parallel.ForEach
(4 answers)
Closed 4 years ago.
I've written 3 different ways of computing the sum of an array of integers; however, I'm getting a different result for the third method.
Initialization:
int n = 100;
int[] mArray = new int[n];
for (int i = 0; i < mArray.Length; i++)
mArray[i] = 1;
First:
int sum1 = mArray.Sum();
Console.WriteLine("sum1 " + sum1);
Second:
int sum2 = 0;
for (int i = 0; i < mArray.Length; i++)
sum2 += mArray[i];
Console.WriteLine("sum2 " + sum2);
Third:
int sum3 = 0;
Parallel.ForEach(mArray, item =>
{
sum3 += item;
});
Console.WriteLine("sum3 " + sum3);
With this input, the 3 approaches print the same output.
However, when n is increased (e.g., n = 30000), the third approach gives surprisingly wrong results.
NB: I've also tested the approaches using ConcurrentBag, which is a thread-safe collection, so I suppose there is no overflow issue. The code was tested on a Windows 10 x64 computer (Intel Core i7 @ 3.30 GHz).
It would be interesting to understand why Parallel.ForEach behaves differently.
The problem is that sum3 is accessed by multiple threads when you use Parallel.ForEach. sum3 += item; normally involves three operations:
1. Read the value of sum3 into temporary storage.
2. Increment the value of that storage by item.
3. Store the result back to sum3.
When multiple threads do this concurrently, it is very likely that the operations will interleave. For example, if you have two threads, A and B, both may read the same value from sum3, then do their adds and store their new values back, losing one of the additions.
To overcome this issue, you need to protect access to sum3. The code should look like this:
object objLock = new object();
int sum3 = 0;
Parallel.ForEach(mArray, item =>
{
lock (objLock) { sum3 += item; }
});
Console.WriteLine("sum3 " + sum3);
But, that will totally negate the effect of the parallel execution.
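A cheaper middle ground than a full lock exists for this particular case (my addition, not part of the original answer): Interlocked.Add performs the read-modify-write as one atomic operation, so no increments are lost and no lock object is needed:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InterlockedSum
{
    public static int Sum(int[] values)
    {
        int sum = 0;
        Parallel.ForEach(values, item =>
        {
            // Atomic add: the read, add, and store cannot interleave with
            // another thread's, so the total is always exact.
            Interlocked.Add(ref sum, item);
        });
        return sum;
    }

    public static void Main()
    {
        int[] mArray = new int[30000];
        for (int i = 0; i < mArray.Length; i++) mArray[i] = 1;
        Console.WriteLine(Sum(mArray)); // 30000
    }
}
```

Interlocked.Add still serializes the adds at the hardware level, so it contends under load; per-partition aggregation avoids that contention almost entirely.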
I have tried Nick's solution and it fixed the problem; however, there was a performance issue when using lock (objLock) { sum3 += item; } directly in the Parallel.ForEach.
Fortunately, using Parallel Aggregation Operations, as defined in .NET, solved the issue. Here is the code:
object locker = new object();
double sum4 = 0;
Parallel.ForEach(mArray,
    () => 0.0,                                  // initialize the local value
    (i, state, localResult) => localResult + i, // body: return the new local total
    localTotal =>
    {
        lock (locker) sum4 += localTotal;       // add the local total to the master value
    }
);
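PLINQ bakes the same per-partition aggregation in, so for a plain sum the whole thing collapses to one line (a sketch):

```csharp
using System;
using System.Linq;

public static class PlinqSum
{
    public static int Sum(int[] values)
    {
        // AsParallel().Sum() keeps a subtotal per partition and combines
        // them at the end, so no user-visible lock is needed.
        return values.AsParallel().Sum();
    }

    public static void Main()
    {
        int[] mArray = Enumerable.Repeat(1, 30000).ToArray();
        Console.WriteLine(Sum(mArray)); // 30000
    }
}
```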
I'm facing a strange issue that I can't explain, and I would like to know if some of you have the answer I'm lacking.
I have a small test app for trying out multithreading changes I'm making to a much larger codebase. In this app I've set up two functions: one that runs a loop sequentially and one that uses Parallel.For. Both print the elapsed time and the number of elements generated. What I'm seeing is that the Parallel.For function generates fewer items than the sequential loop, and this is a huge problem for the real app (it's messing with some final results). So my question is whether someone has any idea why this could be happening, and if so, whether there's any way to fix it.
Here is the code for the function that uses Parallel.For in my test app:
static bool[] values = new bool[52];
static List<int[]> combinations = new List<int[]>();

static void ParallelLoop()
{
    combinations.Clear();
    Parallel.For(0, 48, i =>
    {
        if (values[i])
        {
            for (int j = i + 1; j < 49; j++)
            {
                if (values[j])
                {
                    for (int k = j + 1; k < 50; k++)
                    {
                        if (values[k])
                        {
                            for (int l = k + 1; l < 51; l++)
                            {
                                if (values[l])
                                {
                                    for (int m = l + 1; m < 52; m++)
                                    {
                                        if (values[m])
                                        {
                                            int[] combination = { i, j, k, l, m };
                                            combinations.Add(combination);
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }); // Parallel.For
}
And here is the app output:
Executing sequential loop...
Number of elements generated: 1,712,304
Executing parallel loop...
Number of elements generated: 1,464,871
Thanks in advance and if you need some clarifications I'll do my best to explain in further detail.
You can't just add items to your list from multiple threads at the same time without any synchronization mechanism. List<T>.Add() actually does some non-trivial internal work (buffers, etc.), so adding an item is not an atomic, thread-safe operation.
Either:
Provide a way to synchronize your writes
Use a collection that supports concurrent writes (see System.Collections.Concurrent namespace)
Don't use multi-threading at all
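As an illustration of the second option (a toy two-level loop, not the asker's full five-level one), ConcurrentBag<T> accepts concurrent Adds safely, so no combinations are lost:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentCollect
{
    public static int CollectPairs()
    {
        var combinations = new ConcurrentBag<int[]>();

        Parallel.For(0, 48, i =>
        {
            for (int j = i + 1; j < 49; j++)
            {
                // ConcurrentBag.Add is thread-safe, unlike List<T>.Add.
                combinations.Add(new[] { i, j });
            }
        });

        return combinations.Count;
    }

    public static void Main()
    {
        // Every (i, j) pair with 0 <= i < j <= 48 is collected exactly once.
        Console.WriteLine(CollectPairs()); // 1176 = 48 * 49 / 2
    }
}
```

A bag is unordered; if the real app needs the combinations in a specific order, collect per-thread lists and merge them afterwards instead.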
I've found two different methods to get the max value from an array, but I'm not really familiar with parallel programming, so I don't fully understand them.
I was wondering: do these methods do the same thing, or am I missing something?
I really don't have much information about them. Not even comments...
The first method:
int[] vec = ... (I guess the content doesn't matter)
static int naiveMax()
{
int max = vec[0];
object obj = new object();
Parallel.For(0, vec.Length, i =>
{
lock (obj) {
if (vec[i] > max) max = vec[i];
}
});
return max;
}
And the second one:
static int Max()
{
int max = vec[0];
object obj = new object();
Parallel.For(0, vec.Length, //could be Parallel.For<int>
() => vec[0],
(i, loopState, partial) =>
{
if(vec[i]>partial) partial = vec[i];
return partial;
},
partial => {
lock (obj) {
if( partial > max) max = partial;
}
});
return max;
}
Do these do the same thing or something different, and what? Thanks ;)
Both find the maximum value in an array of integers. In an attempt to find the maximum value faster, they do it "in parallel" using the Parallel.For Method. Both methods fail at this, though.
To see this, we first need a sufficiently large array of integers. For small arrays, parallel processing doesn't give us a speed-up anyway.
int[] values = new int[100000000];
Random random = new Random();
for (int i = 0; i < values.Length; i++)
{
values[i] = random.Next();
}
Now we can run the two methods and see how long they take. Using an appropriate performance measurement setup (Stopwatch, array of 100,000,000 integers, 100 iterations, Release build, no debugger attached, JIT warm-up) I get the following results on my machine:
naiveMax 00:06:03.3737078
Max 00:00:15.2453303
So Max is much much better than naiveMax (6 minutes! cough).
But how does it compare to, say, PLINQ?
static int MaxPlinq(int[] values)
{
return values.AsParallel().Max();
}
MaxPlinq 00:00:11.2335842
Not bad, saved a few seconds. Now, what about a plain, old, sequential for loop for comparison?
static int Simple(int[] values)
{
int result = values[0];
for (int i = 0; i < values.Length; i++)
{
if (result < values[i]) result = values[i];
}
return result;
}
Simple 00:00:05.7837002
I think we have a winner.
Lesson learned: Parallel.For is not pixie dust that you can sprinkle over your code to
make it magically run faster. If performance matters, use the right tools and measure, measure, measure, ...
They appear to do the same thing; however, they are very inefficient. The point of parallelization is to speed up code whose parts can execute independently. Because of the race condition, finding the maximum (as implemented here) requires a lock around the actual comparison logic, which means you're spinning up many threads and related resources simply to run the code sequentially anyway, defeating the purpose of parallelization entirely.