Parallel.ForEach exits before it completes - C#

Well, I'm using Parallel.ForEach to process a very big array of Colors (35M elements).
I'm using Partitioner.Create for performance. But something unexpected is happening:
private Color32[] MainGeneration(Color32[] processed, Color32[] dupe, int width, int height)
{
    int i = 0;
    Parallel.ForEach(Partitioner.Create(0, totalIndexes), item =>
    {
        int x = i % width;
        int y = height - i / width - 1;
        dupe[i] = (UnityEngine.Color)MapIteration((Color)((Color)(UnityEngine.Color)processed[i]).Clone(), i, x, y);
        ++i;
        if (i % updateInterlockedEvery == 0)
        {
            ++currentIndex; //Interlocked.Increment(ref currentIndex);
        }
    });
    // If Parallel.ForEach is blocking, why does this output 12 when totalIndexes is 35,000,000?
    Debug.Log(i);
    return dupe;
}
As I wrote in the comment, why is this happening?
The expected behaviour is to process a large image using parallelism, not only a small piece.
processed contains the full image.
dupe is an empty array that gets filled on each iteration.
I do all this in the local scope to avoid heap problems.

Don't you want something like this? Fiddle here:
using System.Collections.Concurrent;
using System.Threading.Tasks;
using UnityEngine;

public class Program
{
    private void MainGeneration(
        Color32[] source,
        Color32[] target,
        int width,
        int height)
    {
        Parallel.ForEach(Partitioner.Create(source, true)
            .GetOrderableDynamicPartitions(), colorItem =>
        {
            var i = colorItem.Key;
            var color = colorItem.Value;
            var x = i % width;
            var y = height - i / width - 1;
            target[i] = this.Map(color, i, x, y);
        });
    }

    private Color32 Map(Color32 color, long i, long x, long y)
    {
        return color;
    }
}

++i; is in fact shorthand for something like this:
temp = i + 1;
i = temp;
Luckily you are using an int, not a long, so at least the i = temp; assignment is atomic and the explanation is easier :)
If two threads are both doing ++i; something like this can happen (only two threads considered for simplicity):
//Let's say i == 0
Thread 2 calculates i + 1 //== 1
Thread 1 calculates i + 1 //== 1
Thread 1 sets i = 1;
Thread 2 sets i = 1;
So here you would probably expect i to be 2, but in fact it is 1 by the end of this.
If you want to increment i in a threadsafe manner you can do:
Interlocked.Increment(ref i);
As the commented-out line in your code already hints, currentIndex should be incremented the same way.
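The difference is easy to reproduce. This is a minimal sketch (the names are illustrative, not from the question's code) contrasting a plain ++ with Interlocked.Increment under Parallel.For:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InterlockedDemo
{
    public static int UnsafeCount(int iterations)
    {
        int counter = 0;
        // Each ++counter is a separate read-modify-write; concurrent
        // increments can overwrite each other and get lost.
        Parallel.For(0, iterations, _ => { counter++; });
        return counter;
    }

    public static int SafeCount(int iterations)
    {
        int counter = 0;
        // Interlocked.Increment performs the whole increment atomically.
        Parallel.For(0, iterations, _ => Interlocked.Increment(ref counter));
        return counter;
    }
}
```

On a multi-core machine, UnsafeCount(1000000) typically returns noticeably less than 1,000,000, while SafeCount always returns exactly the iteration count.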
I see another issue with the code, given the huge discrepancy in the numbers. Exceptions that happen outside the main thread are not reported on the main thread if the IsBackground property of the thread is true. To guard against this, you should try/catch the inner block of the ForEach and count exceptions in a similar way.
Or even better, collect the exceptions in a ConcurrentQueue/ConcurrentBag:
// Use ConcurrentQueue to enable safe enqueueing from multiple threads.
var exceptions = new ConcurrentQueue<Exception>();

// Execute the complete loop and capture all exceptions.
Parallel.ForEach(data, d =>
{
    try
    {
        // Cause a few exceptions, but not too many.
        if (d < 3)
            throw new ArgumentException($"Value is {d}. Value must be greater than or equal to 3.");
        else
            Console.Write(d + " ");
    }
    // Store the exception and continue with the loop.
    catch (Exception e)
    {
        exceptions.Enqueue(e);
    }
});
Console.WriteLine();

// Throw the exceptions here after the loop completes.
if (exceptions.Count > 0) throw new AggregateException(exceptions);
(source: https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/how-to-handle-exceptions-in-parallel-loops)

Related

How to split byte array of an image into multiple byte arrays using Tasks

Objective:
I'm making a simple image-processing app that uses Tasks to multithread. A user selects an image from a folder and it's displayed in the PicBox. When the user inputs the number of threads, the colorType to change (R, G, B) and the value (0-255) for that colorType, and clicks the edit button, the image is:
Procedure
Converted to byte array
The byte array is returned and a pivot is computed according to thread no.
Task list is created and each task is assigned a start and end index of the larger byte array
In the method, the portion of the larger byte array (from the start index to the end index) is copied into a smaller byte array
The method then converts the small byte array to an Image and returns it
Problem:
Everything goes fine until, in the 5th step, I try to convert the byte array to an image. It specifically happens when the start index is greater than 0, which is during the 2nd task's execution. It works fine for the 1st task. Could it be that it can't accept a start index > 0?
Please look into the following code:
Code
List<Task<Image>> processImgTask = new List<Task<Image>>(threadCount);
threadCount = Convert.ToInt32(threadCombox.SelectedItem);
interval = imgArray.Length / threadCount;
for (int i = 0; i < threadCount; i++)
{
    Start = End;
    End += interval;
    if (i == threadCount - 1)
    {
        End = imgArray.Length;
    }
    object data = new object[3] { Start, End, imgArray };
    processImgTask.Add(new Task<Image>(ImgProcess, data));
}
//Task.WaitAll(processImgTask);
//EDIT followed by comments and answer
Parallel.ForEach(processImgTask, task =>
{
task.Start();
taskPicbox.Image = task.Result;
});
private Image ImgProcess(object data)
{
    object[] indexes = (object[])data;
    int Start = (int)indexes[0];
    int End = (int)indexes[1];
    byte[] img = (byte[])indexes[2];
    List<byte> splitArray = new List<byte>();
    for (int i = Start; i < End; i++)
    {
        splitArray.Add(img[i]);
    }
    byte[] b = splitArray.ToArray();
    //Error occurs here when task 2 (thread 2) is being executed->
    Image x = (Bitmap)((new ImageConverter()).ConvertFrom(b));
    //System.ArgumentException: 'Parameter is not valid.'
    return x;
}
See this answer on how to convert a byte array with raw pixel data to a bitmap.
I would also strongly suggest using Parallel.For instead of tasks. Tasks are designed to run code asynchronously, i.e. to allow the computer to do other things while it is waiting for data. Parallel.For/ForEach is designed to run code concurrently, i.e. to use multiple cores for better performance. Async and parallel are not the same.
I would also recommend starting with a simple single-threaded implementation of whatever you are trying to do. Processors are fast; unless you are doing something very demanding, the threading overhead can be significant. And while parallelization may make your code run four times faster (or however many CPU cores you have), it is quite often the case that other optimizations can improve performance a hundredfold or more. And you can always parallelize later if needed.
For images, the typical way to parallelize would be a Parallel.For over each row in the image.
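Incidentally, the ArgumentException in the question can be explained without System.Drawing at all: an encoded image file starts with a format signature and header, and only the first slice of the byte array contains them. This sketch uses the real 8-byte PNG signature on a fabricated buffer, purely for illustration, to show which slices still look like an image to a decoder:

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // The real PNG file signature: any decoder checks these 8 bytes first.
    static readonly byte[] PngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    public static bool LooksLikePng(byte[] chunk) =>
        chunk.Length >= 8 && chunk.Take(8).SequenceEqual(PngSignature);

    // Split the buffer the same way the question does, and check each slice.
    public static bool[] CheckSlices(byte[] file, int sliceCount)
    {
        int interval = file.Length / sliceCount;
        var results = new bool[sliceCount];
        for (int i = 0; i < sliceCount; i++)
        {
            int start = i * interval;
            int end = (i == sliceCount - 1) ? file.Length : start + interval;
            byte[] slice = file.Skip(start).Take(end - start).ToArray();
            results[i] = LooksLikePng(slice); // only slice 0 can pass
        }
        return results;
    }
}
```

Only the first slice carries the signature (and the header that follows it), so ImageConverter throws for every later slice. The split has to happen on decoded pixel data instead, as the linked answer shows.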
As a response to JonasH, I've made an example for you. The example uses the Span type; it could also be done directly with the array or with an ArraySegment<byte>.
It's an example of how you could process lines multithreaded:
private void Main()
{
    int width = 1024;
    int height = 768;
    int channelCount = 3;

    // create a buffer
    var data = new byte[width * height * channelCount];

    // fill with some data
    // example: 0, 1, 2, 3, 4, 5, 6
    PutSomeValuesInThatBuffer(data);

    // process the image and specify a line-edit callback
    // transforms to: 255, 254, 253, 252, 251, 250, 249
    ProcessImage(data, width, height, channelCount, linePixels =>
    {
        int offset = 0;
        // we need to loop over all pixels on this line
        while (offset < linePixels.Length)
        {
            // for RGB | R = channel[0], G = channel[1], B = channel[2], etc...
            // let's invert the colors; this inner loop isn't strictly necessary,
            // but it shows the use of channels (R, G, B)
            for (int i = 0; i < channelCount; i++)
            {
                linePixels[offset] = (byte)(255 - linePixels[offset]);
                offset++;
            }
        }
    });
}

public delegate void LineProcessorAction(Span<byte> line);

// this is the process method which will split the data into lines
// and process them over multiple threads.
private void ProcessImage(
    byte[] data,
    int width, int height, int channelCount,
    LineProcessorAction lineProcessor)
{
    var rowSizeInBytes = width * channelCount;
    Parallel.For(0, height, index =>
        lineProcessor(new Span<byte>(data, index * rowSizeInBytes, rowSizeInBytes)));
}

private static void PutSomeValuesInThatBuffer(byte[] data)
{
    for (int i = 0; i < data.Length; i++)
        data[i] = (byte)i;
}

Parallel-For loops and consistency

I'm working on an Euler problem with an outer and an inner loop. The outer loop contains the value being checked; the inner loop controls how many test iterations pass, in this case looking for Lychrel numbers.
The outer loop works in parallel just fine, but the inner loop is extremely inconsistent. You can see from my commented-out lines that I've tried a List<T> with locking, as well as a ConcurrentQueue<T>. My initial implementation used a bool set to true (meaning the number IS a Lychrel number) which would be set to false if proven otherwise within n iterations; it would then just count the collected Lychrel numbers. The bool approach wasn't working well, jumping out of the inner loop (even with a lock). I even tried to implement a threadsafe boolean, but so far nothing has kept the inner loop consistent. At this point it has become a learning exercise. I'm generally familiar with threading, and use it fairly regularly even with collections, but this one stumps me as to the root cause of the problem.
static void Main(string[] args)
{
    Stopwatch sw = new Stopwatch();
    BigInteger answer = 0;
    List<long> lychrels = new List<long>();
    ConcurrentQueue<long> CQlychrels = new ConcurrentQueue<long>();
    long maxValue = 10000;
    long maxIterations = 50;
    sw.Start();
    //for (int i = 1; i < maxValue; i++)
    Parallel.For(1, maxValue, i =>
    {
        BigInteger workingValue = i;
        //bool lychrel = true;
        //for (int w = 1; w < maxIterations; w++)
        Parallel.For(1, maxIterations, (w, loopstate) =>
        {
            workingValue = workingValue.LychrelAdd();
            if (workingValue.ToString().Length > 1)
                if (IsPalindrome(workingValue))
                {
                    //lychrel = false;
                    CQlychrels.Enqueue(i);
                    //lock (lychrels)
                    //lychrels.Add(i);
                    loopstate.Break();
                    //break;
                }
        });
        //if (!lychrel)
        //lock (lychrels)
        //lychrels.Add(i);
    });
    answer = maxValue - CQlychrels.Count();
    sw.Stop();
    Console.WriteLine("Answer: " + answer);
    Console.WriteLine("Found in " + sw.ElapsedTicks + " ticks.");
    Console.WriteLine("Found in " + sw.ElapsedMilliseconds + "ms.");
    while (Console.ReadKey() == null)
    { }
    Environment.Exit(0);
}
BigInteger.LychrelAdd() just takes the value and a mirror of its value and adds them together.
I suspect that either that or IsPalindrome() not being threadsafe may be the cause? Or setting workingValue outside of the inner loop and mutating it inside? Something to do with BigInteger being a reference value and that reference changing?
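The inner loop is the likely culprit: each step reads and rewrites workingValue, so every iteration depends on the previous one, and running them with Parallel.For lets threads race on the shared variable and apply the steps out of order. A sketch of the usual shape, parallelizing only the outer loop; LychrelAdd and IsPalindrome are stubbed in here with plausible implementations, since the originals aren't shown:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Numerics;
using System.Threading.Tasks;

public static class LychrelDemo
{
    static BigInteger LychrelAdd(BigInteger n)
    {
        // Add the number to its digit-reversed mirror.
        var reversed = BigInteger.Parse(new string(n.ToString().Reverse().ToArray()));
        return n + reversed;
    }

    static bool IsPalindrome(BigInteger n)
    {
        string s = n.ToString();
        return s.SequenceEqual(s.Reverse());
    }

    public static int CountLychrels(int maxValue, int maxIterations)
    {
        var lychrels = new ConcurrentQueue<long>();
        // Outer loop: the candidates are independent, so this is safe to parallelize.
        Parallel.For(1, maxValue, i =>
        {
            BigInteger workingValue = i;
            bool resolved = false;
            // Inner loop: each step depends on the last, so keep it sequential.
            for (int w = 1; w < maxIterations && !resolved; w++)
            {
                workingValue = LychrelAdd(workingValue);
                if (IsPalindrome(workingValue))
                    resolved = true;
            }
            if (!resolved)
                lychrels.Enqueue(i);
        });
        return lychrels.Count;
    }
}
```

With these stubs, CountLychrels(10000, 50) should reproduce the well-known Project Euler 55 answer (249 Lychrel candidates below ten thousand), and it returns the same result on every run, which the nested-parallel version does not.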

Prevent scrollbar position from being changed in richtextbox?

I have the following code:
private void HighlightSyntax(string syntax)
{
    Regex regex = null;
    switch (syntax)
    {
        case ".xml":
            regex = new Regex(@"<\/?[^>\/]*>");
            break;
    }
    if (regex != null)
    {
        Input.BeginUpdate();
        // I want to save the scrollbar position here and then restore it,
        // or maybe even prevent it from being changed
        int lastIndex = Input.SelectionStart;
        int lastLength = Input.SelectionLength;
        Input.SelectAll();
        // gets matches
        var matches = regex.Matches(Input.Text).Cast<Match>().ToArray();
        if (matches.Length > 0) // divide into tasks and select all matches
        {
            Color color = Color.ForestGreen;
            int wordsPerTask = 500;
            int tasksAmount = (matches.Length / wordsPerTask) + 1;
            int start = 0;
            int end = matches.Length - 1;
            Task[] tasks = new Task[tasksAmount];
            for (int i = 0; i < tasksAmount; i++)
            { // dividing
                start = matches.Length / tasksAmount * i;
                end = matches.Length / tasksAmount * (i + 1) - 1;
                var start1 = start;
                var end1 = end;
                tasks[i] = Task.Run(() => { SelectMatchesInArr(matches, start1, end1, color); });
            }
            if (matches.Length - 1 - end > 0)
                SelectMatchesInArr(matches, end + 1, matches.Length - 1, color);
            Task.WaitAll(tasks);
        }
        Input.Select(lastIndex, lastLength);
        Input.SelectionColor = Color.Black;
        // Restore Input.ScrollBarPosition here
        Input.EndUpdate();
    }
}

// selects matches from start to end indexes with Color color.
private void SelectMatchesInArr(Match[] matches, int startIndex, int endIndex, Color color)
{
    for (int i = startIndex; i <= endIndex; i++)
    {
        int selectionStart = Input.SelectionStart;
        lock (_locker)
        {
            Input.Select(matches[i].Index, matches[i].Length);
            Input.SelectionColor = color;
            Input.DeselectAll();
            Input.SelectionStart = selectionStart;
            Input.SelectionLength = 0;
        }
    }
}
It highlights syntax in a RichTextBox when the regex matches anything related to that syntax. It all worked until I decided to divide the selecting into multiple tasks.
After dividing the selection into multiple tasks, my scrollbar position is not stable. I want it to be stable; I don't want it to be changed by code. How do I prevent it from being changed when multiple tasks manipulate the RichTextBox? What should I do in my situation? Also check the comments in the code; they are written to help explain what I want to do.
By the way, the BeginUpdate() and EndUpdate() methods are extension methods that have been taken from here: Hans Passant's derived from richtextbox class
Maybe it would be better to use multithreading only for generating the list of matches, and then use them for highlighting?
Also, it seems a bit dangerous to modify the UI from multiple threads without any synchronization, since it's possible that one thread would call Input.Select and another Input.DeselectAll before the first one has set the color.
Applying the UI changes in one thread would eliminate that possibility.

C# - Same Calculation Slower Using Int vs Long?

I've run into something really strange while working my way through some practice problems using dotnetfiddle. I have a program that applies a mathematical sequence (different calculations each step depending on whether the current step is even or odd):
using System;

public class Program
{
    public static void Main()
    {
        int ceiling = 1000000;
        int maxMoves = 0;
        int maxStart = 0;
        int testNumber;
        for (int i = 1; i <= ceiling; i++)
        {
            testNumber = i;
            int moves = 1;
            while (testNumber != 1)
            {
                if (testNumber % 2 == 0)
                {
                    testNumber = testNumber / 2;
                    moves++;
                }
                else
                {
                    testNumber = (3 * testNumber) + 1;
                    moves++;
                }
            }
            if (moves > maxMoves)
            {
                maxMoves = moves;
                maxStart = i;
            }
        }
        Console.WriteLine(maxStart);
        Console.WriteLine(maxMoves);
    }
}
As written, the execution time limit gets exceeded. However, if I change the declaration of test number to a long instead of an int, the program runs:
int maxMoves = 0;
int maxStart = 0;
long testNumber; // <-- changed from int
Why would making this change, which requires converting i from an int to a long on each iteration of the for loop (at testNumber = i), be faster than leaving it as an int? Is performing the mathematical operations faster on a long value?
The reason seems to be an overflow. If you run that code enclosed in a
checked
{
    // your code
}
block, you get an OverflowException when running with testNumber as an int.
The reason is that eventually 3 * testNumber + 1 exceeds the range of an int. In an unchecked context this does not throw an exception, but instead wraps around to negative values for testNumber.
At this point your sequence (I think it's Collatz, right?) no longer works, and the calculation takes much (probably infinitely) longer, because you never reach 1 (or at least it takes a whole lot more iterations to reach 1).
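The wraparound is easy to demonstrate in isolation. A minimal sketch of one odd step of the sequence, 3n + 1, in unchecked versus checked arithmetic (the helper names are made up for illustration):

```csharp
using System;

public static class OverflowDemo
{
    // One odd step of the sequence, computed in 32-bit arithmetic.
    // Overflow silently wraps around into negative territory.
    public static int UncheckedStep(int n) => unchecked((3 * n) + 1);

    // The same step, but overflow raises OverflowException instead of wrapping.
    public static int CheckedStep(int n) => checked((3 * n) + 1);

    // The same step with a long: 3 * n is computed in 64-bit, so no wrap here.
    public static long LongStep(long n) => (3 * n) + 1;
}
```

For n = 715827883 (just past int.MaxValue / 3), UncheckedStep returns the negative value -2147483646, LongStep returns the true result 2147483650, and CheckedStep throws. Once testNumber goes negative, the while (testNumber != 1) loop can spin for a very long time, which is why the int version blows the execution time limit.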

Code sample that shows casting to uint is more efficient than range check

So I am looking at this question, and the general consensus is that the uint-cast version is more efficient than the range check against 0. Since the code is also in Microsoft's implementation of List<T>, I assume it is a real optimization. However, I have failed to produce a code sample that shows better performance for the uint version. I have tried different tests, but either something is missing or some other part of my code is dwarfing the time of the checks. My last attempt looks like this:
class TestType
{
    public TestType(int size)
    {
        MaxSize = size;
        Random rand = new Random(100);
        for (int i = 0; i < MaxIterations; i++)
        {
            indexes[i] = rand.Next(0, MaxSize);
        }
    }

    public const int MaxIterations = 10000000;
    private int MaxSize;
    private int[] indexes = new int[MaxIterations];

    public void Test()
    {
        var timer = new Stopwatch();
        int inRange = 0;
        int outOfRange = 0;
        timer.Start();
        for (int i = 0; i < MaxIterations; i++)
        {
            int x = indexes[i];
            if (x < 0 || x > MaxSize)
            {
                throw new Exception();
            }
            inRange += indexes[x];
        }
        timer.Stop();
        Console.WriteLine("Comparison 1: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
        inRange = 0;
        outOfRange = 0;
        timer.Reset();
        timer.Start();
        for (int i = 0; i < MaxIterations; i++)
        {
            int x = indexes[i];
            if ((uint)x > (uint)MaxSize)
            {
                throw new Exception();
            }
            inRange += indexes[x];
        }
        timer.Stop();
        Console.WriteLine("Comparison 2: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
    }
}

class Program
{
    static void Main()
    {
        TestType t = new TestType(TestType.MaxIterations);
        t.Test();
        TestType t2 = new TestType(TestType.MaxIterations);
        t2.Test();
        TestType t3 = new TestType(TestType.MaxIterations);
        t3.Test();
    }
}
The code is a bit of a mess because I tried many things to make the uint check perform faster, like moving the compared variable into a field of a class, generating random index accesses, and so on, but in every case the result seems to be the same for both versions. So is this change applicable on modern x86 processors, and can someone demonstrate it somehow?
Note that I am not asking anyone to fix my sample or explain what is wrong with it. I just want to see a case where the optimization works.
if (x < 0 || x > MaxSize)
The comparison is performed by the CMP processor instruction (Compare). You'll want to take a look at Agner Fog's instruction tables document (PDF); it lists the cost of instructions. Find your processor in the list, then locate the CMP instruction.
For mine, Haswell, CMP takes 1 cycle of latency and 0.25 cycles of throughput.
A fractional cost like that needs an explanation: Haswell has 4 integer execution units that can execute instructions at the same time. When a program contains enough integer operations, like CMP, without interdependencies, they can all execute at the same time, in effect making the program 4 times faster. You don't always manage to keep all 4 of them busy with your code, that is actually pretty rare, but you do keep 2 of them busy in this case. In other words, two comparisons take just as long as a single one: 1 cycle.
There are other factors at play that make the execution times identical. One thing that helps is that the processor can predict the branch very well; it can speculatively execute x > MaxSize in spite of the short-circuit evaluation. And it will in fact end up using the result, since the branch is never taken.
And the true bottleneck in this code is the array indexing; accessing memory is one of the slowest things the processor can do. So the "fast" version of the code isn't faster, even though it provides more opportunity for the processor to execute instructions concurrently. It isn't much of an opportunity today anyway; a processor has too many execution units to keep busy. That surplus is otherwise the feature that makes HyperThreading work. In both cases the processor bogs down at the same rate.
On my machine, I have to write code that occupies more than 4 engines to make it slower. Silly code like this:
if (x < 0 || x > MaxSize || x > 10000000 || x > 20000000 || x > 3000000)
{
    outOfRange++;
}
else
{
    inRange++;
}
Using 5 compares, now I can see a difference: 61 vs 47 msec. In other words, this is a way to count the number of integer engines in the processor. Hehe :)
So this is a micro-optimization that probably used to pay off a decade ago. It doesn't anymore. Scratch it off your list of things to worry about :)
I would suggest attempting code which does not throw an exception when the index is out of range. Exceptions are incredibly expensive and can completely throw off your bench results.
The code below does a timed-average bench for 1,000 iterations of 1,000,000 results.
using System;
using System.Diagnostics;

namespace BenchTest
{
    class Program
    {
        const int LoopCount = 1000000;
        const int AverageCount = 1000;

        static void Main(string[] args)
        {
            Console.WriteLine("Starting Benchmark");
            RunTest();
            Console.WriteLine("Finished Benchmark");
            Console.Write("Press any key to exit...");
            Console.ReadKey();
        }

        static void RunTest()
        {
            int cursorRow = Console.CursorTop; int cursorCol = Console.CursorLeft;
            long totalTime1 = 0; long totalTime2 = 0;
            long invalidOperationCount1 = 0; long invalidOperationCount2 = 0;
            for (int i = 0; i < AverageCount; i++)
            {
                Console.SetCursorPosition(cursorCol, cursorRow);
                Console.WriteLine("Running iteration: {0}/{1}", i + 1, AverageCount);
                int[] indexArgs = RandomFill(LoopCount, int.MinValue, int.MaxValue);
                int[] sizeArgs = RandomFill(LoopCount, 0, int.MaxValue);
                totalTime1 += RunLoop(TestMethod1, indexArgs, sizeArgs, ref invalidOperationCount1);
                totalTime2 += RunLoop(TestMethod2, indexArgs, sizeArgs, ref invalidOperationCount2);
            }
            PrintResult("Test 1", TimeSpan.FromTicks(totalTime1 / AverageCount), invalidOperationCount1);
            PrintResult("Test 2", TimeSpan.FromTicks(totalTime2 / AverageCount), invalidOperationCount2);
        }

        static void PrintResult(string testName, TimeSpan averageTime, long invalidOperationCount)
        {
            Console.WriteLine(testName);
            Console.WriteLine("  Average Time: {0}", averageTime);
            Console.WriteLine("  Invalid Operations: {0} ({1})", invalidOperationCount, (invalidOperationCount / (double)(AverageCount * LoopCount)).ToString("P3"));
        }

        static long RunLoop(Func<int, int, int> testMethod, int[] indexArgs, int[] sizeArgs, ref long invalidOperationCount)
        {
            Stopwatch sw = new Stopwatch();
            Console.Write("Running {0} sub-iterations", LoopCount);
            sw.Start();
            long startTickCount = sw.ElapsedTicks;
            for (int i = 0; i < LoopCount; i++)
            {
                invalidOperationCount += testMethod(indexArgs[i], sizeArgs[i]);
            }
            sw.Stop();
            long stopTickCount = sw.ElapsedTicks;
            long elapsedTickCount = stopTickCount - startTickCount;
            Console.WriteLine(" - Time Taken: {0}", new TimeSpan(elapsedTickCount));
            return elapsedTickCount;
        }

        static int[] RandomFill(int size, int minValue, int maxValue)
        {
            int[] randomArray = new int[size];
            Random rng = new Random();
            for (int i = 0; i < size; i++)
            {
                randomArray[i] = rng.Next(minValue, maxValue);
            }
            return randomArray;
        }

        static int TestMethod1(int index, int size)
        {
            return (index < 0 || index >= size) ? 1 : 0;
        }

        static int TestMethod2(int index, int size)
        {
            return ((uint)(index) >= (uint)(size)) ? 1 : 0;
        }
    }
}
You aren't comparing like with like.
The code you were talking about not only saved a branch by using the optimisation, but also 4 bytes of CIL in a small method.
In a small method, 4 bytes can be the difference between being inlined and not being inlined.
And if the method calling that method is also written to be small, that can mean two (or more) method calls are jitted as one piece of inline code.
And maybe, because it is inlined and available for analysis by the jitter, some of it is then optimised further again.
The real difference is not between index < 0 || index >= _size and (uint)index >= (uint)_size, but between code that makes repeated efforts to minimise the method body size and code that does not. Look, for example, at how another method is used to throw the exception if necessary, further shaving off a couple of bytes of CIL.
(And no, that's not to say that I think all methods should be written like that, but there certainly can be performance differences when one does.)
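The pattern being described looks roughly like this. This is a sketch of the shape, not the actual List<T> source: the indexer body boils down to one unsigned compare plus the array access, and the throw lives in a separate non-inlined helper so the hot body stays small enough for the jitter's inlining budget:

```csharp
using System;
using System.Runtime.CompilerServices;

public sealed class SmallList
{
    private readonly int[] _items;
    private readonly int _size;

    public SmallList(int[] items)
    {
        _items = items;
        _size = items.Length;
    }

    public int this[int index]
    {
        get
        {
            // One unsigned compare covers both index < 0 (which casts to a
            // huge uint) and index >= _size.
            if ((uint)index >= (uint)_size)
                ThrowIndexOutOfRange();
            return _items[index];
        }
    }

    // Moving the throw out of the getter keeps the getter body tiny,
    // which is what makes it a candidate for inlining.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowIndexOutOfRange() =>
        throw new ArgumentOutOfRangeException("index");
}
```

Callers pay one compare per access, and because the getter body is tiny, the jitter can inline it into the calling loop, at which point the check may be hoisted or combined with the array's own bounds check.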
