Parallel For losing values when looping - c#

I'm facing a strange issue that I can't explain and I would like to know if some of you have the answer I'm lacking.
I have a small test app for testing multithreading modifications I'm making to a much larger codebase. In this app I've set up two functions, one that does a loop sequentially and one that uses Parallel.For. Both of them print out the time taken and the number of elements generated. What I'm seeing is that the function that uses Parallel.For generates fewer items than the sequential loop, and this is a huge problem for the real app (it's messing with some final results). So, my question is whether someone has any idea why this could be happening and, if so, whether there's any way to fix it.
Here is the code for the function that uses Parallel.For in my test app:
static bool[] values = new bool[52];
static List<int[]> combinations = new List<int[]>();

static void ParallelLoop()
{
    combinations.Clear();
    Parallel.For(0, 48, i =>
    {
        if (values[i])
        {
            for (int j = i + 1; j < 49; j++)
                if (values[j])
                {
                    for (int k = j + 1; k < 50; k++)
                    {
                        if (values[k])
                        {
                            for (int l = k + 1; l < 51; l++)
                            {
                                if (values[l])
                                {
                                    for (int m = l + 1; m < 52; m++)
                                    {
                                        if (values[m])
                                        {
                                            int[] combination = { i, j, k, l, m };
                                            combinations.Add(combination);
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
        }
    }); // Parallel.For
}
And here is the app output:
Executing sequential loop...
Number of elements generated: 1,712,304
Executing parallel loop...
Number of elements generated: 1,464,871
Thanks in advance and if you need some clarifications I'll do my best to explain in further detail.

You can't just add items to your list from multiple threads at the same time without any synchronization mechanism. List<T>.Add() actually does some non-trivial internal work (resizing its internal buffer, etc.), so adding an item is not an atomic, thread-safe operation.
Either:
Provide a way to synchronize your writes (sketched below)
Use a collection that supports concurrent writes (see the System.Collections.Concurrent namespace; also sketched below)
Don't use multi-threading at all
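A minimal sketch of the first two options, reusing the combinations field from the question (the helper method names here are made up for illustration):

// Option 1: synchronize writes to the shared List<int[]> with a lock.
static readonly object combinationsLock = new object();

static void AddCombinationLocked(int i, int j, int k, int l, int m)
{
    int[] combination = { i, j, k, l, m };
    lock (combinationsLock)
    {
        combinations.Add(combination); // only one thread at a time mutates the list
    }
}

// Option 2: use a thread-safe collection instead of List<int[]>.
static readonly System.Collections.Concurrent.ConcurrentBag<int[]> combinationsBag =
    new System.Collections.Concurrent.ConcurrentBag<int[]>();

static void AddCombinationConcurrent(int i, int j, int k, int l, int m)
{
    combinationsBag.Add(new[] { i, j, k, l, m }); // safe to call from multiple threads
}

Note that ConcurrentBag<T> does not preserve insertion order, so if the real app depends on the order of the combinations you would still want the lock (or a sort afterwards); either way the final count will match the sequential loop.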

Related

Filling array with multithreading

I am trying to read multidimensional arrays with multithreading and get the index and value of each element that matches a condition.
I divided the multidimensional array into smaller subcubes, stored in a collection named jobs.
If the condition matches, I save the value to the samples array and the index to the ValidCubeIndexList array, so samples and ValidCubeIndexList are shared and used between threads.
I am not sure if this is a correct approach or not, but with parallelism I couldn't find a way to lock iSample locally.
Parallel.For(0, jobs.Count,
    new ParallelOptions { MaxDegreeOfParallelism = NumOfProcessor },
    delegate(int i, ParallelLoopState state) {
        var job = jobs[i];
        Index3 min = job.output.MinIJK;
        Index3 max = job.output.MaxIJK;
        var bulk = job.output.ToArray();
        int x = bulk.GetLength(0);
        int y = bulk.GetLength(1);
        int z = bulk.GetLength(2);
        for (int n = 0; n < x; n++) {
            for (int m = 0; m < y; m++) {
                for (int b = 0; b < z; b++) {
                    int activeIndex = Get3DIndex(min.I + n, min.J + m, min.K + b,
                                                 cubeIndex.I, cubeIndex.J, cubeIndex.K);
                    if (SelectionMaskIsActive) {
                        if (invFlags[activeIndex]) {
                            samples[iSample] = bulk[n, m, b];
                            ValidCubeIndexList[iSample] = activeIndex;
                            Interlocked.Increment(ref iSample);
                        }
                    } else {
                        samples[iSample] = bulk[n, m, b];
                        ValidCubeIndexList[iSample] = activeIndex;
                        Interlocked.Increment(ref iSample);
                    }
                }
            }
        }
    });
However the Interlocked.Increment(ref iSample) is not working as I expected.
How can I share and use the iSample parameter between threads?
As far as I can tell, the value of iSample is shared across multiple threads.
Incrementing it with Interlocked.Increment(ref iSample); will not cause the value to be locked locally in any way; every increment changes the value seen by all threads, which often leads to unexpected results. In particular, in your code the read of iSample (used as the array index) and the increment are two separate operations, so two threads can read the same index before either of them has incremented it.
If your timing is such that you rely on this behaviour, I recommend using an alternative approach, or marking the variable as volatile, but even then it's considered bad practice, because, whilst:
The volatile keyword indicates that a field might be modified by multiple threads that are executing at the same time.
it does not guarantee direct availability of the variable to all threads:
On a multiprocessor system, a volatile read operation does not guarantee to obtain the latest value written to that memory location by any processor. Similarly, a volatile write operation does not guarantee that the value written would be immediately visible to other processors.
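One way to make this kind of pattern work (a sketch of the general technique, not of the poster's exact code) is to use the return value of Interlocked.Increment to reserve a slot: the read and the increment then happen as a single atomic operation, so every thread gets its own unique index.

using System;
using System.Threading;
using System.Threading.Tasks;

class SlotReservationSketch
{
    static int iSample = 0;                         // shared counter
    static readonly int[] samples = new int[10000]; // preallocated result array

    static void Main()
    {
        Parallel.For(0, 10000, i =>
        {
            // Increment returns the new value, so "new value - 1" is a slot
            // that no other thread can also get: read and increment are atomic.
            int slot = Interlocked.Increment(ref iSample) - 1;
            samples[slot] = i;                      // each thread writes its own slot
        });

        Console.WriteLine(iSample);                 // always 10000; no writes are lost
    }
}

Applied to the question's code, that would mean replacing each pair of array writes followed by Interlocked.Increment with int slot = Interlocked.Increment(ref iSample) - 1; and then writing samples[slot] and ValidCubeIndexList[slot]. This assumes both arrays are preallocated large enough to hold every possible match; otherwise a collection from System.Collections.Concurrent is the simpler option.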

Keep statistic about sorting algorithms

I have a homework assignment about object-oriented programming in C#. As part of it, I need to implement 2 different sorting algorithms, feed random numbers into them, and gather statistics about the 2 algorithms.
About that, my teacher told me in an e-mail: "Non static sorting class can keep statistic about sorting how many numbers, how fast, min, max, average.."
So here are my sorting algorithms, insertion sort and counting sort. Please tell me how I can keep statistics about the sorting.
Don't forget the main subject of my homework is OOP.
class InsertionSorting : Sort
{
    public override List<int> Sorting(List<int> SortList)
    {
        for (int i = 0; i < SortList.Count - 1; i++)
        {
            for (int j = i + 1; j > 0; j--)
            {
                if (SortList[j - 1] > SortList[j])
                {
                    int temp = SortList[j - 1];
                    SortList[j - 1] = SortList[j];
                    SortList[j] = temp;
                }
            }
        }
        return SortList;
    }
}
class CountSorting : Sort
{
    public override List<int> Sorting(List<int> SortList)
    {
        int n = SortList.Count;
        List<int> output = new List<int>();
        List<int> count = new List<int>();
        for (int i = 0; i < 1000; ++i)
        {
            count.Add(0);
            output.Add(0);
        }
        for (int i = 0; i < n; ++i)
            ++count[SortList[i]];
        for (int i = 1; i <= 999; ++i)
            count[i] += count[i - 1];
        for (int i = 0; i < n; ++i)
        {
            output[count[SortList[i]] - 1] = SortList[i];
            --count[SortList[i]];
        }
        for (int i = 0; i < SortList.Count; i++)
            SortList[i] = output[i];
        return SortList;
    }
}
Your sorting is being done in two classes - InsertionSorting & CountSorting.
If you want to keep track of the statistics, declare a variable in the class and increment it on every iteration, etc. Then you can see which one is more effective.
E.g.
class InsertionSorting : Sort
{
    private int iterations = 0;
    ...
    for (int j = i + 1; j > 0; j--)
    {
        if (SortList[j - 1] > SortList[j])
        {
            iterations++;
            ...
You could also declare a startTime and endTime, allowing you to determine the time the sort took. At the start of Sorting record the start time, and just before you return record the end time. Write a method to report the difference.
Your prof has told you how when they said "...statistics about sorting how many numbers, how fast, min, max, average..". Your best bet here is to create a class such as "Statistics" which contains a method that takes user input, either through args or a direct user prompt. The variables can be as simple as "count of numbers to sort", "lower bound of number range", "upper bound of number range" and, if automating the testing process, "number of times to iterate".
Given answers to these questions, you should run the two sorting algorithms with them (e.g. use a random number generator plus the min and max to generate a list). Your sorting algorithms need an addition to "log" these statistics, most likely a variable that tracks the number of position swaps that occurred in the array.
I'm not about to write out your homework for you (that's your job, and you should get good at it), but if you have any more questions I may be able to steer you in the right direction if this is too vague and you are still struggling.
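A minimal sketch of the counter-plus-Stopwatch idea (the Sort base class is assumed to match the one in the question, and the property names are made up for illustration):

using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class InsertionSorting : Sort
{
    // Statistics gathered during the most recent call to Sorting().
    public int Swaps { get; private set; }
    public long ElapsedMilliseconds { get; private set; }
    public int Count { get; private set; }
    public int Min { get; private set; }
    public int Max { get; private set; }
    public double Average { get; private set; }

    public override List<int> Sorting(List<int> SortList)
    {
        var stopwatch = Stopwatch.StartNew();
        Swaps = 0;

        for (int i = 0; i < SortList.Count - 1; i++)
        {
            for (int j = i + 1; j > 0; j--)
            {
                if (SortList[j - 1] > SortList[j])
                {
                    int temp = SortList[j - 1];
                    SortList[j - 1] = SortList[j];
                    SortList[j] = temp;
                    Swaps++;                       // one more position swap
                }
            }
        }

        stopwatch.Stop();
        ElapsedMilliseconds = stopwatch.ElapsedMilliseconds;
        Count = SortList.Count;
        if (SortList.Count > 0)
        {
            Min = SortList[0];                     // list is now sorted ascending
            Max = SortList[SortList.Count - 1];
            Average = SortList.Average();
        }
        return SortList;
    }
}

CountSorting can track its statistics the same way (counting array writes instead of swaps), and a separate Statistics class can hold the shared reporting logic if you want to keep the design object-oriented.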

C# parallel for sharing internal variables

Hi I am working on the following code:
Parallel.For(1, residRanges.Count, i =>
{
    int count = 0;
    List<double> colAList = new List<double>();
    List<double> colBList = new List<double>();
    for (int x = 0; x < residErrorData.Count; x++)
    {
        foreach (var nc in residErrorData[x].connList)
        {
            if (residRanges[i].connection == nc.connection)
            {
                colAList.Add(residErrorData[x].residualError);
                colBList.Add(nc.freq);
                count = count + 1;
            }
        }
    }
    colA = new double[count];
    colB = new double[count];
    for (int j = 0; j < count; j++)
    {
        colA[j] = colAList[j];
        colB[j] = colBList[j];
    }
    residRangeError tempresid = residRanges[i];
    tempresid = fitResid(tempresid, colA, colB);
    residRanges[i] = tempresid;
    residRanges[i].n = count;
});
If I don't use the Parallel class my values seem to be accurate, but when I use it, for some reason it mixes up the values for colA and colB between the threads. I'm fairly new to parallel processing and I've been looking around but can't seem to find any solutions. Does anyone know why the program seems to be sharing variables between threads?
I know the code isn't ideal; I've been trying different things to figure out what was going wrong. I'm not trying to optimize it at the moment so much as understand why the variables in the different loops aren't staying separate.
residRanges is a list of class instances. The for loops using it seem to get the right values; they just begin to mix up which values go where when run in Parallel.For.
Thanks for any help! I could really use it!
(Reposting as an answer for sweet, sweet karma)
Your code looks as though colA and colB are declared outside the lambda's scope, which means the variable references could refer to different array objects as different threads run simultaneously (e.g. Thread 0 could change colA while Thread 1 is inside the for j < count loop).
Move the declarations of colA and colB to inside the lambda:
...
Double[] colA = new double[count];
Double[] colB = new double[count];
for (int j = 0; j < count; j++)
...
However, since you don't actually do anything useful with colA and colB other than using them as value holders for your fitResid function, you can simplify this:
Change your fitResid function signature to accept IList<double> instead of double[]
Change your fitResid function call to pass in colAList and colBList respectively, which will speed up your code by eliminating the unnecessary copying and memory allocation:
...
residRanges[i] = fitResid( residRanges[i], colAList, colBList );
residRanges[i].n = count;
...
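Putting both suggestions together, the loop body might look roughly like this (a sketch; fitResid, residRanges, residErrorData and the connection objects come from the question, and their exact types are assumed):

Parallel.For(1, residRanges.Count, i =>
{
    // Everything declared inside the lambda is local to the iteration,
    // so no thread can overwrite another thread's lists.
    var colAList = new List<double>();
    var colBList = new List<double>();

    for (int x = 0; x < residErrorData.Count; x++)
    {
        foreach (var nc in residErrorData[x].connList)
        {
            if (residRanges[i].connection == nc.connection)
            {
                colAList.Add(residErrorData[x].residualError);
                colBList.Add(nc.freq);
            }
        }
    }

    // fitResid now accepts IList<double>, so no copying into arrays is needed.
    residRanges[i] = fitResid(residRanges[i], colAList, colBList);
    residRanges[i].n = colAList.Count;
});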

Binary search slower, what am I doing wrong?

EDIT: so it looks like this is normal behavior, so can anyone just recommend a faster way to do these numerous intersections?
So my problem is this: I have 8000 lists (strings in each list). For each list (ranging in size from 50 to 400), I'm comparing it to every other list and performing a calculation based on the size of the intersection. So I'll do
list1(intersect)list1= number
list1(intersect)list2= number
list1(intersect)list888= number
And I do this for every list. Previously I had a HashList, and my code was essentially this (well, I was actually searching through properties of an object, so I had to modify the code a bit, but it's basically this):
I have my two versions below, but if anyone knows anything faster, please let me know!
Loop through AllLists, getting each list, starting with list1, and then do this:
foreach (List list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the
                                    // smaller list
    {
        foreach (string word in list1)
        {
            if (block.generator_list.Contains(word))
            {
                // simple integer count
            }
        }
    }
    // a little more code, but the same, but looping through the other list if it's smaller/bigger
Then I made the lists into regular lists and applied Sort(), which changed my code to:
foreach (List list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the
                                    // smaller list
    {
        for (int i = 0; i < list1_length; i++)
        {
            var test = list.BinarySearch(list1[i]);
            if (test > -1)
            {
                // simple integer count
            }
        }
    }
The first version takes about 6 seconds; the other one takes more than 20 (I just stop it there because otherwise it would take more than a minute!), and this is for a smallish subset of the data.
I'm sure there's a drastic mistake somewhere, but I can't find it.
Well I have tried three distinct methods for achieving this (assuming I understood the problem correctly). Please note I have used HashSet<int> in order to more easily generate random input.
setting up:
List<HashSet<int>> allSets = new List<HashSet<int>>();
Random rand = new Random();
for (int i = 0; i < 8000; ++i) {
    HashSet<int> ints = new HashSet<int>();
    for (int j = 0; j < rand.Next(50, 400); ++j) {
        ints.Add(rand.Next(0, 1000));
    }
    allSets.Add(ints);
}
the three methods I checked (code is what runs in the inner loop):
the loop:
note that you are getting duplicated results in your code (intersecting set A with set B and later intersecting set B with set A).
It won't affect your performance thanks to the list length check you are doing. But iterating this way is clearer.
for (int i = 0; i < allSets.Count; ++i) {
    for (int j = i + 1; j < allSets.Count; ++j) {
    }
}
first method:
used IEnumerable.Intersect() to get the intersection with the other list and checked IEnumerable.Count() to get the size of the intersection.
var intersect = allSets[i].Intersect(allSets[j]);
count = intersect.Count();
this was the slowest one averaging 177s
second method:
cloned the smaller of the two sets I was intersecting, then used ISet.IntersectWith() and checked the resulting set's Count.
HashSet<int> intersect;
HashSet<int> intersectWith;
if (allSets[i].Count < allSets[j].Count) {
    intersect = new HashSet<int>(allSets[i]);
    intersectWith = allSets[j];
} else {
    intersect = new HashSet<int>(allSets[j]);
    intersectWith = allSets[i];
}
intersect.IntersectWith(intersectWith);
count = intersect.Count;
this one was slightly faster, averaging 154s
third method:
did something very similar to what you did: iterated over the shorter set and checked ISet.Contains on the longer set.
for (int i = 0; i < allSets.Count; ++i) {
    for (int j = i + 1; j < allSets.Count; ++j) {
        count = 0;
        HashSet<int> loopingSet;
        HashSet<int> containsSet;
        if (allSets[i].Count < allSets[j].Count) {
            loopingSet = allSets[i];
            containsSet = allSets[j];
        } else {
            loopingSet = allSets[j];
            containsSet = allSets[i];
        }
        foreach (int k in loopingSet) {
            if (containsSet.Contains(k)) {
                ++count;
            }
        }
    }
}
this method was by far the fastest (as expected), averaging 66s
conclusion
the method you're using is the fastest of these three. I certainly can't think of a faster single threaded way to do this. Perhaps there is a better concurrent solution.
I've found that one of the most important considerations in iterating/searching any kind of collection is to choose the collection type very carefully. Iterating through a normal collection for your purposes will not be optimal. Try using something like:
System.Collections.Generic.HashSet<T>
Using the Contains() method while iterating over the shorter of the two lists (as you mentioned you're already doing) should give close to O(1) performance per lookup, the same as key lookups in the generic Dictionary type.
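For the original lists of strings, that might look roughly like this (a sketch; the small allLists sample stands in for the question's 8000 real lists):

using System;
using System.Collections.Generic;
using System.Linq;

class IntersectionCountSketch
{
    static void Main()
    {
        // Stand-ins for the question's lists of strings.
        var allLists = new List<List<string>>
        {
            new List<string> { "a", "b", "c", "d" },
            new List<string> { "b", "d", "e" },
            new List<string> { "a", "e", "f" },
        };

        // Build each HashSet once, up front, instead of once per comparison.
        var allSets = allLists.Select(l => new HashSet<string>(l)).ToList();

        for (int i = 0; i < allSets.Count; ++i)
        {
            for (int j = i + 1; j < allSets.Count; ++j)
            {
                // Iterate the smaller set and probe the larger one with Contains.
                var smaller = allSets[i].Count < allSets[j].Count ? allSets[i] : allSets[j];
                var larger = ReferenceEquals(smaller, allSets[i]) ? allSets[j] : allSets[i];

                int count = smaller.Count(larger.Contains);
                Console.WriteLine($"list{i + 1} (intersect) list{j + 1} = {count}");
            }
        }
    }
}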

Copy an array backwards? Array.Copy?

I have a List<T> that I want to be able to copy to an array backwards, meaning start from List.Count and copy maybe 5 items starting at the end of the list and working its way backwards. I could do this with a simple reverse for loop; however there is probably a faster/more efficient way of doing this so I thought I should ask. Can I use Array.Copy somehow?
Originally I was using a Queue as that pops it off in the correct order I need, but I now need to pop off multiple items at once into an array and I thought a list would be faster.
It looks like Array.Reverse has a native code path for reversing an array, which sometimes doesn't apply, in which case it falls back to a simple for loop. In my testing Array.Reverse is very slightly faster than a simple for loop. In this test of reversing a 1,000,000-element array 1,000 times, Array.Reverse takes about 600 ms whereas a for loop takes about 800 ms.
I wouldn't recommend performance as a reason to use Array.Reverse, though. It's a very minor difference, which you'll lose the minute you load the array into a List, which will loop through it again. Regardless, you shouldn't worry about performance until you've profiled your app and identified the bottlenecks.
public static void Test()
{
    var a = Enumerable.Range(0, 1000000).ToArray();
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 1000; i++)
    {
        Array.Reverse(a);
    }
    stopwatch.Stop();
    Console.WriteLine("Elapsed Array.Reverse: " + stopwatch.ElapsedMilliseconds);

    stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 1000; i++)
    {
        MyReverse(a);
    }
    stopwatch.Stop();
    Console.WriteLine("Elapsed MyReverse: " + stopwatch.ElapsedMilliseconds);
}

private static void MyReverse(int[] a)
{
    int j = a.Length - 1;
    for (int i = 0; i < j; i++, j--)
    {
        int z = a[i];
        a[i] = a[j];
        a[j] = z;
    }
}
It is not possible to do this faster than a simple for loop.
You can accomplish it in any number of ways, but the fastest way is to get the elements in exactly the manner you are. You can use Array.Reverse, Array.Copy, etc., or you can use LINQ and extension methods, and both are valid alternatives, but they shouldn't be any faster.
In one of your comments:
Currently we are pulling out one result and committing it to a database one at a time
There is a big difference between using a for loop to iterate backwards over a List<T> and committing records to a database one at a time. The former is fine; nobody's endorsing the latter.
Why not just iterate first to populate an array, and then send that fully populated array to the database?
var myArray = new T[numItemsYouWantToSend];
int arrayIndex = 0;
for (int i = myList.Count - 1; arrayIndex < myArray.Length; --i) {
    if (i < 0) break;
    myArray[arrayIndex++] = myList[i];
}
UpdateDatabase(myArray);
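If you do want something closer to the Array.Copy idea from the question, a List<T>.CopyTo of the tail followed by Array.Reverse produces the same array (a sketch; the count of 5 is just the example figure from the question):

using System;
using System.Collections.Generic;

class TailCopySketch
{
    static void Main()
    {
        var myList = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8 };

        // Copy the last 5 items in forward order, then reverse them in place.
        int count = Math.Min(5, myList.Count);
        var tail = new int[count];
        myList.CopyTo(myList.Count - count, tail, 0, count); // 4, 5, 6, 7, 8
        Array.Reverse(tail);                                  // 8, 7, 6, 5, 4

        Console.WriteLine(string.Join(", ", tail));
    }
}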
