Hi I am working on the following code:
Parallel.For(1, residRanges.Count, i =>
{
    int count = 0;
    List<double> colAList = new List<double>();
    List<double> colBList = new List<double>();
    for (int x = 0; x < residErrorData.Count; x++)
    {
        foreach (var nc in residErrorData[x].connList)
        {
            if (residRanges[i].connection == nc.connection)
            {
                colAList.Add(residErrorData[x].residualError);
                colBList.Add(nc.freq);
                count = count + 1;
            }
        }
    }
    colA = new double[count];
    colB = new double[count];
    for (int j = 0; j < count; j++)
    {
        colA[j] = colAList[j];
        colB[j] = colBList[j];
    }
    residRangeError tempresid = residRanges[i];
    tempresid = fitResid(tempresid, colA, colB);
    residRanges[i] = tempresid;
    residRanges[i].n = count;
});
If I don't use the Parallel class, my values seem to be accurate. However, when I use the Parallel class, it mixes up the values for colA and colB between threads. I'm fairly new to parallel processing, and although I've been looking around I can't find a solution. Does anyone know why the program seems to be sharing variables between threads?
I know the code isn't ideal; I've been trying different things to figure out what was going wrong. I'm not trying to optimize it at the moment so much as understand why the variables in the different loop iterations aren't staying separate.
residRanges is a list of class items. The for loops that use it get the right values; the values only get mixed up when run in Parallel.For.
Thanks for any help! I could really use it!
(Reposting as an answer for sweet, sweet karma)
Your code looks as though colA and colB are declared outside the lambda's scope, which means all threads share the same two variables. As iterations run simultaneously, each one reassigns them to its own array, so a thread can end up reading or writing another thread's array (e.g. Thread 0 reassigns colA while Thread 1 is inside its for j < count loop).
Move the declarations of colA and colB to inside the lambda:
...
Double[] colA = new double[count];
Double[] colB = new double[count];
for (int j = 0; j < count; j++)
...
However, you don't actually do anything with colA and colB other than pass them to fitResid, so you can simplify further:
Change the fitResid function signature to accept IList<Double> instead of Double[].
Change the fitResid call to pass in colAList and colBList respectively, which will speed up your code by eliminating the unnecessary copying and memory allocation:
...
residRanges[i] = fitResid( residRanges[i], colAList, colBList );
residRanges[i].n = count;
...
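As a minimal sketch of the same idea, independent of the question's types: anything declared inside the lambda is private to that iteration, and each iteration writes only to its own output slot. The names n, sums and local are made up for illustration and do not come from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in data: 1000 independent work items.
int n = 1000;
int[] sums = new int[n];

Parallel.For(0, n, i =>
{
    // Declared inside the lambda: every iteration gets its own list,
    // so no other thread can replace it halfway through.
    var local = new List<int>();
    for (int j = 0; j <= i % 10; j++)
    {
        local.Add(j);
    }

    int sum = 0;
    foreach (int v in local)
    {
        sum += v;
    }

    // Each iteration writes only to its own slot, so no lock is needed here.
    sums[i] = sum;
});

Console.WriteLine(sums[9]); // 0 + 1 + ... + 9 = 45
```

Had local been declared before the Parallel.For call, all iterations would have shared one list, which is the situation described above.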
In C#, I create a list of lists and try to access and modify its elements. The problem is that an operation (e.g. adding a constant) seems to apply not only to the element at the wanted index but to all elements.
Here is the piece of code:
List<List<double>> D = new List<List<double>>();
List<double> ttmp = new List<double>(new double[256]);
for (int i = 0; i < 100; i++)
{
D.Add(ttmp);
}
for (int i = 0; i < 100; i++)
{
D[i][0] = D[i][0] + 1;
}
D is a list of 100 lists, each of size 256. It initially contains only zeroes. In the second loop, I ask that the first element of each of the 100 lists be incremented by one.
As a result, the entire "matrix" is filled with ones, i.e. not only D[0][0], D[1][0] ... D[99][0], but also D[0][1], D[0][2], etc.
Why is that?
NB: the C++ equivalent with vector<vector<double>> works perfectly fine...
When the code is executed, the result is not that D[0][1], D[0][2], etc. are modified in addition to D[0][0], D[1][0] ... D[99][0].
The result is that every inner list has its [0] element equal to 100.
Why? Because you created one list and added it to D 100 times. Every entry in D is a reference to that same list, so when you modify it through one index, the change is visible through all of them (List<T> is a reference type).
Change it instead to:
List<List<double>> D = new List<List<double>>();
for (int i = 0; i < 100; i++)
{
D.Add(new List<double>(new double[256]));
}
foreach (var innerList in D)
{
innerList[0]++;
}
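A small sketch that makes the aliasing visible, using two short lists instead of 100 lists of 256. The variable names shared, aliased and separate are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// One inner list added twice: both entries point at the same object.
var shared = new List<double>(new double[3]);
var aliased = new List<List<double>> { shared, shared };

// Two distinct inner lists: each entry has its own storage.
var separate = new List<List<double>>
{
    new List<double>(new double[3]),
    new List<double>(new double[3])
};

aliased[0][0] += 1;
separate[0][0] += 1;

bool aliasedSame = object.ReferenceEquals(aliased[0], aliased[1]);
bool separateSame = object.ReferenceEquals(separate[0], separate[1]);

Console.WriteLine($"{aliasedSame} {aliased[1][0]}");   // True 1
Console.WriteLine($"{separateSame} {separate[1][0]}"); // False 0
```

Writing through index 0 of the aliased list also changes what index 1 sees, while the separately allocated lists stay independent.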
EDIT: so it looks like this is normal behavior, so can anyone just recommend a faster way to do these numerous intersections?
so my problem is this. I have 8000 lists (strings in each list). For each list (ranging from size 50 to 400), I'm comparing it to every other list and performing a calculation based on the intersection number. So I'll do
list1(intersect)list1= number
list1(intersect)list2= number
list1(intersect)list888= number
And I do this for every list. Previously, I had HashList and my code was essentially this (well, I was actually searching through properties of an object, so I had to modify the code a bit, but it's basically this):
I have my two versions below, but if anyone knows anything faster, please let me know!
Loop through AllLists, getting each list, starting with list1, and then do this:
foreach (List<string> list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the smaller list
    {
        foreach (string word in list1)
        {
            if (block.generator_list.Contains(word))
            {
                // simple integer count
            }
        }
    }
    // a little more code, but the same, but looping through the other list if it's smaller/bigger
}
Then I made the lists into regular lists and applied Sort(), which changed my code to:
foreach (List<string> list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the smaller list
    {
        for (int i = 0; i < list1_length; i++)
        {
            var test = list.BinarySearch(list1[i]);
            if (test > -1)
            {
                // simple integer count
            }
        }
    }
}
The first version takes about 6 seconds; the other one takes more than 20 (I just stopped it there, because otherwise it would take more than a minute!), and this is for a smallish subset of the data.
I'm sure there's a drastic mistake somewhere, but I can't find it.
Well I have tried three distinct methods for achieving this (assuming I understood the problem correctly). Please note I have used HashSet<int> in order to more easily generate random input.
setting up:
List<HashSet<int>> allSets = new List<HashSet<int>>();
Random rand = new Random();
for (int i = 0; i < 8000; ++i) {
    HashSet<int> ints = new HashSet<int>();
    // Pick the target size once; calling rand.Next in the loop condition would re-roll it on every iteration.
    int size = rand.Next(50, 400);
    for (int j = 0; j < size; ++j) {
        ints.Add(rand.Next(0, 1000));
    }
    allSets.Add(ints);
}
the three methods I checked (code is what runs in the inner loop):
the loop:
note that you are getting duplicated results in your code (intersecting set A with set B and later intersecting set B with set A).
It won't affect your performance thanks to the list length check you are doing. But iterating this way is clearer.
for(int i = 0; i < allSets.Count; ++i) {
for(int j = i + 1; j < allSets.Count; ++j) {
}
}
first method:
used Enumerable.Intersect() to get the intersection with the other list and checked Enumerable.Count() to get the size of the intersection.
var intersect = allSets[i].Intersect(allSets[j]);
count = intersect.Count();
this was the slowest one averaging 177s
second method:
cloned the smaller of the two sets I was intersecting, then used ISet.IntersectWith() and checked the resulting set's Count.
HashSet<int> intersect;
HashSet<int> intersectWith;
if(allSets[i].Count < allSets[j].Count) {
intersect = new HashSet<int>(allSets[i]);
intersectWith = allSets[j];
} else {
intersect = new HashSet<int>(allSets[j]);
intersectWith = allSets[i];
}
intersect.IntersectWith(intersectWith);
count = intersect.Count;
this one was slightly faster, averaging 154s
third method:
did something very similar to what you did: iterated over the shorter set and checked Contains on the longer set.
for(int i = 0; i < allSets.Count; ++i) {
    for(int j = i + 1; j < allSets.Count; ++j) {
        int count = 0;
        HashSet<int> loopingSet;
        HashSet<int> containsSet;
        // iterate over the smaller set and probe the larger one
        if(allSets[i].Count < allSets[j].Count) {
            loopingSet = allSets[i];
            containsSet = allSets[j];
        } else {
            loopingSet = allSets[j];
            containsSet = allSets[i];
        }
        foreach(int k in loopingSet) {
            if(containsSet.Contains(k)) {
                ++count;
            }
        }
    }
}
this method was by far the fastest (as expected), averaging 66s
conclusion
the method you're using is the fastest of these three. I certainly can't think of a faster single threaded way to do this. Perhaps there is a better concurrent solution.
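As for a concurrent variant, here is one possible sketch, not something that was benchmarked above. It assumes the sets are only read while the loop runs, and it parallelises the outer loop so that each iteration writes to its own row of results. The helper IntersectionCount and the counts array are hypothetical names, and the three small sets stand in for the 8000 real ones.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Small hypothetical input standing in for the real data.
var allSets = new List<HashSet<int>>
{
    new HashSet<int> { 1, 2, 3, 4 },
    new HashSet<int> { 3, 4, 5 },
    new HashSet<int> { 4, 5, 6, 7, 8 }
};

// Third method from above: loop over the smaller set, probe the larger one.
static int IntersectionCount(HashSet<int> a, HashSet<int> b)
{
    HashSet<int> small = a.Count < b.Count ? a : b;
    HashSet<int> large = a.Count < b.Count ? b : a;
    int count = 0;
    foreach (int k in small)
    {
        if (large.Contains(k))
        {
            ++count;
        }
    }
    return count;
}

// counts[i][j] holds the intersection size for the pair (i, j) with j > i.
int[][] counts = new int[allSets.Count][];
Parallel.For(0, allSets.Count, i =>
{
    // The row is created inside the lambda, so it belongs to this iteration only.
    var row = new int[allSets.Count];
    for (int j = i + 1; j < allSets.Count; j++)
    {
        row[j] = IntersectionCount(allSets[i], allSets[j]);
    }
    counts[i] = row;
});

Console.WriteLine(counts[0][1]); // {3, 4} -> 2
```

Whether this is actually faster depends on the number of cores and on the data, so it would need to be measured against the single-threaded timings.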
I've found that one of the most important considerations when iterating over or searching any kind of collection is to choose the collection type very carefully. Searching a plain list for your purposes will not be optimal. Try using something like:
System.Collections.Generic.HashSet<T>
Using the Contains() method while iterating over the shorter list of two (as you mentioned you're already doing) should give close to O(1) performance, the same as key lookups in the generic Dictionary type.
I'm facing a strange issue that I can't explain and I would like to know if some of you have the answer I'm lacking.
I have a small test app for testing multithreading changes I'm making to a much larger codebase. In this app I've set up two functions, one that runs a loop sequentially and one that uses Parallel.For. Both print the elapsed time and the number of elements generated. What I'm seeing is that the Parallel.For version generates fewer items than the sequential loop, and this is a huge problem for the real app (it's messing with some final results). So my question is whether anyone has an idea why this could be happening and, if so, whether there's any way to fix it.
Here is the code for the function that uses the parallel.for in my test app:
static bool[] values = new bool[52];
static List<int[]> combinations = new List<int[]>();
static void ParallelLoop()
{
    combinations.Clear();
    Parallel.For(0, 48, i =>
    {
        if (values[i])
        {
            for (int j = i + 1; j < 49; j++)
                if (values[j])
                {
                    for (int k = j + 1; k < 50; k++)
                    {
                        if (values[k])
                        {
                            for (int l = k + 1; l < 51; l++)
                            {
                                if (values[l])
                                {
                                    for (int m = l + 1; m < 52; m++)
                                    {
                                        if (values[m])
                                        {
                                            int[] combination = { i, j, k, l, m };
                                            combinations.Add(combination);
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
        }
    }); // Parallel.For
}
And here is the app output:
Executing sequential loop...
Number of elements generated: 1,712,304
Executing parallel loop...
Number of elements generated: 1,464,871
Thanks in advance and if you need some clarifications I'll do my best to explain in further detail.
You can't add items to a list from multiple threads at the same time without a synchronization mechanism. List<T>.Add() does some non-trivial internal work (resizing its backing buffer, etc.), so adding an item is not an atomic, thread-safe operation.
Either:
Provide a way to synchronize your writes
Use a collection that supports concurrent writes (see System.Collections.Concurrent namespace)
Don't use multi-threading at all
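The first two options can be sketched as follows, along with a third variant that gives each worker its own list and merges once at the end. To keep it short, this assumes a smaller hypothetical problem (all 3-element combinations of 0..19, without the values filter) rather than the question's five nested loops; the names gate, locked, bag and merged are made up.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

int n = 20; // hypothetical smaller problem: C(20, 3) = 1140 combinations

// Option 1: keep List<T>, but guard every Add with a lock.
var locked = new List<int[]>();
object gate = new object();
Parallel.For(0, n - 2, i =>
{
    for (int j = i + 1; j < n - 1; j++)
        for (int k = j + 1; k < n; k++)
        {
            int[] combination = { i, j, k };
            lock (gate) { locked.Add(combination); }
        }
});

// Option 2: a collection from System.Collections.Concurrent.
var bag = new ConcurrentBag<int[]>();
Parallel.For(0, n - 2, i =>
{
    for (int j = i + 1; j < n - 1; j++)
        for (int k = j + 1; k < n; k++)
            bag.Add(new[] { i, j, k });
});

// Variant: one private list per worker, merged under the lock only once per worker.
var merged = new List<int[]>();
Parallel.For(0, n - 2,
    () => new List<int[]>(),                 // localInit: a fresh list per worker
    (i, state, local) =>
    {
        for (int j = i + 1; j < n - 1; j++)
            for (int k = j + 1; k < n; k++)
                local.Add(new[] { i, j, k });
        return local;                        // handed to the next iteration on this worker
    },
    local => { lock (gate) { merged.AddRange(local); } });

Console.WriteLine($"{locked.Count} {bag.Count} {merged.Count}"); // 1140 1140 1140
```

None of these preserves the order a sequential loop would produce, so sort afterwards if the order matters.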
First off I'm sure that I am using the wrong terminology here but I will fix it if someone comments on it. Please be gentle.
So I have multiple charts on a page and I am performing virtually identical actions on each. For demonstration purposes, let's call my charts something like chart1, chart2, ..., chartn, where n is somewhere in the vicinity of 20. What I would like to do is drop this in a for loop and perform all the work in one smaller chunk of code, especially if I have to tweak it later.
So my question is whether or not I can vary the n part representing the object (terminology?) so I can get this done more efficiently.
i.e.:
for(int i = 0; i < 20; i++)
{
String chartName = "chart" + i;
chartName.Series.Clear();
}
I have a feeling you can't do this with a string so I was looking into doing a foreach but I don't know how to do this with charts.
Thanks a lot!
You should put the charts in a list. For example, this makes a list of Chart objects (or whatever your chart type is):
List<Chart> charts = new List<Chart>();
Then you can add charts:
charts.Add(new Chart());
And use them:
for (int i = 0; i < charts.Count; i++)
{
charts[i].Series.Clear();
}
Of course, you can make the charts variable a field in your class.
You can directly initialize a list (or array, or dictionary¹) like this:
List<Chart> charts = new List<Chart>()
{
new Chart(),
new Chart(),
existingChart1,
existingChart2
};
Or, if you create a new array of objects using that syntax...
Chart[] arrayOfCharts = new []
{
new Chart(),
new Chart(),
existingChart1,
existingChart2
};
...then you can add multiple objects at once using AddRange:
charts.AddRange(arrayOfCharts);
1) You can use this so-called collection initializer syntax on any type that implements IEnumerable and has an accessible Add method.
Can you access your chart from a list/array/collection of charts like this?
for (int i = 0; i <= 19; i++) {
    String chartName = "chart" + i;
    Charts[chartName].Series.Clear();
}
or maybe
for (int i = 0; i <= 19; i++) {
    Charts[i].Series.Clear();
}
I have the following code snippet:
// Initialise rectangular matrix with [][] instead of [,]
double[][] data = new double[m][];
for (int i = 0; i < m; i++)
    data[i] = new double[n];
// Populate data[][] here...
// Code to run in parallel:
for (int i = 0; i < m; i++)
data[i] = Process(data[i]);
If this makes sense, I have a matrix of doubles. I need to apply a transformation to each individual row of the matrix. It is "embarrassingly parallel", as there is no connection for the data from one row to another.
If I do something like:
data.AsParallel().ForAll(row => { row = Process(row); });
First of all, I don't know whether data.AsParallel() knows to only look at the first subscript, or if it will enumerate all m * n doubles. Secondly, since row is the element I'm enumerating over, I have no idea if I can change it like this - I suspect not.
So, with or without PLINQ, what is a good way to parallelise this loop in C#?
Here are two ways to do it:
data.AsParallel().ForAll(row =>
{
Process(row);
});
Parallel.For(0, data.Length, rowIndex =>
{
Process(data[rowIndex]);
});
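In both cases, each row is a one-dimensional array of doubles, which is a reference type: the lambda receives a reference to the same array that data holds, so modifying its elements inside Process modifies the data array. (data.AsParallel() enumerates the m row arrays, not the m * n individual doubles.) This relies on Process changing the row in place; assigning to the row parameter itself, as in row = Process(row), only rebinds the local variable and leaves data untouched.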
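If Process instead returns a new array, as the question's data[i] = Process(data[i]) suggests, the index-based Parallel.For form can store the result back. The sketch below assumes a made-up Process that doubles every value; the real transformation is not shown in the question.

```csharp
using System;
using System.Threading.Tasks;

int m = 4, n = 3;
double[][] data = new double[m][];
for (int i = 0; i < m; i++)
{
    data[i] = new double[n];
    for (int j = 0; j < n; j++)
    {
        data[i][j] = i + j;
    }
}

// Hypothetical Process: returns a NEW row instead of editing the input in place.
static double[] Process(double[] row)
{
    var result = new double[row.Length];
    for (int j = 0; j < row.Length; j++)
    {
        result[j] = row[j] * 2;
    }
    return result;
}

// Each iteration replaces only its own slot of data, so rows never interfere.
Parallel.For(0, data.Length, i =>
{
    data[i] = Process(data[i]);
});

Console.WriteLine(string.Join(", ", data[3])); // 6, 8, 10
```

With the ForAll form there is no index to assign through, so it only fits a Process that mutates the row it is given.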