CLR multi-dimensioned array traversal performance - c#

In a great many places in the software I write, there are three-dimensioned arrays of short or float, usually with several million elements. The data is best understood conceptually as a three-dimensioned array, since it describes values at regular locations in space.
I saw a mention elsewhere here that the .NET CLR is not terribly "performant" when it comes to traversing those arrays, for example, when computing new values and populating a second, equally sized and dimensioned array. If this is true, why is that so?
For readability reasons I haven't settled on jagged arrays yet, though I'm willing to use them if that's really the answer. However:
To get around this it's been proposed to me that I format the data as a single dimensioned array. For example, if my array has dimensions with magnitudes m, n, and o, then I would create a float[m*n*o] instead of a float[m,n,o], and write my own indexer to get to the correct array locations during traversal.
The specific use case is in parallelizing the traversal, such as:
Parallel.For(0, m, x => { for (int y = 0; y < n; y++) { for (int z = 0; z < o; z++) doSomething(array[x, y, z]); } });
Where in the single-indexed case there would be a Parallel.ForEach(myArray, (position) => doSomething(array(Position))) kind of thing going on instead of the nested for loops.
So, the question is, really, would that be any faster than relying on the CLR array indexing that's built in?
EDIT: I've supplied my own answer below, based on some timing tests. The code is included.

One huge thing to consider is the order of traversal. Memory caching is an important part of modern processor performance and cache misses can be (relatively) expensive. If you index the array across a 'long' dimension that results in crossing cache boundaries, you may cause frequent misses as part of indexing. As such, the order in which you index is important. This often means you want to take care in how you choose to order your indices.
Also, when copying, consider that multiple indexing requires computing the 'true' index to the underlying memory block using multiplication/addition. If you're just copying all elements, though, you could simply increment a single index and access each element without additional computation required.
There are also various condition checks that occur when accessing arrays by index (making the IndexOutOfRangeException possible), which requires more checks when you access via multiple indices. I believe (though I'm not entirely sure) that the jitter can sometimes optimize single dimensional array access using a simple loop by checking the range only once, rather than on every indexing operation.
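As a rough illustration of the index arithmetic described above, here is a minimal sketch of a flat float[] wrapped with a three-dimensional indexer. The Grid3D name, its members, and the layout (z varying fastest, which matches the row-major order the CLR uses for float[m,n,o]) are assumptions made for illustration; they do not come from the question.

```csharp
using System;

// Hypothetical wrapper: stores an m x n x o grid in one flat array.
public sealed class Grid3D
{
    private readonly float[] data;
    public int M { get; }
    public int N { get; }
    public int O { get; }

    public Grid3D(int m, int n, int o)
    {
        M = m; N = n; O = o;
        data = new float[m * n * o];
    }

    // Row-major layout: z varies fastest, so the innermost loop should run over z
    // to touch memory sequentially.
    public float this[int x, int y, int z]
    {
        get { return data[(x * N + y) * O + z]; }
        set { data[(x * N + y) * O + z] = value; }
    }

    // Flat access lets a caller visit every element with a single incrementing index,
    // skipping the multiply/add needed by the three-index form.
    public int Length { get { return data.Length; } }
    public float GetFlat(int i) { return data[i]; }
    public void SetFlat(int i, float value) { data[i] = value; }
}

public static class Grid3DDemo
{
    public static void Main()
    {
        var g = new Grid3D(2, 3, 4);
        g[1, 2, 3] = 5f;
        // (1 * 3 + 2) * 4 + 3 = 23, the last of the 24 elements.
        Console.WriteLine(g.GetFlat(23)); // prints 5
    }
}
```

Whether this beats the built-in float[m,n,o] indexing depends on the runtime and the access pattern, so it is worth timing both before committing to either.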

I ran some timings and found that it makes little or no difference to overall performance.
I used the code below; the timings I got were basically identical in each case:
public partial class Form1 : Form
{
int ArrayDim1 = 50;
int ArrayDim23 = 500;
int ParallelSplit = 50;
int DoSomethingSize = 100;
Double sqRoot = 0;
Single[, ,] multidim = null;
Single[] singleDim = null;
Single[][][] jagged = null;
ParallelOptions po = new ParallelOptions() { MaxDegreeOfParallelism = 36 };
public Form1()
{
InitializeComponent();
multidim = new Single[ArrayDim1, ArrayDim23, ArrayDim23];
for (int x = 0; x < ArrayDim1; x++)
for (int y = 0; y < ArrayDim23; y++)
for (int z = 0; z < ArrayDim23; z++)
multidim[x, y, z] = 1;
singleDim = new Single[ArrayDim1 * ArrayDim23 * ArrayDim23];
for (int i = 0; i < singleDim.Length; i++)
singleDim[i] = 1;
jagged = new Single[ArrayDim1][][];
for (int i = 0; i < ArrayDim1; i++)
{
jagged[i] = new Single[ArrayDim23][];
for (int j = 0; j < ArrayDim23; j++)
{
jagged[i][j] = new Single[ArrayDim23];
}
}
}
private void btnGO_Click(object sender, EventArgs e)
{
int loopcount = 1;
DateTime startTime = DateTime.Now;
for (int i = 0; i < loopcount; i++)
{
TestMultiDimArray(multidim);
}
textBox1.Text = DateTime.Now.Subtract(startTime).TotalMilliseconds.ToString("#,###");
startTime = DateTime.Now;
for (int i = 0; i < loopcount; i++)
{
TestSingleArrayClean(singleDim);
}
textBox2.Text = DateTime.Now.Subtract(startTime).TotalMilliseconds.ToString("#,###");
startTime = DateTime.Now;
for (int i = 0; i < loopcount; i++)
{
TestJaggedArray(jagged);
}
textBox3.Text = DateTime.Now.Subtract(startTime).TotalMilliseconds.ToString("#,###");
}
public void TestJaggedArray(Single[][][] multi)
{
Parallel.For(0, ArrayDim1, po, x =>
{
for (int y = 0; y < ArrayDim23; y++)
{
for (int z = 0; z < ArrayDim23; z++)
{
DoComplex();
multi[x][y][z] = Convert.ToSingle(Math.Sqrt(123412341));
}
}
});
}
public void TestMultiDimArray(Single[, ,] multi)
{
Parallel.For(0, ArrayDim1, po, x =>
{
for (int y = 0; y < ArrayDim23; y++)
{
for (int z = 0; z < ArrayDim23; z++)
{
DoComplex();
multi[x, y, z] = Convert.ToSingle(Math.Sqrt(123412341));
}
}
});
}
public void TestSingleArrayClean(Single[] single)
{
Parallel.For(0, single.Length, po, y =>
{
//System.Diagnostics.Debug.Print(y.ToString());
DoComplex();
single[y] = Convert.ToSingle(Math.Sqrt(123412341));
});
}
public void DoComplex()
{
for (int i = 0; i < DoSomethingSize; i++)
{
sqRoot = Math.Log(101.101);
}
}
}

Related

In C# is it faster to create a Hash Set for searching through a list, rather than searching the list itself? [duplicate]

I have two lists of strings, and I need to check to see if there are any matches, and I have to do this at a minimum of sixty times a second, but this can scale up to thousands of times a second.
Right now, the lists are both small; one is three, and another might have a few dozen elements at most, but the currently small list is probably gonna grow.
Would it be faster to do this:
for (int i = 0; i < listA.Count; i++)
{
for (int j = 0; j < listB.Count; j++) {
if (listA[i] == listB[j])
{
// do stuff
}
}
}
Or to do this:
var hashSetB = new HashSet<string>(listB.Count);
for (int i = 0; i < listB.Count; i++)
{
hashSetB.Add(listB[i]);
}
for (int i = 0; i < listA.Count; i++)
{
if (hashSetB.Contains(listA[i])) {
// do stuff
}
}
ListA and ListB when they come to me, will always be lists; I have no control over them.
I think the core of my question is that I don't know how long var hashSetB = new HashSet<string>(listB.Count); takes, so I'm not sure whether the change would be good or bad for smaller lists.
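I was curious, so here's some code I wrote to test it. In my results, the HashSet version was near-instantaneous whereas the nested loops were slow. That makes sense, as you've essentially taken something that needs lengthA * lengthB operations and reduced it to lengthA + lengthB operations. Note that this test uses lists of about 20,000 ints, far larger than the lists in the question, so it says little about the cost for a handful of elements.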
const int size = 20000;
var listA = new List<int>();
for (int i = 0; i < size; i++)
{
listA.Add(i);
}
var listB = new List<int>();
for (int i = size - 5; i < 2 * size; i++)
{
listB.Add(i);
}
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < listA.Count; i++)
{
for (int j = 0; j < listB.Count; j++)
{
if (listA[i] == listB[j])
{
Console.WriteLine("Nested loop match");
}
}
}
long timeTaken1 = sw.ElapsedMilliseconds;
sw.Restart();
var hashSetB = new HashSet<int>(listB.Count);
for (int i = 0; i < listB.Count; i++)
{
hashSetB.Add(listB[i]);
}
for (int i = 0; i < listA.Count; i++)
{
if (hashSetB.Contains(listA[i]))
{
Console.WriteLine("HashSet match");
}
}
long timeTaken2 = sw.ElapsedMilliseconds;
Console.WriteLine("Time Taken Nested Loop: " + timeTaken1);
Console.WriteLine("Time Taken HashSet: " + timeTaken2);
Console.ReadLine();
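For the "is there any match at all" case from the question, a minimal sketch along these lines may be enough. The AnyMatch helper name is hypothetical; HashSet<T>.Overlaps is part of the framework and returns true as soon as it finds a shared element.

```csharp
using System;
using System.Collections.Generic;

public static class MatchDemo
{
    // Hypothetical helper: true if any string occurs in both lists.
    // Building the set costs roughly listB.Count insertions;
    // each lookup afterwards is O(1) on average.
    public static bool AnyMatch(List<string> listA, List<string> listB)
    {
        var setB = new HashSet<string>(listB);
        return setB.Overlaps(listA);
    }

    public static void Main()
    {
        var a = new List<string> { "x", "y", "z" };
        var b = new List<string> { "p", "q", "y" };
        Console.WriteLine(AnyMatch(a, b)); // True
    }
}
```

If listB does not change between calls, the set could be built once and reused, which removes the construction cost from the per-call path. For lists of three or so elements the nested loop may well still win, so timing both with realistic sizes is the safer way to decide.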

Complex Array strange behaviour

In the example below, I found some strange behaviour.
I'm initializing a matrix of Complex (using System.Numerics), assigning a counter value to each position.
But at a certain point the procedure overwrites earlier cells.
What's wrong here? Is there an array size limit?
The Complex type seems to be the issue; with a double array this doesn't happen.
Any suggestions?
private static void Test()
{
Complex[,] m = new Complex[16400, 16400];
long count = 1;
for (int i = 0; i < 16400; i++)
{
for (int j = 0; j < 16400; j++)
{
m[i, j] = new Complex((double)count++, 0);
if (m[0,0] != 1)
Debug.Print(string.Format("({0},{1})> m[0,0] =" + m[0, 0].ToString(), i,j));
}
}
}
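A quick size calculation may help frame the "array limit" question. This is only arithmetic, not a diagnosis: it assumes 16 bytes per Complex (two doubles) and 8 bytes per double, and the ArraySizeCheck and TotalBytes names are made up for the sketch.

```csharp
using System;

public static class ArraySizeCheck
{
    // Total payload bytes for a rows x cols array with the given element size.
    public static long TotalBytes(long rows, long cols, long bytesPerElement)
    {
        return rows * cols * bytesPerElement;
    }

    public static void Main()
    {
        long complexBytes = TotalBytes(16400, 16400, 16); // Complex = two doubles
        long doubleBytes = TotalBytes(16400, 16400, 8);

        Console.WriteLine(complexBytes);                 // 4303360000
        Console.WriteLine(complexBytes > uint.MaxValue); // True
        Console.WriteLine(doubleBytes > uint.MaxValue);  // False
    }
}
```

So the Complex matrix needs more bytes than a 32-bit offset can address, while the double matrix of the same dimensions does not, which is at least consistent with the difference observed. Whether that is the actual cause depends on the runtime version, the platform, and settings such as gcAllowVeryLargeObjects, none of which are stated in the question.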

Why is my FFT Function not working well

I've made a function for FFT using AForge. It seems to work, but when I checked it with my supervisor he said the output is not correct. He is using the pwelch function from MATLAB. We've already found out that we were using different windows, but changing them didn't make a significant difference. So again, the function does run, but according to my supervisor the output is incorrect. Please help!
This is my function. I hope someone can see what's wrong, because I've been looking at it for almost two weeks now. The data that goes into it has already been made equidistant.
private void FastFoulierMethod()
{
int NFFT = 64;
int N_OVERLAP = 32;
int numberOfEpochs = samples.Count / NFFT;
int numberOfSamplesToSelectFromFFT = NFFT-1;
double[] dataaa = new double[samples.Count];
for (int i = 0; i < samples.Count - 1; i++)
{
dataaa[i] = samples[i].GetValue(); // list of doubles that we use
}
double[,] pFrame = new double[numberOfEpochs, numberOfSamplesToSelectFromFFT];
// The first epoch in the page starts at index 0
int beginIndexOfEpoch = 0;
for (int i = 0; i < numberOfEpochs; i++)
{
// This will get the current epoch by retrieving samples from the sample list
// starting at 'beginIndex' with length 'NFFT'. This epoch will need to be detrended next.
List<double> smapletemp = new List<double>();
for (int x = beginIndexOfEpoch; x < beginIndexOfEpoch+NFFT; x++)
{
smapletemp.Add(dataaa[x]);
}
double[] epoch = smapletemp.ToArray();
if (epoch.Length == 0)
{
break;
}
// Create array of X-axis values 1,2,3,4 ... n
// which will be used to perform linear regression.
double[] xValues = new double[epoch.Length];
for (int j = 0; j < xValues.Length; j++)
{
xValues[j] = j;
}
// Perform linear regression on the epoch. This will result in some data that is used later.
Dictionary<String, double> linearRegressionData = math.performLinearRegression(xValues.ToList(), epoch.ToList());
// Detrend the epoch
for (int j = 0; j < epoch.Length; j++)
{
double intercept = linearRegressionData["Alpha"]; // Get the intercept from the data.
double slope = linearRegressionData["Beta"]; // Get the slope from the data.
//if (1 >= math.StdDev(epoch))
//{
epoch[j] = epoch[j] - intercept - (slope * j); // Detrend the epoch by subtracting the intercept and the slope * j.
//}
}
// Create Complex from the epoch for windowing and FFT processing.
Complex[] cmplx = new Complex[epoch.Length];
for (int j = 0; j < cmplx.Length; j++)
{
cmplx[j] = new Complex(epoch[j], 0);
}
// Perform Hann window function on the Complex.
math.hann(cmplx);
// Perform Fast Fourier Transform on the Complex.
FourierTransform.FFT(cmplx, FourierTransform.Direction.Backward);
// Create an array for all powers.
double[] powers = new double[cmplx.Length];
for (int j = 0; j < epoch.Length; j++)
{
powers[j] = cmplx[j].SquaredMagnitude;
}
// Add the powers to the power frame.
for (int j = 0; j < powers.Length-1; j++)
{
pFrame[i, j] = powers[j];
}
// Shift index for the next epoch.
beginIndexOfEpoch += NFFT - N_OVERLAP;
if ( beginIndexOfEpoch + NFFT > samples.Count)
{
break;
}
}
// Create an array for the nan-mean values of all epochs.
// Nan-mean calculates the mean of a set of doubles, ignoring NaN's.
double[] nanMeanValues = new double[numberOfSamplesToSelectFromFFT];
List<double[]> Y = new List<double[]>();
for (int i = 0; i < numberOfSamplesToSelectFromFFT; i++)
{
// The sum for calculating the mean.
double sum = 0.0;
// The number of elements (doubles) for calculating the mean.
int count = 0;
// For all epochs...
for (int j = 0; j < numberOfEpochs; j++)
{
// ...the mean for all doubles at index 'i' is calculated.
double sample = pFrame[j, i];
if (!Double.IsNaN(sample))
{
// Only take the double into account when it isn't a NaN.
sum += sample;
count++;
}
}
// Actually calculate the mean and add it to the array.
nanMeanValues[i] = sum / count;
}
// We now have the mean of all power arrays (or epochs).
// Create an array with Root Mean Square values.
double[] squareRootedNanMeans = new double[nanMeanValues.Length];
for (int i = 0; i < squareRootedNanMeans.Length; i++)
{
squareRootedNanMeans[i] = Math.Sqrt(nanMeanValues[i]);
}
Y.Add(squareRootedNanMeans);
It's been ages since I studied FFTs but, unless your academic assignment is to produce an FFT function, I advise you to use a library. It is great to program your own stuff to learn, but if you need results, go for the sure thing.
You can use ALGLIB (http://www.alglib.net/fasttransforms/fft.php), which has a free version.
Hope this helps.

How do I deal with this NullReferenceException?

public partial class Form1 : Form
{
string[] id;
private void button_Click(object sender, EventArgs e)
{
char[] delimiters = { ',', '\r', '\n' };
string[] content = File.ReadAllText(CSV_File).Split(delimiters);
int x = content.GetUpperBound(0);
int z = 0;
int i = 0;
for (i = 0; i <= x / 3; i++)
{
z = (i * 3);
id[i] = content[z]; // this line gives the error
}
}
}
I want to get every 3rd value from array content, and put it into array id. This gives a 'NullReferenceException was unhandled' error and suggests I use 'new', but it is not a type or namespace. What should I do here?
They are both string arrays, and the error occurs on the first run so I do not think it is related to exceeding the bounds.
You need to initialize the id array before the for loop. The loop runs from 0 to x / 3 inclusive, so it needs x / 3 + 1 slots:
id = new string[x / 3 + 1];
This line of code:
string[] id;
is actually creating a null reference.
When you declare an array, you have to explicitly create it, specifying the size.
In your example, you have two alternatives:
Determine how big the array will be beforehand, and create the array with that length.
Populate a container that manages its own size, then convert it to an array.
The first option:
int x = content.GetUpperBound(0);
int z = 0;
int i = 0;
// The loop below visits indices 0..x/3 inclusive, hence the + 1.
id = new string[x / 3 + 1];
for (i = 0; i <= x / 3; i++)
{
z = (i * 3);
id[i] = content[z];
}
The second option:
int x = content.GetUpperBound(0);
int z = 0;
int i = 0;
List<string> list = new List<string>();
for (i = 0; i <= x / 3; i++)
{
z = (i * 3);
list.Add(content[z]);
}
id = list.ToArray();
The first option would perform better, as you are only allocating one object.
Admittedly, I tend to disregard performance and use the second option, because it takes less brainpower to code.

Make c# matrix code faster

Working on some matrix code, I'm concerned about performance issues.
Here's how it works: I have an IMatrix abstract class (with all the matrix operations etc.), implemented by a ColumnMatrix class.
abstract class IMatrix
{
public int Rows {get;set;}
public int Columns {get;set;}
public abstract float At(int row, int column);
}
class ColumnMatrix : IMatrix
{
private float[] data;
public override float At(int row, int column)
{
return data[row + column * this.Rows];
}
}
This class is used a lot across my application, but I'm concerned with performance issues.
Testing only reads on a 2000000x15 matrix against a jagged array of the same size, I get 1359ms for array access against 9234ms for matrix access:
public void TestAccess()
{
int iterations = 10;
int rows = 2000000;
int columns = 15;
ColumnMatrix matrix = new ColumnMatrix(rows, columns);
for (int i = 0; i < rows; i++)
for (int j = 0; j < columns; j++)
matrix[i, j] = i + j;
float[][] equivalentArray = matrix.ToRowsArray();
TimeSpan totalMatrix = new TimeSpan(0);
TimeSpan totalArray = new TimeSpan(0);
float total = 0f;
for (int iteration = 0; iteration < iterations; iteration++)
{
total = 0f;
DateTime start = DateTime.Now;
for (int i = 0; i < rows; i++)
for (int j = 0; j < columns; j++)
total = matrix.At(i, j);
totalMatrix += (DateTime.Now - start);
total += 1f; //Ensure total is read at least once.
total = total > 0 ? 0f : 0f;
start = DateTime.Now;
for (int i = 0; i < rows; i++)
for (int j = 0; j < columns; j++)
total = equivalentArray[i][j];
totalArray += (DateTime.Now - start);
}
if (total < 0f)
logger.Info("Nothing here, just make sure we read total at least once.");
logger.InfoFormat("Average time for a {0}x{1} access, matrix : {2}ms", rows, columns, totalMatrix.TotalMilliseconds);
logger.InfoFormat("Average time for a {0}x{1} access, array : {2}ms", rows, columns, totalArray.TotalMilliseconds);
Assert.IsTrue(true);
}
So my question : how can I make this thing faster ? Is there any way I can make my ColumnMatrix.At faster ?
Cheers !
Remove the abstract class IMatrix. The name is misleading because it is not an interface, and calling overridden (virtual) methods is slower than calling non-virtual ones.
You could use unsafe code (pointers) to get elements of the array without array-bounds-checks (faster, but more work and unsafe)
The array code you've written can be optimized easily enough, as it's clear that you're accessing memory sequentially. This means the JIT compiler will probably do a better job of converting it to native code, which will result in better performance. Another thing you're not considering is that inlining is still hit and miss, so if your At method (why not use an indexer, by the way?) is not inlined you'll suffer a significant performance hit from the call and stack manipulation. Finally, you should consider sealing the ColumnMatrix class, because that makes optimization much easier for the JIT compiler (call is definitely better than callvirt).
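The sealing, indexer, and inlining suggestions could be combined roughly as follows. This is a sketch under assumptions: the SealedColumnMatrix name and constructor are invented for illustration, and MethodImplOptions.AggressiveInlining (available from .NET 4.5) is only a hint that the JIT is free to ignore.

```csharp
using System;
using System.Runtime.CompilerServices;

// Sketch only: a sealed, non-virtual column-major matrix with an indexer.
public sealed class SealedColumnMatrix
{
    private readonly float[] data;
    public int Rows { get; }
    public int Columns { get; }

    public SealedColumnMatrix(int rows, int columns)
    {
        Rows = rows;
        Columns = columns;
        data = new float[rows * columns];
    }

    // Column-major layout: elements of one column are contiguous in memory.
    public float this[int row, int column]
    {
        // A hint only; the JIT may still decline to inline.
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get { return data[row + column * Rows]; }
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        set { data[row + column * Rows] = value; }
    }
}

public static class SealedMatrixDemo
{
    public static void Main()
    {
        var m = new SealedColumnMatrix(3, 2);
        m[2, 1] = 7f;
        Console.WriteLine(m[2, 1]); // prints 7
    }
}
```

With a column-major layout like this one, iterating rows in the inner loop and columns in the outer loop walks memory sequentially, which is the opposite of the loop order used in the question's test.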
If a two-dimensional array performs so much better, why don't you use a two-dimensional array for your class's internal storage, rather than the one-dimensional one with the overhead of calculating the index?
As you are using DateTime.Now to measure the performance, the result is quite random. The resolution of that clock is coarse (typically on the order of 10 to 16 milliseconds on Windows), so instead of measuring the actual time, you are partly measuring where in the code the clock happens to tick.
You should use the Stopwatch class instead, which has much higher resolution.
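A minimal sketch of timing with Stopwatch, assuming a hypothetical TimeMs helper that is not part of the question's code:

```csharp
using System;
using System.Diagnostics;

public static class TimingDemo
{
    // Hypothetical helper: runs the action once and returns elapsed milliseconds.
    public static double TimeMs(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        float[] values = new float[1000000];
        float total = 0f;

        double ms = TimeMs(() =>
        {
            for (int i = 0; i < values.Length; i++)
                total += values[i];
        });

        // Printing total keeps the summation from being treated as dead code.
        Console.WriteLine("Elapsed: {0} ms (total = {1})", ms, total);
    }
}
```

Running the measured code once before timing it, and timing a release build outside the debugger, tends to give steadier numbers, since the first call includes JIT compilation.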
For every access of an element you do a multiplication: row + columns * this.Rows.
You might see if internally you could also use a 2 dimensional array
You also incur extra overhead because the data is abstracted away in a class: you are making an extra method call every time you access an element in the matrix.
Change to this:
interface IMatrix
{
int Rows {get;set;}
int Columns {get;set;}
float At(int row, int column);
}
class ColumnMatrix : IMatrix
{
private float[,] data;
public int Rows {get;set;}
public int Columns {get;set;}
public float At(int row, int column)
{
return data[row,column];
}
}
You're better off with the interface than the abstract class. If you need common functions for it, add extension methods for the interface.
Also, a 2D array may be quicker than either the jagged one or your flattened one, though that is worth measuring, since it depends on the runtime and the access pattern.
You can use parallel programming to speed up your algorithm.
You can compile this code and compare the performance of the sequential matrix multiplication (MultiplyMatricesSequential function) with the parallel one (MultiplyMatricesParallel function). The Main function contains the code that compares the performance of the two methods.
You can compile this code under Visual Studio 2010 (.NET 4.0).
namespace MultiplyMatrices
{
using System;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
class Program
{
#region Sequential_Loop
static void MultiplyMatricesSequential(double[,] matA, double[,] matB,
double[,] result)
{
int matACols = matA.GetLength(1);
int matBCols = matB.GetLength(1);
int matARows = matA.GetLength(0);
for (int i = 0; i < matARows; i++)
{
for (int j = 0; j < matBCols; j++)
{
for (int k = 0; k < matACols; k++)
{
result[i, j] += matA[i, k] * matB[k, j];
}
}
}
}
#endregion
#region Parallel_Loop
static void MultiplyMatricesParallel(double[,] matA, double[,] matB, double[,] result)
{
int matACols = matA.GetLength(1);
int matBCols = matB.GetLength(1);
int matARows = matA.GetLength(0);
// A basic matrix multiplication.
// Parallelize the outer loop to partition the source array by rows.
Parallel.For(0, matARows, i =>
{
for (int j = 0; j < matBCols; j++)
{
// Use a temporary to improve parallel performance.
double temp = 0;
for (int k = 0; k < matACols; k++)
{
temp += matA[i, k] * matB[k, j];
}
result[i, j] = temp;
}
}); // Parallel.For
}
#endregion
#region Main
static void Main(string[] args)
{
// Set up matrices. Use small values to better view
// result matrix. Increase the counts to see greater
// speedup in the parallel loop vs. the sequential loop.
int colCount = 180;
int rowCount = 2000;
int colCount2 = 270;
double[,] m1 = InitializeMatrix(rowCount, colCount);
double[,] m2 = InitializeMatrix(colCount, colCount2);
double[,] result = new double[rowCount, colCount2];
// First do the sequential version.
Console.WriteLine("Executing sequential loop...");
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
MultiplyMatricesSequential(m1, m2, result);
stopwatch.Stop();
Console.WriteLine("Sequential loop time in milliseconds: {0}", stopwatch.ElapsedMilliseconds);
// For the skeptics.
OfferToPrint(rowCount, colCount2, result);
// Reset timer and results matrix.
stopwatch.Reset();
result = new double[rowCount, colCount2];
// Do the parallel loop.
Console.WriteLine("Executing parallel loop...");
stopwatch.Start();
MultiplyMatricesParallel(m1, m2, result);
stopwatch.Stop();
Console.WriteLine("Parallel loop time in milliseconds: {0}", stopwatch.ElapsedMilliseconds);
OfferToPrint(rowCount, colCount2, result);
// Keep the console window open in debug mode.
Console.WriteLine("Press any key to exit.");
Console.ReadKey();
}
#endregion
#region Helper_Methods
static double[,] InitializeMatrix(int rows, int cols)
{
double[,] matrix = new double[rows, cols];
Random r = new Random();
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < cols; j++)
{
matrix[i, j] = r.Next(100);
}
}
return matrix;
}
private static void OfferToPrint(int rowCount, int colCount, double[,] matrix)
{
Console.WriteLine("Computation complete. Print results? y/n");
char c = Console.ReadKey().KeyChar;
if (c == 'y' || c == 'Y')
{
Console.WindowWidth = 180;
Console.WriteLine();
for (int x = 0; x < rowCount; x++)
{
Console.WriteLine("ROW {0}: ", x);
for (int y = 0; y < colCount; y++)
{
Console.Write("{0:#.##} ", matrix[x, y]);
}
Console.WriteLine();
}
}
}
#endregion
}
}
