Make C# matrix code faster


I'm working on some matrix code and I'm concerned about performance.
Here's how it works: I have an IMatrix abstract class (with all the matrix operations, etc.), implemented by a ColumnMatrix class.
abstract class IMatrix
{
public int Rows {get;set;}
public int Columns {get;set;}
public abstract float At(int row, int column);
}
class ColumnMatrix : IMatrix
{
private float[] data;
public override float At(int row, int column)
{
return data[row + column * this.Rows];
}
}
This class is used a lot across my application, so its performance matters.
Testing reads only, for a 2000000x15 matrix against a jagged array of the same size, I get 1359ms for array access against 9234ms for matrix access:
public void TestAccess()
{
int iterations = 10;
int rows = 2000000;
int columns = 15;
ColumnMatrix matrix = new ColumnMatrix(rows, columns);
for (int i = 0; i < rows; i++)
for (int j = 0; j < columns; j++)
matrix[i, j] = i + j;
float[][] equivalentArray = matrix.ToRowsArray();
TimeSpan totalMatrix = new TimeSpan(0);
TimeSpan totalArray = new TimeSpan(0);
float total = 0f;
for (int iteration = 0; iteration < iterations; iteration++)
{
total = 0f;
DateTime start = DateTime.Now;
for (int i = 0; i < rows; i++)
for (int j = 0; j < columns; j++)
total = matrix.At(i, j);
totalMatrix += (DateTime.Now - start);
total += 1f; //Ensure total is read at least once.
total = total > 0 ? 0f : 0f;
start = DateTime.Now;
for (int i = 0; i < rows; i++)
for (int j = 0; j < columns; j++)
total = equivalentArray[i][j];
totalArray += (DateTime.Now - start);
}
if (total < 0f)
logger.Info("Nothing here, just make sure we read total at least once.");
logger.InfoFormat("Average time for a {0}x{1} access, matrix : {2}ms", rows, columns, totalMatrix.TotalMilliseconds);
logger.InfoFormat("Average time for a {0}x{1} access, array : {2}ms", rows, columns, totalArray.TotalMilliseconds);
Assert.IsTrue(true);
}
So my question: how can I make this faster? Is there any way to make ColumnMatrix.At faster?
Cheers!

Remove the abstract class IMatrix. It is not an interface, and calling an overridden (virtual) method is slower than calling a non-virtual one.
You could also use unsafe code (pointers) to read array elements without array bounds checks (faster, but more work, and unsafe).

The array code you've written is easy to optimize because it clearly accesses memory sequentially, so the JIT compiler will probably do a better job of converting it to native code, which results in better performance. Another thing you're not considering is that inlining is still hit and miss: if your At method (why not use an indexer, by the way?) is not inlined, you pay for a call and the associated stack manipulation on every element access. Finally, consider sealing the ColumnMatrix class, because that makes optimization much easier for the JIT compiler (call is definitely better than callvirt).
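To make the sealing and indexer suggestions concrete, here is a minimal sketch rather than a drop-in replacement. It assumes the base class can be dropped altogether, that the constructor takes (rows, columns) as in the question's test, and that MethodImplOptions.AggressiveInlining is available (.NET 4.5 or later). The attribute is only a hint to the JIT, so measure before relying on it.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical reworking of the question's ColumnMatrix: sealed, no virtual
// members, column-major storage in a single float[].
public sealed class ColumnMatrix
{
    private readonly float[] data;

    public int Rows { get; private set; }
    public int Columns { get; private set; }

    public ColumnMatrix(int rows, int columns)
    {
        Rows = rows;
        Columns = columns;
        data = new float[rows * columns];
    }

    // Indexer instead of At(). The attribute asks the JIT to inline the
    // accessor; it is a request, not a guarantee.
    public float this[int row, int column]
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get { return data[row + column * Rows]; }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        set { data[row + column * Rows] = value; }
    }
}
```

With this shape the test loop would read matrix[i, j] instead of matrix.At(i, j). Because the storage is column-major, iterating rows in the inner loop touches memory sequentially.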

If a two-dimensional array performs so much better, why don't you use a two-dimensional array for your class's internal storage, rather than the one-dimensional one with the overhead of calculating the index?

Because you are using DateTime.Now to measure performance, the result is quite random. The resolution of that clock is coarse (typically on the order of 10 to 16 ms on Windows), so for short runs you end up measuring where in the code the clock happens to tick rather than the actual elapsed time.
You should use the Stopwatch class instead, which has much higher resolution.
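A small sketch of what Stopwatch-based timing could look like. The helper name AccessTimer and the untimed warm-up call are my own additions, not part of the question's code. The warm-up keeps one-time JIT compilation out of the measurement.

```csharp
using System;
using System.Diagnostics;

public static class AccessTimer
{
    // Returns the average wall-clock milliseconds per run of 'action'.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException("iterations");

        action(); // untimed warm-up so JIT compilation is not measured

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();

        return watch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

In the question's test, each of the two nested read loops could be wrapped in a lambda and passed to AverageMilliseconds with iterations set to 10.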

Every element access performs a multiplication: row + columns * this.Rows.
You might check whether you could use a two-dimensional array internally instead.
You also pay extra overhead because the storage is abstracted away in a class: you make an extra method call every time you access an element in the matrix.

Change to this:
interface IMatrix
{
int Rows {get;set;}
int Columns {get;set;}
float At(int row, int column);
}
class ColumnMatrix : IMatrix
{
private float[,] data;
public int Rows {get;set;}
public int Columns {get;set;}
public float At(int row, int column)
{
return data[row,column];
}
}
You're better off with the interface than with the abstract class. If you need common functionality, add extension methods for the interface.
Also a 2D matrix is quicker than either the jagged one or your flattened one.

You can use parallel programming to speed up your algorithm.
You can compile the code below and compare the performance of the sequential matrix multiplication (MultiplyMatricesSequential) with the parallel one (MultiplyMatricesParallel). The Main function times both methods.
The code compiles under Visual Studio 2010 (.NET 4.0).
namespace MultiplyMatrices
{
using System;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
class Program
{
#region Sequential_Loop
static void MultiplyMatricesSequential(double[,] matA, double[,] matB,
double[,] result)
{
int matACols = matA.GetLength(1);
int matBCols = matB.GetLength(1);
int matARows = matA.GetLength(0);
for (int i = 0; i < matARows; i++)
{
for (int j = 0; j < matBCols; j++)
{
for (int k = 0; k < matACols; k++)
{
result[i, j] += matA[i, k] * matB[k, j];
}
}
}
}
#endregion
#region Parallel_Loop
static void MultiplyMatricesParallel(double[,] matA, double[,] matB, double[,] result)
{
int matACols = matA.GetLength(1);
int matBCols = matB.GetLength(1);
int matARows = matA.GetLength(0);
// A basic matrix multiplication.
// Parallelize the outer loop to partition the source array by rows.
Parallel.For(0, matARows, i =>
{
for (int j = 0; j < matBCols; j++)
{
// Use a temporary to improve parallel performance.
double temp = 0;
for (int k = 0; k < matACols; k++)
{
temp += matA[i, k] * matB[k, j];
}
result[i, j] = temp;
}
}); // Parallel.For
}
#endregion
#region Main
static void Main(string[] args)
{
// Set up matrices. Use small values to better view
// result matrix. Increase the counts to see greater
// speedup in the parallel loop vs. the sequential loop.
int colCount = 180;
int rowCount = 2000;
int colCount2 = 270;
double[,] m1 = InitializeMatrix(rowCount, colCount);
double[,] m2 = InitializeMatrix(colCount, colCount2);
double[,] result = new double[rowCount, colCount2];
// First do the sequential version.
Console.WriteLine("Executing sequential loop...");
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
MultiplyMatricesSequential(m1, m2, result);
stopwatch.Stop();
Console.WriteLine("Sequential loop time in milliseconds: {0}", stopwatch.ElapsedMilliseconds);
// For the skeptics.
OfferToPrint(rowCount, colCount2, result);
// Reset timer and results matrix.
stopwatch.Reset();
result = new double[rowCount, colCount2];
// Do the parallel loop.
Console.WriteLine("Executing parallel loop...");
stopwatch.Start();
MultiplyMatricesParallel(m1, m2, result);
stopwatch.Stop();
Console.WriteLine("Parallel loop time in milliseconds: {0}", stopwatch.ElapsedMilliseconds);
OfferToPrint(rowCount, colCount2, result);
// Keep the console window open in debug mode.
Console.WriteLine("Press any key to exit.");
Console.ReadKey();
}
#endregion
#region Helper_Methods
static double[,] InitializeMatrix(int rows, int cols)
{
double[,] matrix = new double[rows, cols];
Random r = new Random();
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < cols; j++)
{
matrix[i, j] = r.Next(100);
}
}
return matrix;
}
private static void OfferToPrint(int rowCount, int colCount, double[,] matrix)
{
Console.WriteLine("Computation complete. Print results? y/n");
char c = Console.ReadKey().KeyChar;
if (c == 'y' || c == 'Y')
{
Console.WindowWidth = 180;
Console.WriteLine();
for (int x = 0; x < rowCount; x++)
{
Console.WriteLine("ROW {0}: ", x);
for (int y = 0; y < colCount; y++)
{
Console.Write("{0:#.##} ", matrix[x, y]);
}
Console.WriteLine();
}
}
}
#endregion
}
}

Related

What is the best way to process output of segmentation network in Microsoft.ML?

The network produces a 1 x N x K tensor, where N is the number of pixel positions and K is the number of classes; each value is the score for a class at a given position.
The current code that retrieves the best class for each position works, but it is terribly slow: it takes four times longer than running the network itself.
private int[,] GetClasses(List<DisposableNamedOnnxValue> output)
{
Tensor<float> outTensor = output.First().AsTensor<float>();
int[,] classes = new int[frameWidth,frameHeight];
for (int i = 0; i < frameWidth; ++i)
{
for (int j = 0; j < frameHeight; ++j)
{
int finalClass = 0;
float finalClassScore = 0;
for (int k = 0; k < nClasses; ++k)
{
float score = outTensor[0, i * frameHeight + j, k];
if (score > finalClassScore)
{
finalClassScore = score;
finalClass = k;
}
}
classes[i, j] = finalClass;
}
}
return classes;
}
Is there a better, faster way of doing this in Microsoft.ML?

The solution I went with was to add an argmax layer to the initial Keras model, so that the model itself outputs a single value (the best class) per position.
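If changing the model is not an option, the per-position argmax can also be done on the C# side over a flat buffer instead of going through the three-index tensor indexer for every element. This is only a sketch under assumptions: the scores are assumed to have already been copied into a float[] laid out as [position, class], and how to obtain that buffer from the output tensor is not shown. The names SegmentationPostProcessing and ArgMaxPerPosition are hypothetical. Unlike the question's code, it starts from the first class score rather than 0, so it also works when all scores are negative.

```csharp
using System;

public static class SegmentationPostProcessing
{
    // scores: flat buffer laid out as [position, class],
    // with length = positions * classCount (an assumption of this sketch).
    public static int[] ArgMaxPerPosition(float[] scores, int classCount)
    {
        if (classCount <= 0 || scores.Length % classCount != 0)
            throw new ArgumentException("scores length must be a multiple of classCount");

        int positions = scores.Length / classCount;
        int[] classes = new int[positions];

        for (int p = 0; p < positions; p++)
        {
            int offset = p * classCount;
            int best = 0;
            float bestScore = scores[offset];

            // Scan the remaining classes for this position sequentially.
            for (int k = 1; k < classCount; k++)
            {
                float score = scores[offset + k];
                if (score > bestScore)
                {
                    bestScore = score;
                    best = k;
                }
            }
            classes[p] = best;
        }
        return classes;
    }
}
```

The result is indexed by position, so the class for pixel (i, j) in the question's layout would be at index i * frameHeight + j.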

Stopwatch startup issues

I've got a C# program that tests a sort algorithm and its performance using an instance of the Stopwatch class.
So far everything works correctly and I get the expected tick results, except on the first run. Somehow the first measurement takes about 900 ticks longer.
Do I have to initialize the Stopwatch class differently, or is there any way to fix this?
static void Main() {
watch = new Stopwatch();
int amount = 10; // Amount of arrays to test
long[, ] stats= new long[3, amount]; // Array that stores ticks for every size (100,1000,10000) 'amount'-times
for (int size = 100, iteration = 0; size <= 10000; size *= 10, iteration++) {
for (int j = 0; j < amount; j++) {
stats[iteration, j] = TestSort(size); // Save ticks for random tested array in stats
}
}
PrintStats(stats);
}
public static long TestSort(int length) {
int[] testArray = GenerateRandomArray(length); // Generate a random array with size of length
watch.Reset();
watch.Start();
sort(testArray);
watch.Stop();
return watch.ElapsedTicks;
}
public static void PrintStats(long[, ] array) {
for (int i = 0; i < array.GetLength(0); i++) {
Console.Write("[");
for (int j = 0; j < array.GetLength(1); j++) {
Console.Write(array[i, j]);
if (j < array.GetLength(1) - 1) {
Console.Write(",");
}
}
Console.Write("]\n");
}
}
// Sample output
// Note that the first entry is about 900 ticks longer than the other ones with size 100
[1150,256,268,262,261,262,263,261,263,262]
[19689,20550,20979,22953,19913,20578,19693,19945,19811,19970]
[1880705,1850265,3006533,1869953,1900301,1846915,1840681,1801887,1931206,2206952]

Why is my FFT Function not working well

I've made a function for FFT using AForge. It seems to work, but when I checked with my supervisor he said the output is not correct. He is using the pwelch function from MATLAB. We've already found out that we were using different windows, but changing them didn't make a significant difference. So again, the function runs, but according to my supervisor the output is incorrect. Please help!
This is my function. I hope someone sees what's wrong, because I've been looking at it for almost two weeks now. The data that goes into it has already been made equidistant.
private void FastFoulierMethod()
{
int NFFT = 64;
int N_OVERLAP = 32;
int numberOfEpochs = samples.Count / NFFT;
int numberOfSamplesToSelectFromFFT = NFFT-1;
double[] dataaa = new double[samples.Count];
for (int i = 0; i < samples.Count - 1; i++)
{
dataaa[i] = samples[i].GetValue(); // list of doubles that we use
}
double[,] pFrame = new double[numberOfEpochs, numberOfSamplesToSelectFromFFT];
// The first epoch in the page starts at index 0
int beginIndexOfEpoch = 0;
for (int i = 0; i < numberOfEpochs; i++)
{
// This will get the current epoch by retrieving samples from the sample list
// starting at 'beginIndex' with length 'NFFT'. This epoch will need to be detrended next.
List<double> smapletemp = new List<double>();
for (int x = beginIndexOfEpoch; x < beginIndexOfEpoch+NFFT; x++)
{
smapletemp.Add(dataaa[x]);
}
double[] epoch = smapletemp.ToArray();
if (epoch.Length == 0)
{
break;
}
// Create array of X-axis values 1,2,3,4 ... n
// which will be used to perform linear regression.
double[] xValues = new double[epoch.Length];
for (int j = 0; j < xValues.Length; j++)
{
xValues[j] = j;
}
// Perform linear regression on the epoch. This will result in some data that is used later.
Dictionary<String, double> linearRegressionData = math.performLinearRegression(xValues.ToList(), epoch.ToList());
// Detrend the epoch
for (int j = 0; j < epoch.Length; j++)
{
double intercept = linearRegressionData["Alpha"]; // Get the intercept from the data.
double slope = linearRegressionData["Beta"]; // Get the slope from the data.
//if (1 >= math.StdDev(epoch))
//{
epoch[j] = epoch[j] - intercept - (slope * j); // Detrend the epoch by subtracting the intercept and the slope * j.
//}
}
// Create Complex from the epoch for windowing and FFT processing.
Complex[] cmplx = new Complex[epoch.Length];
for (int j = 0; j < cmplx.Length; j++)
{
cmplx[j] = new Complex(epoch[j], 0);
}
// Perform Hann window function on the Complex.
math.hann(cmplx);
// Perform Fast Fourier Transform on the Complex.
FourierTransform.FFT(cmplx, FourierTransform.Direction.Backward);
// Create an array for all powers.
double[] powers = new double[cmplx.Length];
for (int j = 0; j < epoch.Length; j++)
{
powers[j] = cmplx[j].SquaredMagnitude;
}
// Add the powers to the power frame.
for (int j = 0; j < powers.Length-1; j++)
{
pFrame[i, j] = powers[j];
}
// Shift index for the next epoch.
beginIndexOfEpoch += NFFT - N_OVERLAP;
if ( beginIndexOfEpoch + NFFT > samples.Count)
{
break;
}
}
// Create an array for the nan-mean values of all epochs.
// Nan-mean calculates the mean of a set of doubles, ignoring NaN's.
double[] nanMeanValues = new double[numberOfSamplesToSelectFromFFT];
List<double[]> Y = new List<double[]>();
for (int i = 0; i < numberOfSamplesToSelectFromFFT; i++)
{
// The sum for calculating the mean.
double sum = 0.0;
// The number of elements (doubles) for calculating the mean.
int count = 0;
// For all epochs...
for (int j = 0; j < numberOfEpochs; j++)
{
// ...the mean for all doubles at index 'i' is calculated.
double sample = pFrame[j, i];
if (!Double.IsNaN(sample))
{
// Only take the double into account when it isn't a NaN.
sum += sample;
count++;
}
}
// Actually calculate the mean and add it to the array.
nanMeanValues[i] = sum / count;
}
// We now have the mean of all power arrays (or epochs).
// Create an array with Root Mean Square values.
double[] squareRootedNanMeans = new double[nanMeanValues.Length];
for (int i = 0; i < squareRootedNanMeans.Length; i++)
{
squareRootedNanMeans[i] = Math.Sqrt(nanMeanValues[i]);
}
Y.Add(squareRootedNanMeans);

It's been ages since I studied FFTs, but unless your academic assignment is to produce an FFT function, I advise you to use a library. It is great to program your own stuff to learn, but if you need results, go for the sure thing.
You can use ALGLIB (http://www.alglib.net/fasttransforms/fft.php), which has a free version.
Hope this helps.

2D-array display error

How am I supposed to display my 2D array, or the average of my array? It won't let me.
The code is supposed to display a 2D array, add up all the numbers, calculate the average and display it.
// It gives me a richTextBox error. I tried changing 'float' to 'void', but that gives
// me a return error.
//
// It says an object reference is required for the non-static field?
richTextBox1.AppendText(array[ i, j] + " ");
richTextBox1.AppendText(" "+ sum.ToString());
return avg; // <<< Error here
static float Avg(int[,] array)
{
return (float)array.OfType<int>().Average();
richTextBox1.Clear(); // <<<<==================== Here
Random rand = new Random();
float sum = 0;
int rows = array.GetLength(0);
int cols = array.GetLength(1);
for (int i = 0; i < array.Length; i++)
{
for (int j = 0; j < cols; j++)
{
int value = rand.Next(-100, 100);
array[i, j] = value;
richTextBox1.AppendText(value + " "); // <<<<<<<===== Here
if (value <= 0)
sum += value;
float avg = sum / value;
}
return avg;//<<<========here
}
richTextBox1.AppendText(" Total Average is: " + avg.ToString()); // <<<==== Here
}
private void button6_Click(object sender, EventArgs e)
{
Avg(A);
}

Try this:
static float Avg(int[,] array)
{
return (float)array.OfType<int>().Average();
}
Then you use it like this:
var array = new int [2,3] {{1,2, 3}, {4,5, 6}};
Console.WriteLine(Avg(array));
Update - for jagged arrays
static float Avg(int[][] array)
{
return (float)array.SelectMany(a => a).Average();
}
void Main()
{
int[][] array =
{
new int[] {1,2,3},
new int[] {4,5}
};
Console.WriteLine(Avg(array));
}
Update 2
if you want to do it your way try this:
private void Avg(int [,] array)
{
richTextBox1.Clear();
float sum = 0;
int rows = array.GetLength(0);
int cols = array.GetLength(1);
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < cols; j++)
{
richTextBox1.AppendText(array[i,j] + " ");
sum += array[i,j];
}
}
richTextBox1.AppendText(" Total Average is: " + (float)sum/(rows*cols));
}

We don't really know what you are trying to do with the Avg method, so I'll just summarize:
static float Avg: you cannot use Form instance members (like richTextBox1) in a static method. Remove 'static', or move the use of richTextBox1 into the OnClick method.
Don't put return just anywhere. Nothing after it is executed (unless you use try/finally). If you want to return a value, do it on the last line, after everything else in the method has run.
return avg: avg is not known in the current context, because it is declared inside an inner block (do you know what your neighbour has behind closed doors?).
For the final solution, refer to Ned's answer.

CLR multi-dimensioned array traversal performance

In a great many places in the software I write, there are three-dimensioned arrays of short or float, usually with several million elements. The data is best understood conceptually as a three-dimensioned array, since it describes values at regular locations in space.
I saw a mention elsewhere here that the .NET CLR is not terribly "performant" when it comes to traversing those arrays, for example, when computing new values and populating a second, equally sized and dimensioned array. If this is true, why is that so?
For readability reasons I haven't yet settled on using jagged arrays, though I'm willing to if that's really the answer.
To get around this, it's been proposed that I store the data as a one-dimensional array. For example, if my array has dimensions with magnitudes m, n, and o, then I would create a float[m*n*o] instead of a float[m,n,o], and write my own indexer to get to the correct array locations during traversal.
The specific use case is in parallelizing the traversal, such as:
Parallel.For(0, m, x => { for (int y = 0; y < n; y++) for (int z = 0; z < o; z++) doSomething(array[x, y, z]); });
In the single-indexed case there would instead be something like Parallel.For(0, myArray.Length, position => doSomething(myArray[position])) in place of the nested for loops.
So, the question is, really, would that be any faster than relying on the CLR array indexing that's built in?
EDIT: I've supplied my own answer below, based on some timing tests. The code is included.

One huge thing to consider is the order of traversal. Memory caching is an important part of modern processor performance and cache misses can be (relatively) expensive. If you index the array across a 'long' dimension that results in crossing cache boundaries, you may cause frequent misses as part of indexing. As such, the order in which you index is important. This often means you want to take care in how you choose to order your indices.
Also, when copying, consider that multiple indexing requires computing the 'true' index to the underlying memory block using multiplication/addition. If you're just copying all elements, though, you could simply increment a single index and access each element without additional computation required.
There are also various condition checks that occur when accessing arrays by index (making the IndexOutOfRangeException possible), which requires more checks when you access via multiple indices. I believe (though I'm not entirely sure) that the jitter can sometimes optimize single dimensional array access using a simple loop by checking the range only once, rather than on every indexing operation.
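The points above about index computation and traversal order can be sketched as follows. Volume, FlatIndex and ScaleInto are hypothetical names, and the row-major layout (z varying fastest) is an assumption chosen to match how float[m,n,o] is laid out. This illustrates the idea and makes no claim about measured speed.

```csharp
using System;

// Hypothetical wrapper: a float[m*n*o] exposed through a three-index indexer.
public sealed class Volume
{
    private readonly float[] data;
    public readonly int M, N, O;

    public Volume(int m, int n, int o)
    {
        M = m;
        N = n;
        O = o;
        data = new float[m * n * o];
    }

    // Row-major layout: z varies fastest, so nested loops should put z
    // innermost to walk memory sequentially.
    public int FlatIndex(int x, int y, int z)
    {
        return (x * N + y) * O + z;
    }

    public float this[int x, int y, int z]
    {
        get { return data[FlatIndex(x, y, z)]; }
        set { data[FlatIndex(x, y, z)] = value; }
    }

    // Element-wise pass over every value: a single running index, with no
    // multiply/add per element to locate it.
    public void ScaleInto(Volume target, float factor)
    {
        if (target.data.Length != data.Length)
            throw new ArgumentException("volumes must have the same number of elements");

        for (int i = 0; i < data.Length; i++)
        {
            target.data[i] = data[i] * factor;
        }
    }
}
```

When the computation for an element needs its neighbours, the three-index indexer is still available. When it only needs the element itself, the flat loop in ScaleInto avoids the index arithmetic entirely.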

I ran some timings and found that, overall, it makes little or no difference.
I used the code below. The timings I got were basically identical in each case:
public partial class Form1 : Form
{
int ArrayDim1 = 50;
int ArrayDim23 = 500;
int ParallelSplit = 50;
int DoSomethingSize = 100;
Double sqRoot = 0;
Single[, ,] multidim = null;
Single[] singleDim = null;
Single[][][] jagged = null;
ParallelOptions po = new ParallelOptions() { MaxDegreeOfParallelism = 36 };
public Form1()
{
InitializeComponent();
multidim = new Single[ArrayDim1, ArrayDim23, ArrayDim23];
for (int x = 0; x < ArrayDim1; x++)
for (int y = 0; y < ArrayDim23; y++)
for (int z = 0; z < ArrayDim23; z++)
multidim[x, y, z] = 1;
singleDim = new Single[ArrayDim1 * ArrayDim23 * ArrayDim23];
for (int i = 0; i < singleDim.Length; i++)
singleDim[i] = 1;
jagged = new Single[ArrayDim1][][];
for (int i = 0; i < ArrayDim1; i++)
{
jagged[i] = new Single[ArrayDim23][];
for (int j = 0; j < ArrayDim23; j++)
{
jagged[i][j] = new Single[ArrayDim23];
}
}
}
private void btnGO_Click(object sender, EventArgs e)
{
int loopcount = 1;
DateTime startTime = DateTime.Now;
for (int i = 0; i < loopcount; i++)
{
TestMultiDimArray(multidim);
}
textBox1.Text = DateTime.Now.Subtract(startTime).TotalMilliseconds.ToString("#,###");
startTime = DateTime.Now;
for (int i = 0; i < loopcount; i++)
{
TestSingleArrayClean(singleDim);
}
textBox2.Text = DateTime.Now.Subtract(startTime).TotalMilliseconds.ToString("#,###");
startTime = DateTime.Now;
for (int i = 0; i < loopcount; i++)
{
TestJaggedArray(jagged);
}
textBox3.Text = DateTime.Now.Subtract(startTime).TotalMilliseconds.ToString("#,###");
}
public void TestJaggedArray(Single[][][] multi)
{
Parallel.For(0, ArrayDim1, po, x =>
{
for (int y = 0; y < ArrayDim23; y++)
{
for (int z = 0; z < ArrayDim23; z++)
{
DoComplex();
multi[x][y][z] = Convert.ToSingle(Math.Sqrt(123412341));
}
}
});
}
public void TestMultiDimArray(Single[, ,] multi)
{
Parallel.For(0, ArrayDim1, po, x =>
{
for (int y = 0; y < ArrayDim23; y++)
{
for (int z = 0; z < ArrayDim23; z++)
{
DoComplex();
multi[x, y, z] = Convert.ToSingle(Math.Sqrt(123412341));
}
}
});
}
public void TestSingleArrayClean(Single[] single)
{
Parallel.For(0, single.Length, po, y =>
{
//System.Diagnostics.Debug.Print(y.ToString());
DoComplex();
single[y] = Convert.ToSingle(Math.Sqrt(123412341));
});
}
public void DoComplex()
{
for (int i = 0; i < DoSomethingSize; i++)
{
sqRoot = Math.Log(101.101);
}
}
}
