I'm trying to translate a bit of C++ into C#, and I can't seem to do it without losing a lot of performance, because looking up and accessing array elements is slower. I'm using 3D jagged arrays because that was the most intuitive option to me at the time, but I'm very open to alternative suggestions. So my question is: is there a way to access some kind of collection in the same way, or a similar way, as C++ array pointers do? Here's a bit of the C++ I was converting:
void Upsample(float* from, float* to, int n, int stride)
{
    float* p, pCoeffs[4] = { 0.25, 0.75, 0.75, 0.25 };
    p = &pCoeffs[2];
    for (int i = 0; i < n; i++)
    {
        to[i * stride] = 0;
        for (int k = i / 2; k <= i / 2 + 1; k++)
            to[i * stride] += p[i - 2 * k] * from[Mod(k, n / 2) * stride];
    }
}

for (iy=0; iy<n; iy++) for (iz=0; iz<n; iz++) {
    Upsample( &noise[i], &temp1[i], n, 1 );
}


How to speed up nested loops in C#

This is a piece of my code, which differentiates an image by applying a filter. It works correctly, but it takes a long time (because of the image height and width).
"Data" is a grey image bitmap.
"Filter" is a [3,3] matrix.
The maximum values of "Fh" and "Fw" are 3.
I am looking to speed up this code.
I also tried using Parallel.For, but it didn't work correctly (it failed with an out-of-bounds error).
private float[,] Differentiate(int[,] Data, int[,] Filter)
{
    int i, j, k, l, Fh, Fw;
    Fw = Filter.GetLength(0);
    Fh = Filter.GetLength(1);
    float sum = 0;
    float[,] Output = new float[Width, Height];
    for (i = Fw / 2; i <= (Width - Fw / 2) - 1; i++)
    {
        for (j = Fh / 2; j <= (Height - Fh / 2) - 1; j++)
        {
            for (k = -Fw / 2; k <= Fw / 2; k++)
                for (l = -Fh / 2; l <= Fh / 2; l++)
                    sum = sum + Data[i + k, j + l] * Filter[Fw / 2 + k, Fh / 2 + l];
            Output[i, j] = sum;
        }
    }
    return Output;
}
For parallel execution you need to drop the C-style variable declarations at the beginning of the method and declare the variables in the scope where they are actually used, so they are not shared between threads. Making the code parallel should provide some performance benefit, but turning every loop into a Parallel.For is not a good idea, as there is a limit to the number of threads that can actually run in parallel. I would try parallelizing the top-level loop only:
private static float[,] Differentiate(int[,] Data, int[,] Filter)
{
    var Fw = Filter.GetLength(0);
    var Fh = Filter.GetLength(1);
    float[,] Output = new float[Width, Height];
    // Parallel.For's upper bound is exclusive, so pass Width - Fw / 2 to cover
    // the same range as the sequential loop (i <= Width - Fw / 2 - 1).
    Parallel.For(Fw / 2, Width - Fw / 2, (i, state) =>
    {
        for (var j = Fh / 2; j <= (Height - Fh / 2) - 1; j++)
        {
            // Declared per pixel, so it is neither shared between threads nor carried over.
            var sum = 0;
            for (var k = -Fw / 2; k <= Fw / 2; k++)
                for (var l = -Fh / 2; l <= Fh / 2; l++)
                    sum = sum + Data[i + k, j + l] * Filter[Fw / 2 + k, Fh / 2 + l];
            Output[i, j] = sum;
        }
    });
    return Output;
}
This is a perfect example of a task where using the GPU is better than using the CPU. A GPU is able to perform trillions of floating point operations per second (TFlops), while CPU performance is still measured in GFlops. The catch is that it's only any good if you use SIMD instructions (Single Instruction Multiple Data). The GPU excels at data-parallel tasks. If different data needs different instructions, using the GPU has no advantage.
In your program, the elements of your bitmap go through the same calculations: the same computations just with slightly different data (SIMD!). So using the GPU is a great option. This won't be too complex because with your calculations threads on the GPU would not need to exchange information, nor would they be dependent on results of previous iterations (Each element would be processed by a different thread on the GPU).
You can use, for example, OpenCL to easily access the GPU. More on OpenCL and using the GPU here:

C# Efficiently Linearly Interpolating through n numbers with a sample rate x

I have a jagged float array chunks (float[][] chunks), and I want to interpolate each chunk of floats (with length n) with a sample rate of x. For example, if chunks[0] was {1f, 5f, 10f} so n = 3 and lets say x = 12, I want:
I've found a way of doing this using the library MathNet.Numerics, but it is very inefficient. My code is below.
double[] d = new double[n];
d[0] = 1d;
for (int i = 1; i < d.Length; i++)
    d[i] = d[i - 1] + (double)(x - 1)/(double)(n - 1);
for (int c = 0; c < chunks.Length; c++)
{
    for (int j = 0; j < x; j++)
        doubles.Add(Convert.ToSingle(Interpolate.Linear(d, chunks[c].Select(y => Convert.ToDouble(y))).Interpolate(j + 1)));
}
I then parse back the List<double> into a jagged array.
This code basically mimics having a 2d plane for the interpolation, rather than just interpolating the numbers themselves (at least I think). MathNet.Numerics.Interpolate.Linear() takes two double arrays, and this was the only way I managed to get proper results. However, it takes forever. Is there a better way to do this?
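For evenly spaced input samples, the same result can be computed directly, without building an interpolation object for every output point. The following is a minimal sketch, not the MathNet.Numerics API: the class name Resampler and the method Resample are hypothetical, and it assumes each chunk has at least two values and that x is at least 2. Output sample j is taken at position j * (n - 1) / (x - 1) in the source array, which matches the mapping used by the d array above (first and last values are preserved).

```csharp
using System;

public static class Resampler
{
    // Hypothetical helper: linearly resamples 'source' (n >= 2 values)
    // to 'x' evenly spaced values (x >= 2), keeping both endpoints.
    public static float[] Resample(float[] source, int x)
    {
        if (source == null || source.Length < 2)
            throw new ArgumentException("source needs at least two values");
        if (x < 2)
            throw new ArgumentException("x must be at least 2");

        int n = source.Length;
        var result = new float[x];
        double step = (double)(n - 1) / (x - 1); // distance between outputs, in source indices

        for (int j = 0; j < x; j++)
        {
            double pos = j * step;               // fractional index into source
            int lo = Math.Min((int)pos, n - 2);  // left neighbour, clamped so lo + 1 is valid
            double t = pos - lo;                 // weight of the right neighbour
            result[j] = (float)(source[lo] + t * (source[lo + 1] - source[lo]));
        }
        return result;
    }
}
```

Each chunk can then be converted with chunks[c] = Resampler.Resample(chunks[c], x), which avoids both the intermediate list and the per-sample conversion to double arrays.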

Going from Parallel.ForEach to Multithreading

So I converted a recursive function to an iterative one and then used Parallel.ForEach, but when I ran it through VTune it was only really using 2 logical cores for the majority of its run time.
I decided to attempt to use managed threads instead, and converted this code:
for (int N = 2; N <= length; N <<= 1)
{
    int maxThreads = 4;
    var workGroup = Enumerable.Range(0, maxThreads);
    Parallel.ForEach(workGroup, i =>
    {
        for (int j = ((i / maxThreads) * length); j < (((i + 1) / maxThreads) * length); j += N)
        {
            for (int k = 0; k < N / 2; k++)
            {
                int evenIndex = j + k;
                int oddIndex = j + k + (N / 2);
                var even = output[evenIndex];
                var odd = output[oddIndex];
                output[evenIndex] = even + odd * twiddles[k * (length / N)];
                output[oddIndex] = even + odd * twiddles[(k + (N / 2)) * (length / N)];
            }
        }
    });
}
Into this:
for (int N = 2; N <= length; N <<= 1)
{
    int maxThreads = 4;
    Thread one = new Thread(() => calculateChunk(0, maxThreads, length, N, output));
    Thread two = new Thread(() => calculateChunk(1, maxThreads, length, N, output));
    Thread three = new Thread(() => calculateChunk(2, maxThreads, length, N, output));
    Thread four = new Thread(() => calculateChunk(3, maxThreads, length, N, output));
}

public void calculateChunk(int i, int maxThreads, int length, int N, Complex[] output)
{
    for (int j = ((i / maxThreads) * length); j < (((i + 1) / maxThreads) * length); j += N)
    {
        for (int k = 0; k < N / 2; k++)
        {
            int evenIndex = j + k;
            int oddIndex = j + k + (N / 2);
            var even = output[evenIndex];
            var odd = output[oddIndex];
            output[evenIndex] = even + odd * twiddles[k * (length / N)];
            output[oddIndex] = even + odd * twiddles[(k + (N / 2)) * (length / N)];
        }
    }
}
The issue is that in the fourth thread, on the last iteration of the N loop, I get an index out of bounds exception for the output array, where the index being accessed is equal to the array's length.
I cannot pinpoint the cause using debugging, but I believe it has to do with the threads: I ran the code without the threads and it worked as intended.
If any of the code needs changing let me know, I usually have a few people suggest edits. Thanks for your help, I have tried to sort it myself and am fairly certain the problem is occurring in my threading but I can not see how.
PS: The intended purpose is to parallelize this segment of code.
The observed behaviour is almost certainly due to the use of a captured loop iteration variable N. I can reproduce your situation with a simple test:
ConcurrentBag<int> numbers = new ConcurrentBag<int>();
for (int i = 0; i < 10000; i++)
{
    Thread t = new Thread(() => numbers.Add(i));
    t.Start();
    //t.Join(); // Uncomment this to get expected behaviour.
}
// You'd not expect this assert to be true, but most of the time it will be.
Put simply, your for loop is racing to update N before the value of N can be read by the delegate that executes the calculateChunk call. As a result calculateChunk sees almost random values of N, going up to (and including) length << 1, the value N holds when the loop exits. That is what's causing your IndexOutOfRangeException.
The output values you'll get will be rubbish too as you can never rely on the value of N being correct.
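One common way to avoid this kind of race is to copy the loop variable into a local declared inside the loop body before creating the delegate, so that each thread captures its own variable rather than the shared one. The following is a minimal sketch of that idea; CaptureDemo and RunWithLocalCopies are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class CaptureDemo
{
    // Hypothetical helper: starts 'count' threads, each recording the loop
    // index it was created for, and returns the recorded values sorted.
    public static int[] RunWithLocalCopies(int count)
    {
        var seen = new ConcurrentBag<int>();
        var threads = new List<Thread>();

        for (int i = 0; i < count; i++)
        {
            int local = i; // a fresh variable per iteration; the loop never modifies it
            var t = new Thread(() => seen.Add(local));
            threads.Add(t);
            t.Start();
        }

        // Wait for all threads before reading the results.
        foreach (var t in threads)
            t.Join();

        int[] result = seen.ToArray();
        Array.Sort(result);
        return result;
    }
}
```

Applied to the question's code, this would mean copying N into a local (for example int n = N;) at the top of the loop body and passing that local to calculateChunk, and joining the threads before the next value of N is processed.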
If you want to safely rewrite the original code to utilize more cores, move Parallel.ForEach from the inner loop to the outer loop. If the number of outer loop iterations is high, the load balancer will be able to do its job properly (which it can't with your current workGroup count of 4 - that number of elements is simply too low).

What is the output of the fftLeft array after applying the FFTDb function to a waveLeft array in C#: frequencies, or something else?

I am a newcomer to sound programming. I have a real-time sound visualizer (I downloaded it from
In the AudioFrame.cs class there is an array, as below:
_fftLeft = FourierTransform.FFTDb(ref _waveLeft);
_fftLeft is a double array. _waveLeft is also a double array. As shown above, the FFTDb function of the FourierTransform.cs class is applied to the _waveLeft array.
Here is the FFTDb function:
static public double[] FFTDb(ref double[] x)
{
    n = x.Length;
    nu = (int)(Math.Log(n) / Math.Log(2));
    int n2 = n / 2;
    int nu1 = nu - 1;
    double[] xre = new double[n];
    double[] xim = new double[n];
    double[] decibel = new double[n2];
    double tr, ti, p, arg, c, s;
    for (int i = 0; i < n; i++)
    {
        xre[i] = x[i];
        xim[i] = 0.0f;
    }
    int k = 0;
    for (int l = 1; l <= nu; l++)
    {
        while (k < n)
        {
            for (int i = 1; i <= n2; i++)
            {
                p = BitReverse(k >> nu1);
                arg = 2 * (double)Math.PI * p / n;
                c = (double)Math.Cos(arg);
                s = (double)Math.Sin(arg);
                tr = xre[k + n2] * c + xim[k + n2] * s;
                ti = xim[k + n2] * c - xre[k + n2] * s;
                xre[k + n2] = xre[k] - tr;
                xim[k + n2] = xim[k] - ti;
                xre[k] += tr;
                xim[k] += ti;
                k++; // advance within the current butterfly group
            }
            k += n2;
        }
        k = 0;
        nu1--; // next stage uses one bit fewer in the shift
        n2 = n2 / 2;
    }
    k = 0;
    int r;
    while (k < n)
    {
        r = BitReverse(k);
        if (r > k)
        {
            tr = xre[k];
            ti = xim[k];
            xre[k] = xre[r];
            xim[k] = xim[r];
            xre[r] = tr;
            xim[r] = ti;
        }
        k++; // move on to the next element to reorder
    }
    for (int i = 0; i < n / 2; i++)
        decibel[i] = 10.0 * Math.Log10((float)(Math.Sqrt((xre[i] * xre[i]) + (xim[i] * xim[i]))));
    return decibel;
}
When I play a musical note on a guitar, I want to know its frequency in a numerical format. I wrote a foreach loop to see what the _fftLeft array contains, as below:
foreach (double myarray in _fftLeft)
    Console.WriteLine(myarray);
The output contains lots of real-time values, as below.
I want to know what those values are (frequencies or not). If they are frequencies, why does the output contain low frequency values? And when I play a guitar note, I want to detect the frequency of that particular note.
Based on the posted code, FFTDb first computes the FFT, then computes and returns the magnitudes of the frequency spectrum on the logarithmic decibel scale. In other words, _fftLeft contains magnitudes for a discrete set of frequencies. The actual values of those frequencies can be computed using the array index and the sampling frequency according to this answer.
As an example, if you were plotting the _fftLeft output for a pure sinusoidal tone input, you should be able to see a clear spike at the index corresponding to the sinusoid's frequency. For a guitar note, however, you are likely to see multiple spikes in magnitude corresponding to the harmonics. Detecting the note's frequency (its pitch) is a more complicated topic and typically requires one of several pitch detection algorithms.
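The index-to-frequency mapping can be sketched as follows: for an FFT over fftSize input samples taken at sampleRate Hz, bin i corresponds to i * sampleRate / fftSize Hz. This is only a rough sketch under stated assumptions: SpectrumHelper, BinToFrequency and StrongestBin are hypothetical names, fftSize is assumed to be the length of _waveLeft (so _fftLeft has fftSize / 2 entries), and the sampling rate has to come from your audio capture settings. Picking the strongest bin works for a pure tone but is not a pitch detection algorithm.

```csharp
using System;

public static class SpectrumHelper
{
    // Frequency in Hz represented by a given FFT bin.
    // fftSize is the number of input samples passed to the FFT.
    public static double BinToFrequency(int binIndex, double sampleRate, int fftSize)
    {
        return binIndex * sampleRate / fftSize;
    }

    // Index of the largest value in a magnitude (dB) array such as _fftLeft.
    public static int StrongestBin(double[] magnitudesDb)
    {
        int best = 0;
        for (int i = 1; i < magnitudesDb.Length; i++)
        {
            if (magnitudesDb[i] > magnitudesDb[best])
                best = i;
        }
        return best;
    }
}
```

For example, with a 44100 Hz sampling rate and 1024 input samples, neighbouring bins are about 43 Hz apart, which is too coarse to tell adjacent low guitar notes apart without a longer input window or a dedicated pitch detection method.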

C# Exp cannot get result

I have a question about using Math.Exp() in C#. This code is about kernel density estimation, which I don't know much about, so I looked up some wiki pages and papers.
I tried to write it in C#. The problem is that when "distance" gets larger, the result becomes 0. This confuses me, and I cannot find any other way to get the right result.
disExp = Math.Pow(Math.E, -(distance / 2 * Math.Pow(h, 2)));
So, can anyone help me find a solution, or give me some ideas about kernel density estimation in C#? Sorry for my poor English.
Try this
public static double[,] KernelDensityEstimation(double[] data, double sigma, int nsteps)
{
    // probability density function (PDF) signal analysis
    // Works like ksdensity in MATLAB.
    // KDE performs kernel density estimation (KDE) on one-dimensional data
    // Input:  -data: input data, one-dimensional
    //         -sigma: bandwidth (sometimes called "h")
    //         -nsteps: optional number of abscissa points. If nsteps is an
    //          array, the abscissa points will be taken directly from it. (default 100)
    // Output: -x: equispaced abscissa points
    //         -y: estimates of p(x)
    // This function is part of the Kernel Methods Toolbox (KMBOX) for MATLAB.
    // Converted to C# code by ksandric
    double[,] result = new double[nsteps, 2];
    double[] x = new double[nsteps], y = new double[nsteps];
    double MAX = Double.MinValue, MIN = Double.MaxValue;
    int N = data.Length; // number of data points
    // Find MIN MAX values in data
    for (int i = 0; i < N; i++)
    {
        if (MAX < data[i])
            MAX = data[i];
        if (MIN > data[i])
            MIN = data[i];
    }
    // Like MATLAB linspace(MIN, MAX, nsteps): nsteps points, both endpoints included,
    // so the spacing is (MAX - MIN) / (nsteps - 1)
    x[0] = MIN;
    for (int i = 1; i < nsteps; i++)
        x[i] = x[i - 1] + ((MAX - MIN) / (nsteps - 1));
    // kernel density estimation
    double c = 1.0 / (Math.Sqrt(2 * Math.PI * sigma * sigma));
    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < nsteps; j++)
            y[j] = y[j] + 1.0 / N * c * Math.Exp(-(data[i] - x[j]) * (data[i] - x[j]) / (2 * sigma * sigma));
    }
    // compilation of the X,Y to result. Good for creating plot(x, y)
    for (int i = 0; i < nsteps; i++)
    {
        result[i, 0] = x[i];
        result[i, 1] = y[i];
    }
    return result;
}
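Note how the exponent in the function above divides by (2 * sigma * sigma) as a whole. In C#, / and * have the same precedence and are evaluated left to right, so the question's expression distance / 2 * Math.Pow(h, 2) computes (distance / 2) * h² rather than distance / (2 * h²), which makes the exponent very negative and drives the result to 0. A minimal sketch of the difference follows, assuming that distance in the question is already the squared distance and that h is the bandwidth (both are assumptions, and GaussianKernel with its two methods is a hypothetical helper):

```csharp
using System;

public static class GaussianKernel
{
    // Gaussian weight exp(-d^2 / (2 h^2)); squaredDistance is assumed to be d^2.
    public static double Weight(double squaredDistance, double h)
    {
        return Math.Exp(-squaredDistance / (2.0 * h * h));
    }

    // The same operands grouped the way the question's expression is
    // actually evaluated: (squaredDistance / 2) * h^2.
    public static double WeightAsWrittenInQuestion(double squaredDistance, double h)
    {
        return Math.Exp(-(squaredDistance / 2 * Math.Pow(h, 2)));
    }
}
```

With a squared distance of 100 and h = 10, the first method evaluates exp(-0.5), while the second evaluates exp(-5000), which underflows to 0 in double precision.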
kernel density estimation C#

