How to get the fundamental frequency using Harmonic Product Spectrum? - c#

I'm trying to get the pitch from the microphone input. First I decompose the signal from the time domain to the frequency domain with an FFT, applying a Hamming window to the signal beforehand. I then pass the complex FFT results to a Harmonic Product Spectrum routine, where the spectrum is downsampled several times and the downsampled copies are multiplied together, giving a complex value per bin. What should I do next to get the fundamental frequency?
public float[] HarmonicProductSpectrum(Complex[] data)
{
    Complex[] hps2 = Downsample(data, 2);
    Complex[] hps3 = Downsample(data, 3);
    Complex[] hps4 = Downsample(data, 4);
    Complex[] hps5 = Downsample(data, 5);
    float[] array = new float[hps5.Length];
    for (int i = 0; i < array.Length; i++)
    {
        checked
        {
            array[i] = data[i].X * hps2[i].X * hps3[i].X * hps4[i].X * hps5[i].X;
        }
    }
    return array;
}
public Complex[] Downsample(Complex[] data, int n)
{
    Complex[] array = new Complex[Convert.ToInt32(Math.Ceiling(data.Length * 1.0 / n))];
    for (int i = 0; i < array.Length; i++)
    {
        array[i].X = data[i * n].X;
    }
    return array;
}
I have tried to get the magnitude using,
magnitude[i] = (float)Math.Sqrt(array[i] * array[i] + (data[i].Y * data[i].Y));
inside the for loop in HarmonicProductSpectrum method. Then tried to get the maximum bin using,
float max_mag = float.MinValue;
float max_index = -1;
for (int i = 0; i < array.Length / 2; i++)
{
    if (magnitude[i] > max_mag)
    {
        max_mag = magnitude[i];
        max_index = i;
    }
}
and then I tried to get the frequency using,
var frequency = max_index * 44100 / 1024;
But I was getting garbage values like 1248.926, 1205.859, and 2454.785 for the A4 note (440 Hz), and those values don't look like harmonics of A4.
Any help would be greatly appreciated.

I implemented harmonic product spectrum in Python to make sure your data and algorithm were working nicely.
Here’s what I see when applying harmonic product spectrum to the full dataset, Hamming-windowed, with 5 downsample–multiply stages:
This is just the bottom kilohertz, but the spectrum is pretty much dead above 1 kHz.
If I chunk the long audio clip into 8192-sample chunks (with 4096-sample, 50% overlap), Hamming-window each chunk, and run HPS on each, this is the matrix of HPS results, a kind of movie of the HPS spectrum over the entire dataset. The fundamental frequency seems to be quite stable.
The full source code is here; there's a lot of code that helps chunk the data and visualize the output of HPS running on the chunks, but the core HPS function, starting at def hps(…, is short. It does have a couple of tricks in it, though.
Given the strange frequencies where you're finding the peak, it could be that you're operating on the full spectrum, from 0 to 44.1 kHz? You want to keep only the “positive” frequencies, i.e., from 0 to 22.05 kHz, and apply the HPS algorithm (downsample–multiply) to that.
But assuming you start out with a positive-frequency-only spectrum and take its magnitude properly, it looks like you should get reasonable results. Try saving out the output of your HarmonicProductSpectrum to see if it looks anything like the above.
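As a hedged sketch of that slicing step (fftResult is a hypothetical array holding your full N-point complex FFT output, not a name from your code):

Complex[] positive = new Complex[fftResult.Length / 2 + 1];   // keep bins 0 .. N/2 only
Array.Copy(fftResult, positive, positive.Length);
float[] hps = HarmonicProductSpectrum(positive);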
Again, the full source code is at https://gist.github.com/fasiha/957035272009eb1c9eb370936a6af2eb. (There I try out another couple of spectral estimators, Welch's method from SciPy and my port of the Blackman-Tukey spectral estimator. I'm not sure if you are set on implementing HPS or if you would consider other pitch estimators, so I'm leaving the Welch/Blackman-Tukey results there.)
Original answer: I wrote this as a comment, but I had to keep revising it because it was confusing, so here it is as a mini-answer.
Based on my brief reading of this intro to HPS, I don’t think you’re taking the magnitudes correctly after you find the four decimated responses.
You want:
array[i] = sqrt(data[i] * Complex.conjugate(data[i]) *
                hps2[i] * Complex.conjugate(hps2[i]) *
                hps3[i] * Complex.conjugate(hps3[i]) *
                hps4[i] * Complex.conjugate(hps4[i]) *
                hps5[i] * Complex.conjugate(hps5[i])).X;
This uses the sqrt(x * Complex.conjugate(x)) trick to find x’s magnitude, and then multiplies all 5 magnitudes.
(Actually, it moves the sqrt outside the product, so you only do one sqrt, which saves some time but gives the same result. So maybe that's another trick.)
Final trick: it takes that result's real part because sometimes, due to float accuracy issues, a tiny imaginary component like 1e-15 survives.
After you do this, array should contain just real floats, and you can apply the max-bin-finding.
If there’s no Conjugate method, then the old-fashioned way should work:
public float mag2(Complex c) { return c.X * c.X + c.Y * c.Y; }

// in HarmonicProductSpectrum
array[i] = (float)Math.Sqrt(mag2(data[i]) * mag2(hps2[i]) * mag2(hps3[i]) * mag2(hps4[i]) * mag2(hps5[i]));
There are algebraic flaws in the two approaches you suggested in the comments below, but the above should be correct. I'm not sure what C# does when you assign a Complex to a float; maybe it uses the real component? I'd have thought that would be a compiler error, but with the above code you're doing the right thing with the complex data and only assigning a float to array[i].
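Putting it together, here's a minimal sketch of the corrected method, assuming the mag2 helper above and your existing Downsample (same structure as your original, just with proper magnitudes):

public float[] HarmonicProductSpectrum(Complex[] data)
{
    // data should already hold only the positive-frequency half of the FFT
    Complex[] hps2 = Downsample(data, 2);
    Complex[] hps3 = Downsample(data, 3);
    Complex[] hps4 = Downsample(data, 4);
    Complex[] hps5 = Downsample(data, 5);
    float[] array = new float[hps5.Length];
    for (int i = 0; i < array.Length; i++)
    {
        // multiply the five squared magnitudes, then take one square root
        array[i] = (float)Math.Sqrt(mag2(data[i]) * mag2(hps2[i]) *
                                    mag2(hps3[i]) * mag2(hps4[i]) * mag2(hps5[i]));
    }
    return array;
}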

To get a pitch estimate, you have to divide your summed bin frequency estimate by the downsampling ratio used for that sum.
Added: You should also sum the magnitudes (abs()), not take the magnitude of the complex sum.
But the harmonic product spectrum (HPS) algorithm, especially when using only integer downsampling ratios, doesn't usually provide better pitch estimation resolution. Instead, it provides a more robust rough pitch estimate (less likely to be fooled by a harmonic) than a single bare FFT magnitude peak gives for overtone-rich timbres that have weak or missing fundamental spectral content.
If you know how to downsample a spectrum by fractional ratios (using interpolation, etc.), you can try finer grained downsampling to get a better pitch estimate out of HPS. Or you can use an HPS result to inform you of a narrower frequency range in which to search using another pitch or frequency estimation method.
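For instance, here is a hedged sketch of fractional-ratio downsampling by linear interpolation, operating on the real magnitude spectrum rather than the complex FFT data (the method name and signature are illustrative, not from the question):

public float[] DownsampleFractional(float[] spectrum, double ratio)
{
    // ratio may be non-integer, e.g. 2.5; each output bin samples the
    // input at position i * ratio, interpolating between neighbours
    int outLength = (int)(spectrum.Length / ratio);
    float[] result = new float[outLength];
    for (int i = 0; i < outLength; i++)
    {
        double pos = i * ratio;
        int k = (int)pos;
        double frac = pos - k;
        float next = (k + 1 < spectrum.Length) ? spectrum[k + 1] : spectrum[k];
        result[i] = (float)((1 - frac) * spectrum[k] + frac * next);
    }
    return result;
}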

Related

Accord.Net Get equation of the SVM model

I have been testing the sample Kernel Support Vector Machines for regression problems and I would like to know how to get the equation of the model.
For example, if the machine is created using a polynomial kernel (degree = 1), how do you get the line equation (mx + b) of this model? Is there any method in the SupportVectorMachine class to get the model equation, or is there any way to calculate the parameters of the equation from the variables obtained after the machine is created?
Thanks in advance.
Looks like you can use this method below:
ToWeights(), which
Converts a Linear-kernel machine into an array of linear coefficients.
The first position in the array is the Threshold value.
So in your language, the first position in the array is the bias b and the rest are your linear coefficients m.
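As a hedged sketch (svm stands for a hypothetical trained linear-kernel machine; check the exact API of your Accord.NET version):

double[] weights = svm.ToWeights();
double b = weights[0];   // the Threshold (bias) comes first
double m = weights[1];   // slope of y = mx + b for a single-feature model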
I got weird coefficients from ToWeights() when using SequentialMinimalOptimization(), from which I couldn't derive the hyperplane equation. Using LinearCoordinateDescent() yielded usable coefficients for the model, however, in the form [a, b, c, ...], which can be plugged in as 0 = a + bx + cy + ...
Hope that helps!
As @zrolfs noted, if you are using Accord.NET with Sequential Minimal Optimization, the ToWeights() function does not currently return relevant coefficients for the decision function. Nevertheless, you can calculate these coefficients directly. To do so, multiply the SVM weights vector by the matrix of support vectors, like so:
double[] DecisionFunctionCoefficients = new double[dwTotalFeatures];
for (int iFeature = 0; iFeature < dwTotalFeatures; iFeature++) {
    for (int iVector = 0; iVector < SVM.SupportVectors.Length; iVector++) {
        DecisionFunctionCoefficients[iFeature] += SVM.SupportVectors[iVector][iFeature] * SVM.Weights[iVector];
    }
}

Resource intensive mathematical calculation method in .NET

I have a numerical computation method in my .NET code that will be called more than 1000 times.
private double CalculatePressureLossThroughPipe(double length, double flow, double diameter)
{
    double costA = 0, costB = 0;
    double frictionFactor = 0;
    double pressure = 0;
    double velocity = flow / CalculatePipeArea(diameter);
    // calculate Reynolds number
    double reNo = ((this.mDensity * velocity * diameter) / this.mViscosity);
    // calculate friction factor
    costA = Math.Pow((2.457 * Math.Log(1 / (Math.Pow((7 / reNo), 0.9) + (0.27 * 0.000015) / diameter))), 16);
    costB = Math.Pow((37530 / reNo), 16);
    frictionFactor = (2 * Math.Pow(((Math.Pow((8 / reNo), 12)) + (1 / Math.Pow((costA + costB), 1.5))), 0.083333));
    // calculate pressure
    pressure = (DesignConstants.PRESSSURE_CONSTANT * 2 * frictionFactor * length * Math.Pow(velocity, 2) * this.mDensity / diameter);
    return pressure;
}
This function will be called in a loop with a different set of input parameters each time. The loop itself is quite intensive, and it calls the above function (with unique parameters) on every iteration. The function, although it looks small, is quite resource intensive. Is there an alternative way to process the method calls without using the standard members from System.Math?
It looks like the constant subexpression (0.27 * 0.000015) can be precalculated, since it doesn't depend on any of your inputs. In any case, when you say this method 'is quite resource intensive', presumably you mean it takes a long time. Have you benchmarked it? What would an acceptable time be? These are the things you need to find out before trying to optimise anything.
You could try to improve performance using multiple threads (via Tasks/Threads) and vectorization.
Using System.Numerics you may be able to leverage the power of SIMD, possibly increasing performance up to 4 times.
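As a hedged illustration of the SIMD idea: Math.Pow and Math.Log have no Vector<double> equivalents, so only the plain-arithmetic parts vectorize directly. This sketch batches just the velocity computation (flow / area) across many pipes; the method and array names are hypothetical.

using System.Numerics;

static void ComputeVelocities(double[] flows, double[] areas, double[] velocities)
{
    int width = Vector<double>.Count;   // lanes per SIMD register
    int i = 0;
    for (; i <= flows.Length - width; i += width)
    {
        var v = new Vector<double>(flows, i) / new Vector<double>(areas, i);
        v.CopyTo(velocities, i);
    }
    for (; i < flows.Length; i++)       // scalar remainder
        velocities[i] = flows[i] / areas[i];
}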
First of all, you should analyze all the mathematical expressions and reduce the number of those that can be precalculated, such as:
(0.27 * 0.000015)
Also try to use multiplication instead of Math.Pow where possible: velocity * velocity is faster than Math.Pow(velocity, 2).
If possible, you can also try Pow approximation algorithms; they are faster but less precise. For more information, see this article: http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/
Are you using the Parallel class for your loop, to utilize the multiple cores/processors of your PC? https://msdn.microsoft.com/library/dd537608(v=vs.110).aspx
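A minimal sketch of that, assuming the per-call inputs live in parallel arrays of equal length (the array names are hypothetical):

using System.Threading.Tasks;

double[] pressures = new double[lengths.Length];
Parallel.For(0, lengths.Length, i =>
{
    // each iteration is independent, so iterations can run on different cores
    pressures[i] = CalculatePressureLossThroughPipe(lengths[i], flows[i], diameters[i]);
});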

Directsound logarithm volume to linear volume slider

I am developing a music player with DirectX.DirectSound, and I have a problem with the volume. The DirectSound volume scale is logarithmic, which means that with quiet sounds it is much more sensitive to small variations in amplitude than with loud sounds. It also means that a linear volume slider gives a logarithmic sensation of volume variation, and that just doesn't feel right. My question is: how can I make it feel linear?
My code until here is:
if (trkBalance.Value == trkBalance.Minimum)
{
    foreGroundSound.Volume = (int)DS.Volume.Min;
}
else if (trkBalance.Value == trkBalance.Maximum)
{
    foreGroundSound.Volume = (int)DS.Volume.Max;
}
else
{
    foreGroundSound.Volume = (int)(-5000 * Math.Log10(100 - trkBalance.Value));
}
There is a rule of thumb to determine perceived loudness: a difference of 10 dB (call it doubleValue) results in a sound that is perceived as twice or half as loud as the original source.
With that in mind we can create a formula that maps the attenuation to the sound pressure level.
But first we have to calculate the actual attenuation (as a fraction). DirectSound can attenuate a sound by up to 100 dB, which is an attenuation of 1/2^(100/doubleValue). This is the value for the minimum trackbar position; the maximum is 1 (no change). So overall:
doubleValue = 10
minimumAttenuation = 1 / 2^(100 / doubleValue)
attenuation = minimumAttenuation + trkBalance.Value / 100 * (1 - minimumAttenuation)
Now we have a value within the valid range, and we need to find the sound pressure level for this attenuation.
And we know that the loudness doubles every 10 dB (doubleValue):
attenuation = 2^(db / doubleValue)            // take ln of both sides
ln(attenuation) = db / doubleValue * ln(2)
db = doubleValue * ln(attenuation) / ln(2)
And since DirectSound takes the volume in hundredths of a dB, you can use
foreGroundSound.Volume = (int)(db * 100);
Those are just some theoretical thoughts based on Wikipedia information; it might or might not work, so just try it.
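For what it's worth, a direct (untested) translation of those formulas into the question's code, assuming trkBalance.Value runs from 0 to 100:

const double doubleValue = 10.0;   // dB per perceived doubling of loudness
double minimumAttenuation = Math.Pow(2, -100.0 / doubleValue);   // DirectSound's 100 dB floor
double attenuation = minimumAttenuation
    + trkBalance.Value / 100.0 * (1 - minimumAttenuation);
double db = doubleValue * Math.Log(attenuation) / Math.Log(2);
foreGroundSound.Volume = (int)(db * 100);   // hundredths of a dB

At trkBalance.Value = 0 this yields -10000 (DirectSound's minimum), and at 100 it yields 0 (no attenuation).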

Synthesizer Slide from One Frequency to Another

I'm writing a synthesizer in C# using NAudio. I'm trying to make it slide smoothly between frequencies. But I have a feeling I'm not understanding something about the math involved. It slides wildly at a high pitch before switching to the correct next pitch.
What's the mathematically correct way to slide from one pitch to another?
Here's the code:
public override int Read(float[] buffer, int offset, int sampleCount)
{
    int sampleRate = WaveFormat.SampleRate;
    for (int n = 0; n < sampleCount; n++)
    {
        if (nextFrequencyQueue.Count > 0)
        {
            nextFrequency = nextFrequencyQueue.Dequeue();
        }
        if (nextFrequency > 0 && Frequency != nextFrequency)
        {
            if (Frequency == 0) // special case for first note
            {
                Frequency = nextFrequency;
            }
            else // slide up or down to next frequency
            {
                if (Frequency < nextFrequency)
                {
                    Frequency = Clamp(Frequency + frequencyStep, nextFrequency, Frequency);
                }
                if (Frequency > nextFrequency)
                {
                    Frequency = Clamp(Frequency - frequencyStep, Frequency, nextFrequency);
                }
            }
        }
        buffer[n + offset] = (float)(Amplitude * Math.Sin(2 * Math.PI * time * Frequency));
        try
        {
            time += (double)1 / (double)sampleRate;
        }
        catch
        {
            time = 0;
        }
    }
    return sampleCount;
}
You are using absolute time to determine the wave function, so when you change the frequency very slightly, the next sample is what it would have been had you started the run at that new frequency.
I don't know the established best approach, but a simple approach that's probably good enough is to compute the current phase as a fraction of a cycle (phi = (t * f_old) mod 1) and adjust t to preserve that phase under the new frequency (t = phi / f_new).
A smoother approach would be to preserve the first derivative. This is more difficult because, unlike for the wave itself, the amplitude of the first derivative varies with frequency, which means that preserving the phase isn't sufficient. In any event, this added complexity is almost certainly overkill, given that you are varying the frequency smoothly.
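A hedged sketch of that phase adjustment, applied at the moment the frequency changes inside your Read loop (oldFrequency and newFrequency stand in for your fields):

// phase as a fraction of a cycle under the old frequency...
double cycleFraction = (time * oldFrequency) % 1.0;
// ...and the time that reproduces that phase under the new frequency
time = cycleFraction / newFrequency;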
One approach is to use wavetables. You construct a full cycle of a sine wave in an array, then in your Read function you can simply lookup into it. Each sample you read, you advance by an amount calculated from the desired output frequency. Then when you want to glide to a new frequency, you calculate the new delta for lookups into the table, and then instead of going straight there you adjust the delta incrementally to move to the new value over a set period of time (the 'glide' or portamento time).
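A rough sketch of such a wavetable oscillator (the table size, the 100 ms glide time, and the variable names are assumptions, not NAudio API):

const int TableSize = 4096;
float[] sineTable = new float[TableSize];
for (int i = 0; i < TableSize; i++)
    sineTable[i] = (float)Math.Sin(2 * Math.PI * i / TableSize);

double phaseIndex = 0;                                         // persists across Read calls
double delta = Frequency * TableSize / sampleRate;             // table steps per output sample
double targetDelta = nextFrequency * TableSize / sampleRate;
double glideStep = (targetDelta - delta) / (0.1 * sampleRate); // spread the glide over ~100 ms

for (int n = 0; n < sampleCount; n++)
{
    buffer[n + offset] = (float)(Amplitude * sineTable[(int)phaseIndex]);
    phaseIndex = (phaseIndex + delta) % TableSize;
    if (Math.Abs(targetDelta - delta) > Math.Abs(glideStep))
        delta += glideStep;          // still gliding toward the target pitch
    else
        delta = targetDelta;         // arrived
}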
Frequency = Clamp(Frequency + frequencyStep, nextFrequency, Frequency);
The human ear doesn't work like that; it is highly non-linear. Nature is logarithmic. The frequency of middle C is 261.626 Hz. The next note, C#, is related to the previous one by a factor of Math.Pow(2, 1/12.0), or about 1.0594631. So C# is 277.183 Hz, an increment of 15.557 Hz.
The next C up the scale has double the frequency, 523.252 Hz. And C# after that is 554.366 Hz, an increment of 31.084 Hz. Note how the increment doubled. So the frequencyStep in your code snippet should not be an addition, it should be a multiplication.
buffer[n + offset] = (float)(Amplitude * Math.Sin(2 * Math.PI * time * Frequency));
That's a problem as well. Your calculated samples do not transition smoothly from one frequency to the next; there's a step when Frequency changes. You have to apply an offset to time so that it produces the exact same sample value at sample time time - 1 that you previously calculated with the previous value of Frequency. These steps produce high-frequency artifacts with many harmonics that are gratingly obvious to the human ear.
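One hedged way to address both points at once is to drop absolute time and keep a running phase, stepping the frequency multiplicatively; frequencyRatio here is a hypothetical per-sample glide factor, e.g. Math.Pow(2, 1.0 / (12 * glideSamples)) to cover one semitone over glideSamples samples:

double phase = 0;   // persists across Read calls
for (int n = 0; n < sampleCount; n++)
{
    buffer[n + offset] = (float)(Amplitude * Math.Sin(phase));
    phase += 2 * Math.PI * Frequency / sampleRate;   // continuous even when Frequency changes
    if (phase > 2 * Math.PI) phase -= 2 * Math.PI;

    // glide multiplicatively toward the target pitch
    if (Frequency < nextFrequency)
        Frequency = Math.Min(Frequency * frequencyRatio, nextFrequency);
    else if (Frequency > nextFrequency)
        Frequency = Math.Max(Frequency / frequencyRatio, nextFrequency);
}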
Background info is available in this Wikipedia article; plotting the waveform you generate would have made the step problem easy to diagnose.

Discrete Fourier transform

I am currently trying to write a Fourier transform implementation. I started with a simple DFT algorithm, as described by the mathematical definition:
public class DFT {
    public static Complex[] Transform(Complex[] input) {
        int N = input.Length;
        Complex[] output = new Complex[N];
        double arg = -2.0 * Math.PI / (double)N;
        for (int n = 0; n < N; n++) {
            output[n] = new Complex();
            for (int k = 0; k < N; k++)
                output[n] += input[k] * Complex.Polar(1, arg * (double)n * (double)k);
        }
        return output;
    }
}
So I tested this algorithm with the following code:
private int samplingFrequency = 120;
private int numberValues = 240;

private void doCalc(object sender, EventArgs e) {
    Complex[] input = new Complex[numberValues];
    Complex[] output = new Complex[numberValues];
    double t = 0;
    double y = 0;
    for (int i = 0; i < numberValues; i++) {
        t = (double)i / (double)samplingFrequency;
        y = Math.Sin(2 * Math.PI * t);
        input[i] = new Complex(y, 0);
    }
    output = DFT.Transform(input);
    printFunc(input);
    printAbs(output);
}
The transformation works fine, but only if numberValues is a multiple of the samplingFrequency (in this case 120, 240, 360, ...). That's my result for 240 values:
The transformation just worked fine.
If I try to calculate 280 values I get this result:
Why am I getting an incorrect result when I change the number of calculated values?
I am not sure if my problem here is a problem with my code or a misunderstanding of the mathematical definition of the DFT. In either way, can anybody help me with my problem? Thanks.
What you are experiencing is called Spectral Leakage.
This is caused by the underlying mathematics of the Fourier transform, which assumes a continuous function from -infinity to +infinity; the range of samples you provide is effectively repeated an infinite number of times. If you don't have a whole number of cycles of the waveform in the window, the ends won't line up, and you get a discontinuity, which manifests itself as the frequency smearing out to either side.
The normal way to handle this is called windowing: you multiply the whole window of samples by some function that tends towards 0 at both ends of the window, causing the ends to line up. The downside is that the amplitudes come out slightly off, because attenuating the signal towards the edges lowers the total signal power.
So to summarise, there is no error in your code, and the result is as expected. The artefacts can be reduced with a window function, but this will affect the accuracy of the amplitudes. You will need to investigate and determine which trade-off best fits the requirements of your project.
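For example, a sketch of applying a Hamming window to the test signal from the question before the transform (0.54 and 0.46 are the standard Hamming coefficients):

for (int i = 0; i < numberValues; i++)
{
    t = (double)i / (double)samplingFrequency;
    double w = 0.54 - 0.46 * Math.Cos(2 * Math.PI * i / (numberValues - 1));
    input[i] = new Complex(Math.Sin(2 * Math.PI * t) * w, 0);
}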
You are NOT getting an incorrect result for a non-periodic sinusoid, and those extra values are not just "artifacts". Your result is actually the more complete DFT result, which you don't see with a periodic sinusoid. Those other non-zero values contain useful information which can be used, for example, to interpolate the frequency of a single sinusoid that is not periodic in the aperture.
A DFT can be thought of as convolving a rectangular window with your sine wave. This produces (something very close to) a sinc function, which has infinite extent but happens to be zero at every DFT bin frequency other than its central bin, for any sinusoid centered exactly on a DFT bin. This happens only when the frequency is exactly periodic in the FFT aperture, and for no other frequency. The sinc function has lots of "humps", which are all hidden in your first plot.
