How can I find a peak in an array? - c#

I am making a pitch detection program using an FFT. To get the pitch I need to find the lowest frequency that is significantly above the noise floor.
All the results are in an array, where each position corresponds to a frequency. I don't have any idea how to find the peak.
I am programming in C#.
Here is a screenshot of the frequency analysis in Audacity.

Instead of attempting to find the lowest peak, I would look for the fundamental frequency that maximizes the spectral energy captured by its first 5 integer multiples. Note that every peak is an integer multiple of the lowest peak. This is a hack of the cepstrum method. Don't judge :).
N.B. From your plots, I assume a 1024-sample window and a 44.1 kHz sampling rate. This yields a frequency granularity of only 44.1 kHz / 1024 ≈ 43 Hz. For 44.1 kHz audio, I recommend a longer analysis window of ~50 ms, or 2048 samples, which would yield a finer frequency granularity of ~21.5 Hz.
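Assuming a Matlab vector 'psd' of size 2048 with the PSD values (psd(1) is the DC bin):
% 50 Hz (Dude) -> 50Hz/44100Hz * 2048 -> ~2 Lower Lim
% 300 Hz (Baby) -> 300Hz/44100Hz * 2048 -> ~14 Upper Lim
lower_lim = 2;
upper_lim = 14;
best_energy = -Inf;
best_fund = lower_lim;
for fund_cand = lower_lim:upper_lim
    % Matlab indices are 1-based, so 0-based bin k lives at psd(k + 1)
    i_first_five_multiples = (1:5) * fund_cand + 1;
    sum_energy = sum(psd(i_first_five_multiples));
    if sum_energy > best_energy   % keep the best candidate so far
        best_energy = sum_energy;
        best_fund = fund_cand;
    end
end
fund_hz = best_fund * 44100 / 2048;   % convert the winning bin back to Hz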
I would find the frequency which maximizes the sum_energy value.
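For readers who want to try this outside Matlab, here is a minimal Python sketch of the same harmonic-sum search. The function name and its parameters (`f_min`, `f_max`, `n_harmonics`) are my own illustrative choices, not from the answer, and `psd` is assumed to be a 0-indexed list in which index k corresponds to k * sample_rate / fft_size.

```python
def harmonic_sum_pitch(psd, sample_rate, fft_size,
                       f_min=50.0, f_max=300.0, n_harmonics=5):
    """Return (frequency_hz, bin_index) of the candidate fundamental whose
    first n_harmonics integer multiples capture the most spectral energy."""
    lower = max(1, round(f_min * fft_size / sample_rate))
    upper = round(f_max * fft_size / sample_rate)
    best_bin, best_energy = lower, float("-inf")
    for cand in range(lower, upper + 1):
        # 0-indexed spectrum: harmonic h of bin `cand` sits at index h * cand.
        idx = [h * cand for h in range(1, n_harmonics + 1)]
        if idx[-1] >= len(psd):
            break  # highest harmonic would fall outside the spectrum
        energy = sum(psd[i] for i in idx)
        if energy > best_energy:
            best_bin, best_energy = cand, energy
    return best_bin * sample_rate / fft_size, best_bin


# Toy spectrum: harmonics of bin 7, with a weak fundamental.
psd = [0.01] * 1025
for h in range(2, 6):
    psd[7 * h] = 1.0
freq, bin_index = harmonic_sum_pitch(psd, 44100, 2048)
print(bin_index, round(freq, 1))
```

In the toy spectrum the fundamental itself is barely above the floor, yet the candidate at bin 7 still wins because its multiples carry the energy.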

It would be easier if you had some notion of the absolute values to expect, but I would suggest:
Find the lowest (weakest) value first. It is your noise level.
Compute the average level. It is your signal strength.
Define some function to decide the noise threshold. This is the tricky part; it may require some experimentation.
In a bad situation, the signal may be only 2 or 3 times the noise level. If the signal is better, you can probably use a threshold of 2x the noise level.
Edit, after looking at the picture:
You should probably just start at the left and find a local maximum. Looks like you could use a 30 dB threshold and a 10-bin window or something.
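A rough Python sketch of that left-to-right scan follows. I am assuming the spectrum is already in dB, that "30 dB threshold" means 30 dB above the weakest bin, and that the 10-bin window is applied on each side of a candidate; the function name is made up for illustration.

```python
def first_peak(levels_db, threshold_db=30.0, window=10):
    """Return the index of the first bin (scanning from the left) that is
    the maximum of its +/- window neighbourhood and lies at least
    threshold_db above the noise floor, or None if no bin qualifies."""
    noise_floor = min(levels_db)  # weakest bin taken as the noise level
    for i, level in enumerate(levels_db):
        if level < noise_floor + threshold_db:
            continue  # not far enough above the noise
        lo = max(0, i - window)
        hi = min(len(levels_db), i + window + 1)
        if level == max(levels_db[lo:hi]):
            return i  # local maximum within the window
    return None


levels = [-90.0] * 64
levels[19], levels[20], levels[21] = -45.0, -30.0, -50.0
levels[40] = -20.0
print(first_peak(levels))
```

The shoulder at bin 19 clears the threshold but is rejected because bin 20 is higher within its window; the stronger peak at bin 40 is never reached because the scan stops at the first hit.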

Finding the lowest peak won't work reliably for estimating pitch, as this frequency is sometimes completely missing or down in the noise floor. For better reliability, try another algorithm: autocorrelation (AMDF, ASDF lag), cepstrum (FFT log FFT), harmonic product spectrum, state space density, and variations thereof that use neural nets, genetic algorithms or decision matrices to decide between alternative pitch hypotheses (RAPT, YAAPT, et al.).
Added:
That said, you could guess a frequency, compute the average and standard deviation of spectral magnitudes for, say, a 2-to-1 frequency range around your guess, and see if there exists a peak significantly above the average (2 sigma?). Rinse and repeat for some number of frequency guesses, and see which one, or the lowest of several, has a peak that stands out the most from the average. Use that peak.
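As a hedged illustration of that guess-and-test idea, here is a small Python sketch. I am interpreting "a 2-to-1 frequency range around your guess" as guess/sqrt(2) to guess*sqrt(2), and working in bin indices rather than Hz; both function names are invented for this example.

```python
import math
import statistics


def peak_prominence(mags, guess_bin):
    """Look at a 2-to-1 range of bins around guess_bin and return
    (peak_bin, z), where z is how many standard deviations the strongest
    bin in that range lies above the range's mean."""
    lo = max(1, int(guess_bin / math.sqrt(2)))
    hi = min(len(mags) - 1, math.ceil(guess_bin * math.sqrt(2)))
    band = mags[lo:hi + 1]
    if len(band) < 3:
        return None, 0.0
    mean = statistics.fmean(band)
    sigma = statistics.pstdev(band)
    if sigma == 0:
        return None, 0.0  # perfectly flat band, nothing stands out
    peak_bin = lo + max(range(len(band)), key=band.__getitem__)
    return peak_bin, (mags[peak_bin] - mean) / sigma


def lowest_standout_peak(mags, guess_bins, min_sigma=2.0):
    """Try the guesses in ascending order and return the first peak that
    stands at least min_sigma above its local average, else None."""
    for guess in sorted(guess_bins):
        peak_bin, z = peak_prominence(mags, guess)
        if peak_bin is not None and z >= min_sigma:
            return peak_bin
    return None


mags = [1.0] * 200
mags[50] = 20.0
print(lowest_standout_peak(mags, [10, 48, 100]))
```

This version returns the lowest qualifying peak; picking the guess with the largest z instead would implement the "stands out the most" variant.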

Related

Sound analyzer using NAudio for 48000 samples/sec sound. Can I use a cycle-sample-size of 1024?

I need to create a sound analyzer to isolate certain song frequencies. For now, I'm interested in bass (60-250Hz).
I read the signal (IEEE float), for each block of 1024: do a FFT, and then extract the value corresponding to each frequency.
What I don't understand is this: I know FFT needs powers of 2 in order to work. I've seen code using blocks of 512, code using 2048, 4096 and so on.
I've settled on 1024 (which gives me roughly 47 datapoints/second). Am I correct in assuming that using 2048, for instance, will work just the same, giving me 23.5 datapoints/second, and that the only difference is accuracy (and speed of computation, of course)?
Also, am I required to read at 1024-boundary blocks? Like, for instance, say I simply skip the first 200 floats, will the results end up being very similar? (my tests seem to say yes)
LATER EDIT: updated title to make it easier to understand
1024 samples at 48 kHz is barely longer than one period of a 60 Hz signal (about 21 ms versus about 17 ms). That is too short to determine if the signal is even fully periodic (repeats). Humans typically require somewhere around 6 periods of repetition to hear a sound as having a definite pitch.
60 Hz is close to B1 (about 61.7 Hz). You might need 2 Hz resolution to separate B1 from C2 (about 65.4 Hz) with a clear gap in between the two nearest FFT frequency bins. To do that using just FFT magnitude results would require an FFT window of 48 kHz / 2 Hz = 24000 samples, i.e. half a second, or longer. The next power of 2 at 48 ksps is 32768.
For music pitch frequencies, there are much better pitch detector/estimators than using a bare FFT or FFT frequency peak magnitude, as they solve the missing or weak fundamental issue common in recorded instrumental or vocal music. Those pitch estimators can work with shorter time interval windows than a half second, but require more computation than a bare FFT magnitude peak picking.
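The window-length arithmetic in this answer can be checked with a few lines of Python. The helper names below are mine, and non-overlapping blocks are assumed when counting datapoints per second.

```python
import math


def bin_width_hz(sample_rate, fft_size):
    """Spacing between adjacent FFT bins, in Hz."""
    return sample_rate / fft_size


def periods_in_window(freq_hz, sample_rate, fft_size):
    """How many periods of freq_hz fit in one analysis window."""
    return freq_hz * fft_size / sample_rate


def next_power_of_two(n):
    """Smallest power of two that is >= n (n must be >= 1)."""
    return 1 << (n - 1).bit_length()


def fft_size_for_resolution(sample_rate, resolution_hz):
    """Smallest power-of-two FFT size whose bin width is <= resolution_hz."""
    return next_power_of_two(math.ceil(sample_rate / resolution_hz))


print(bin_width_hz(48000, 1024))           # bin spacing, also blocks per second
print(periods_in_window(60, 48000, 1024))  # periods of 60 Hz in one block
print(fft_size_for_resolution(48000, 2))   # FFT size for 2 Hz bins
```

With non-overlapping blocks, the number of datapoints per second is numerically the same as the bin width (sample_rate / fft_size), which is where the "roughly 47" in the question comes from.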

Genetic algorithms: how to find the optimal size of the population

How do I find the optimal size of the population? In my task, each gene is a value of type int lying in a given range.
For example:
The chromosome consists of 2 genes.
The first gene may contain an int value in the range from 5 to 15.
The second gene may contain an int value in the range from 15 to 25.
The question: how do I find the size of the initial population?
Usually the optimal size is found iteratively through trial and error. You can write a simple algorithm to optimize the population size: start, for example, with a pop size of 100 and iteratively increase it by e.g. 50. For each step you need to run the GA and calculate some measure that assesses the population size. You can use one of these: maximum fitness, average fitness, or time until the convergence criterion is met. To increase accuracy you should repeat each step at least a few times, then calculate the average for each step and draw a chart from which you can choose the optimal pop size. If that's not enough, you can refine the search around the peak by doing the same thing with smaller steps near that pop size.
Depending on your problem the chart will look different. If it's just a positive-slope curve, then you will have to choose a reasonable pop size on your own. With too small a pop size your GA will most likely lose diversity and perhaps fall into some local optimum. When it's too big, your GA will become a simple random search algorithm.
Btw, I hope this example is far from your real problem, because genetic algorithms are not the best choice for such small chromosomes.
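A sketch of such a sweep in Python is below. Everything in it is a stand-in: the fitness function is a made-up objective with its optimum at (9, 21), the GA is a deliberately tiny one (tournament selection, uniform crossover, reset mutation), and the sizes are scaled down to suit the two-gene toy problem rather than the 100/50 suggested above.

```python
import random
import statistics

GENE_RANGES = [(5, 15), (15, 25)]  # the two-gene example from the question


def fitness(chrom):
    # Hypothetical objective for illustration only: best value 0 at (9, 21).
    return -((chrom[0] - 9) ** 2 + (chrom[1] - 21) ** 2)


def tournament(pop, rng):
    """Pick two random individuals and keep the fitter one."""
    a, b = rng.choice(pop), rng.choice(pop)
    return a if fitness(a) >= fitness(b) else b


def run_ga(pop_size, rng, generations=20, mutation_rate=0.1):
    """Run a tiny GA once and return the best fitness it found."""
    pop = [[rng.randint(lo, hi) for lo, hi in GENE_RANGES]
           for _ in range(pop_size)]
    best = max(map(fitness, pop))
    for _gen in range(generations):
        nxt = []
        for _child in range(pop_size):
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            for g, (lo, hi) in enumerate(GENE_RANGES):
                if rng.random() < mutation_rate:
                    child[g] = rng.randint(lo, hi)  # reset mutation
            nxt.append(child)
        pop = nxt
        best = max(best, max(map(fitness, pop)))
    return best


def sweep_population_sizes(sizes, repeats=5, seed=0):
    """Average the best fitness over several runs for each population size."""
    rng = random.Random(seed)
    return {size: statistics.fmean([run_ga(size, rng) for _ in range(repeats)])
            for size in sizes}


results = sweep_population_sizes([4, 20, 60], repeats=3, seed=1)
for size, avg_best in results.items():
    print(size, avg_best)
```

Plotting `avg_best` (or time to convergence, if you record it) against population size gives the chart described above; the seed is fixed only so that repeated sweeps are comparable.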

Solving for Amplitude and Frequency in WAV files

I've also asked this here on the Sound Design forum, but the question is heavy computer science/math so it might actually belong on this forum:
So I'm able to successfully find all the information about a WAV file by reading the binary data in the file, except the amplitude and the frequency (hertz) of the big sin function (which are unfortunately exactly what I'm looking for). Just to verify what I'm talking about, the file generates one wave only with the equation:
F(s) = A * sin(T * s)
Where s is the current sample, A is the amplitude and T is the period. Now the equation for the T (period) is:
T = (2π * Hz) /(α * ω)
Where Hz is the frequency in Hertz, α is the number of samples per second, and ω is the number of channels.
Now I know that to solve for amplitude, I could simply find the value of F(s) where
s = (π/2)/T
Because then the value of the sine function would be 1, and the final value would be equivalent to A. The problem is that to divide by T, I have to know the Hertz (or Hz).
Is there any way that I can read a WAV file to discover the Hertz from the data, assuming the file only contains a single wave?
Just to get some terms clarified, the property you're looking for is frequency, and the unit of frequency is Hertz (once per second). By convention, the typical A note has a frequency of 440 Hz.
You got the function wrong, actually. That sine wave in reality has the form F(s) = A * sin(2*pi*s/T + c) - you don't know when it started so you get a constant c in there. Also, you need to divide by T, not multiply.
Getting the amplitude is actually fairly easy. That sine wave has a series of peaks and valleys. Find each peak (higher than both neighbors) and each valley (lower), calculate the average peak and average valley, and the amplitude is HALF the difference between the two (the peak-to-valley distance is twice the amplitude). Pretty easy. The period T can be estimated by measuring the average distance from peak to peak, and from valley to valley.
There's one bit where you need to be careful. If there is a slight bit of noise, you may get a slight dent near a peak. Instead of 14 17 18 17 14 you may get 14 17 16 17 14. That 16 isn't a valley. Once you've got a good estimate for the real peaks and valleys, throw out all the distorted peaks.
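Here is a minimal Python sketch of the peak/valley approach for a clean, single-channel signal. It does not implement the dent rejection just described, the function names are my own, and it assumes the samples have already been decoded into a list of numbers.

```python
import math


def local_extrema(samples):
    """Return (peak_indices, valley_indices) of simple local extrema."""
    peaks, valleys = [], []
    for i in range(1, len(samples) - 1):
        if samples[i - 1] < samples[i] >= samples[i + 1]:
            peaks.append(i)
        elif samples[i - 1] > samples[i] <= samples[i + 1]:
            valleys.append(i)
    return peaks, valleys


def estimate_sine(samples, sample_rate):
    """Estimate (amplitude, frequency_hz) of a single noise-free sine."""
    peaks, valleys = local_extrema(samples)
    if len(peaks) < 2 or not valleys:
        raise ValueError("need at least two peaks and one valley")
    avg_peak = sum(samples[i] for i in peaks) / len(peaks)
    avg_valley = sum(samples[i] for i in valleys) / len(valleys)
    amplitude = (avg_peak - avg_valley) / 2  # half the peak-to-valley distance
    # Average peak-to-peak spacing in samples is the period.
    period = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return amplitude, sample_rate / period


# Synthetic test tone: 440 Hz, amplitude 1000, arbitrary phase offset.
rate = 44100
tone = [1000 * math.sin(2 * math.pi * 440 * s / rate + 0.3) for s in range(rate)]
amp, freq = estimate_sine(tone, rate)
print(round(amp), round(freq))
```

The amplitude comes out slightly low because a sample rarely lands exactly on the crest; the more samples per period, the smaller that bias.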
The question isn't "what frequency?". If your function is anything other than a simple trig function it'll be a combination of frequencies, each with their own amplitude.
The correct approach is digital signal processing using finite Fourier transform. You've got a lot of digging to do.
If you only want to assume a single trig function, you have just 2 (amplitude and frequency) or 3 degrees of freedom (amplitude, frequency, and phase angle) and N time points in the file. That means least squares fitting assuming a sine or cosine function.

How to get peak power of sound in Unity C#

After recording a sound in Unity, is it possible to get the peak power of the sound? Or is there any way to calculate the peak power of a sound?
Peak isn't very interesting in sound. If you want something closer to perceived volume of the sound, one pretty good metric is RMS. To get this, you have to do just a bit of math:
Load the sample data using audio.GetOutputData
Sum squares of all the sampled values
Take the square root of sum / amountOfSamples - that's RMS (root-mean-square)
If you want to have a value in dB, you can get it as 20 * log10(rms / reference), where reference stands for the value you want to have at 0 dB. A good reference point is 0.1, for example. Note that the RMS value will always be from 0 to 1, while dB values are a bit wilder - they better approximate human hearing, though. If you want to be really serious, different frequencies are perceived at different volumes - have a look at dBA, for example.
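The arithmetic is language-independent, so here it is as a short Python sketch rather than Unity code. It assumes you already have the samples as floats in the range -1 to 1; the function names and the handling of silence are my own choices.

```python
import math


def rms(samples):
    """Root-mean-square of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def to_db(value, reference=0.1):
    """Level in dB relative to `reference` (which maps to 0 dB)."""
    if value <= 0:
        return float("-inf")  # silence has no finite dB value
    return 20 * math.log10(value / reference)


level = rms([0.5, -0.5, 0.5, -0.5])
print(level, round(to_db(level), 2))
```

The same three steps (square, average, square root) translate directly to a loop over the float array filled by Unity.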

Volume from byte array

I'm new to audio analysis, but need to perform a (seemingly) simple task. I have a byte array containing a 16 bit recording (single channel) and a sample rate of 44100. How do I perform a quick analysis to get the volume at any given moment? I need to calculate a threshold, so a function to return true if it's above a certain amplitude (volume) and false if not. I thought I could iterate through the byte array and check its value, with 255 being the loudest, but this doesn't seem to work as even when I don't record anything, background noise gets in and some of the array is filled with 255. Any suggestions would be great.
Thanks
As you have 16-bit data, you should expect the signal to vary between -32768 and +32767.
To calculate the volume you can take intervals of, say, 1000 samples and calculate their RMS value: sum the squared sample values, divide by 1000, and take the square root. Check this number against your threshold.
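A small Python sketch of those steps is below. It assumes the byte array holds signed little-endian 16-bit PCM (the usual WAV layout, but worth verifying for your recorder), and the function names are invented for illustration.

```python
import math
import struct


def pcm16_to_samples(data):
    """Decode signed little-endian 16-bit PCM bytes into a tuple of ints."""
    count = len(data) // 2
    return struct.unpack("<%dh" % count, data[:count * 2])


def block_rms(samples, block_size=1000):
    """RMS level of each consecutive block of block_size samples."""
    levels = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        levels.append(math.sqrt(sum(s * s for s in block) / len(block)))
    return levels


def is_loud(data, threshold, block_size=1000):
    """One boolean per block: True where the RMS exceeds the threshold."""
    samples = pcm16_to_samples(data)
    return [level > threshold for level in block_rms(samples, block_size)]


quiet = struct.pack("<1000h", *([10, -10] * 500))
loud = struct.pack("<1000h", *([20000, -20000] * 500))
print(is_loud(quiet + loud, threshold=1000))
```

If the data really is signed 16-bit, this may also explain the 255s seen in silence: a small negative sample such as -1 is stored as the bytes 0xFF 0xFF, so individual bytes say little about loudness.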
Typically one measures the energy of waves using root mean square.
If you want to be more perceptually accurate you can take the time-domain signal through a discrete Fourier transform to a frequency-domain signal, and integrate over the magnitudes with some weighting function (since waves with the same energy are perceived as louder or quieter depending on their frequency).
But I don't know audio stuff either so I'm just making stuff up. ☺
I might try applying a standard-deviation sliding-window. OTOH, I would not have assumed that 255 = loudest. It may be, but I'd want to know what encoding is being used. If any compression is present, then I doubt 255 is "loudest."
