I've also asked this here on the Sound Design forum, but the question is heavy computer science/math so it might actually belong on this forum:
So I'm able to successfully find all the information about a WAV file by reading the binary data in the file, except the amplitude and the frequency (hertz) of the big sine function (which are unfortunately exactly what I'm looking for). Just to verify what I'm talking about, the file contains one wave only, with the equation:
F(s) = A * sin(T * s)
Where s is the current sample, A is the amplitude and T is the period. Now the equation for the T (period) is:
T = (2π * Hz) /(α * ω)
Where Hz is frequency in Hertz, α is Samples per second, and ω is the amount of channels.
Now I know that to solve for amplitude, I could simply find the value of F(s) where
s = (π/2)/T
Because then the value of the sine function would be 1, and the final value would be equivalent to A. The problem is that to divide by T, I have to know the Hertz (or Hz).
Is there any way that I can read a WAV file to discover the Hertz from the data, assuming the file only contains a single wave?
Just to get some terms clarified, the property you're looking for is frequency, and the unit of frequency is Hertz (once per second). By convention, the typical A note has a frequency of 440 Hz.
You got the function wrong, actually. That sine wave in reality has the form F(s) = A * sin(2*pi*s/T + c) - you don't know when it started so you get a constant c in there. Also, you need to divide by T, not multiply.
Getting the amplitude is actually fairly easy. That sine wave has a series of peaks and valleys. Find each peak (higher than both neighbors) and each valley (lower than both neighbors), calculate the average peak and average valley, and the amplitude is HALF the difference between the two (the peak sits at +A and the valley at -A, so they are 2A apart). Pretty easy. The period T can be estimated by averaging the distance from peak to peak, and from valley to valley.
There's one bit where you need to be careful. If there is a slight bit of noise, you may get a slight dent near a peak. Instead of 14 17 18 17 14 you may get 14 17 16 17 14. That 16 isn't a valley. Once you've got a good estimate for the real peaks and valleys, throw out all the distorted peaks.
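The peak-and-valley approach described above can be sketched in Python as follows. This is a minimal sketch that assumes a clean, noise-free signal already decoded into a list of numbers; the function name `estimate_sine` and the test tone are made up for illustration, and the noise-rejection step (throwing out dented peaks) is left out.

```python
import math

def estimate_sine(samples):
    """Estimate amplitude and period (in samples) from peaks and valleys."""
    peaks, valleys = [], []
    for i in range(1, len(samples) - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        if cur > prev and cur > nxt:
            peaks.append(i)      # higher than both neighbors
        elif cur < prev and cur < nxt:
            valleys.append(i)    # lower than both neighbors
    if len(peaks) < 2 or len(valleys) < 2:
        raise ValueError("need at least two peaks and two valleys")

    avg_peak = sum(samples[i] for i in peaks) / len(peaks)
    avg_valley = sum(samples[i] for i in valleys) / len(valleys)
    amplitude = (avg_peak - avg_valley) / 2  # half the peak-to-valley distance

    # Average spacing between consecutive peaks and consecutive valleys.
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    spacings += [b - a for a, b in zip(valleys, valleys[1:])]
    period = sum(spacings) / len(spacings)
    return amplitude, period

# Hypothetical test tone: 100 Hz, amplitude 0.5, sampled at 8000 samples/s.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 100.0 * n / sample_rate + 0.3)
        for n in range(800)]
amplitude, period = estimate_sine(tone)
frequency_hz = sample_rate / period
print(amplitude, frequency_hz)
```

The amplitude estimate is slightly low whenever the true peak falls between two samples, so treat it as an approximation rather than an exact value.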
The question isn't "what frequency?". If your function is anything other than a simple trig function it'll be a combination of frequencies, each with their own amplitude.
The correct approach is digital signal processing using a finite (discrete) Fourier transform. You've got a lot of digging to do.
If you only want to assume a single trig function, you have just 2 (amplitude and frequency) or 3 degrees of freedom (amplitude, frequency, and phase angle) and N time points in the file. That means least squares fitting assuming a sine or cosine function.
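One way to sketch the least-squares idea is to scan a list of candidate frequencies and, for each one, solve the small linear least-squares problem for the sine and cosine coefficients, keeping the frequency with the smallest residual. This is only an illustration under stated assumptions: the name `fit_sine`, the candidate grid, and the test signal are all hypothetical, and a real fit would refine the frequency beyond the grid.

```python
import math

def fit_sine(samples, sample_rate, candidate_freqs):
    """Fit y[n] = a*sin(w*n) + b*cos(w*n); return (freq, amplitude, phase)."""
    n = len(samples)
    syy = sum(y * y for y in samples)
    best = None
    for f in candidate_freqs:
        w = 2 * math.pi * f / sample_rate
        s = [math.sin(w * k) for k in range(n)]
        c = [math.cos(w * k) for k in range(n)]
        sss = sum(v * v for v in s)
        scc = sum(v * v for v in c)
        ssc = sum(u * v for u, v in zip(s, c))
        sys_ = sum(y * v for y, v in zip(samples, s))
        syc = sum(y * v for y, v in zip(samples, c))
        det = sss * scc - ssc * ssc
        if abs(det) < 1e-12:
            continue  # degenerate candidate (for example f = 0)
        # Solve the 2x2 normal equations with Cramer's rule.
        a = (sys_ * scc - syc * ssc) / det
        b = (syc * sss - sys_ * ssc) / det
        residual = syy - (a * sys_ + b * syc)
        if best is None or residual < best[0]:
            # a*sin + b*cos == A*sin(w*n + phase), A = hypot(a, b)
            best = (residual, f, math.hypot(a, b), math.atan2(b, a))
    if best is None:
        raise ValueError("no usable candidate frequency")
    return best[1], best[2], best[3]

# Hypothetical signal: 440 Hz, amplitude 0.8, phase 0.5 rad, 8000 samples/s.
signal = [0.8 * math.sin(2 * math.pi * 440 * k / 8000 + 0.5)
          for k in range(400)]
print(fit_sine(signal, 8000, range(200, 601)))
```

With three unknowns and hundreds of samples the problem is heavily overdetermined, which is why the fit tolerates some noise.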
Related
I need to create a sound analyzer to isolate certain song frequencies. For now, I'm interested in bass (60-250Hz).
I read the signal (IEEE float), for each block of 1024: do a FFT, and then extract the value corresponding to each frequency.
What I don't understand is this: I know FFT needs powers of 2 in order to work. I've seen code using blocks of 512, code using 2048, 4096 and so on.
I've settled on 1024 (which gives me roughly 47 datapoints/second). Am I correct in assuming that using 2048, for instance, will work just the same, giving me 23.5 datapoints/second, and that the only difference is accuracy (and speed of computation, of course)?
Also, am I required to read at 1024-boundary blocks? Like, for instance, say I simply skip the first 200 floats, will the results end up being very similar? (my tests seem to say yes)
LATER EDIT: updated title to make it easier to understand
1024/48kHz is barely longer than one period of a 60 Hz signal. That is too short to determine whether the signal is even fully periodic (repeats). Humans typically require somewhere around 6 periods of repetition to hear a sound as having a definite pitch.
60 Hz is close to B1 (about 61.7 Hz). You might need 2 Hz resolution to separate B1 from C2 (about 65.4 Hz) with a clear gap between the two nearest FFT frequency bins. To do that using just FFT magnitude results would require an FFT window of 48kHz/2Hz = 24000 samples, or half a second, or longer. The next power of 2 above that, for 48ksps samples, is 32768.
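The arithmetic behind these numbers is short enough to check in code. A minimal sketch follows, assuming non-overlapping blocks and the 48 kHz sample rate implied by the question; both function names are made up for illustration.

```python
def fft_frame_stats(sample_rate, block_size):
    """Return (bin width in Hz, non-overlapping blocks per second)."""
    bin_width_hz = sample_rate / block_size
    blocks_per_second = sample_rate / block_size
    return bin_width_hz, blocks_per_second

def min_power_of_two_block(sample_rate, resolution_hz):
    """Smallest power-of-two block whose bin width is <= resolution_hz."""
    needed = sample_rate / resolution_hz
    size = 1
    while size < needed:
        size *= 2
    return size

for size in (1024, 2048, 32768):
    width, rate = fft_frame_stats(48000, size)
    print(size, round(width, 2), round(rate, 2))
print(min_power_of_two_block(48000, 2.0))
```

Note that for non-overlapping blocks the bin width in Hz and the number of datapoints per second are the same number, so doubling the block size halves both: finer frequency resolution always costs time resolution.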
For music pitch frequencies, there are much better pitch detector/estimators than using a bare FFT or FFT frequency peak magnitude, as they solve the missing or weak fundamental issue common in recorded instrumental or vocal music. Those pitch estimators can work with shorter time interval windows than a half second, but require more computation than a bare FFT magnitude peak picking.
After recording a sound in Unity, is it possible to get the peak power of the sound? Or is there any way to calculate the peak power of a sound?
Peak isn't very interesting in sound. If you want something closer to perceived volume of the sound, one pretty good metric is RMS. To get this, you have to do just a bit of math:
Load the sample data using audio.GetOutputData
Sum squares of all the sampled values
Take the square root of sum / amountOfSamples - that's RMS (root-mean-square)
If you want to have a value in dB, you can get it as 20 * log10(rms / reference), where reference stands for the value you want to have at 0 dB. A good reference point is 0.1, for example. Note that the RMS value will always be from 0 to 1, while dB values are a bit wilder - they better approximate human hearing, though. If you want to be really serious, different frequencies are perceived at different volumes - have a look at dBA, for example.
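The RMS and dB steps above can be sketched like this. The sketch is in Python rather than Unity C#, so a plain list of floats stands in for the data that audio.GetOutputData would fill; the function names and the 0.1 default reference are illustrative choices, not part of any Unity API.

```python
import math

def rms(samples):
    """Root-mean-square of a sequence of samples in the range [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def to_db(value, reference=0.1):
    """Level in dB relative to `reference` (which maps to 0 dB)."""
    if value <= 0:
        return float("-inf")  # silence has no finite dB value
    return 20 * math.log10(value / reference)

# Hypothetical input: ten full periods of a full-scale sine.
samples = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
level = rms(samples)
print(level, to_db(level))
```

A full-scale sine has an RMS of about 0.707, which is why RMS tracks perceived loudness better than the raw peak of 1.0.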
I have done some research but I couldn't find exactly what I am looking for. At the moment, I have to send channel values over a COM port.
For example:
the content of file freqs.ini
low=0-xx khz;
mid=xx-yy khz;
high=yy-zz khz;
Then I will get values by percentage like
the expected values
lowPercent = 10;
midPercent = 77;
highPercent = 53;
So, I will be able to send these values by rs232 and my room will turn into club :) (I am using this code to illuminate LED strips). I have found some spectrum analyser projects but they all have 9 channels, that is, 3*3 combinations from low-low to high-high.
I know how to communicate with the COM port, but how can I get integer values for the 3 frequency ranges I have set before?
I don't know if you still need this, but....
Do you want to know how to get a real-time spectral analysis of sound?
1. Implement a queue to hold a buffer of audio samples.
2. Take the product of the buffer and a proper window function (typically Hamming or Hann), calculated by your program as a float array.
3. Do an FFT of the resulting array: there are many algorithms out there for every language....find the best one for you, use it, and take the squared modulus of each output coefficient (Real_part^2 + Imaginary_part^2, if the FFT returns the coefficients in algebraic form).
4. Sum the coefficients across your bands: to know which coefficient is associated with a frequency, you just need to know that the k-th coefficient is at (SampFrequency/BufferLength)*k Hz.....so it's easy to find the band boundaries.
5. If you need to normalize to the [0 , 1] interval, you just have to divide each of the 3 resulting band values by the maximum of the 3.
6. Pop your buffer queue by a Shift value, where Shift <= BufferLength, and start again.
The number of coefficients coming from the FFT algorithm is equal to BufferLength (this follows from the definition of the Discrete Fourier Transform), so the frequency resolution is better when you select a long buffer, but the program runs slower. The light intensity won't vary every BufferLength audio frames, but every Shift audio frames.....and a high ratio between BufferLength and Shift gives you slow light changes....so you must select parameters that fit your needs, remembering that you just have to turn some lights on and off....make your algorithm fast and lo-fi!
The last thing to do is find the frequency bands from your mixer's EQ knobs....I don't remember whether this information is in mixer handbooks.
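The steps above can be sketched in Python using only the standard library. Everything here is an assumption made for illustration: the function names, the band edges, and the small recursive FFT, which only accepts power-of-two buffer lengths (a real program would more likely call an existing FFT library).

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def band_levels(buffer, sample_rate, bands):
    """Windowed FFT power summed per (low_hz, high_hz) band, scaled to [0, 1]."""
    n = len(buffer)
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(buffer, hann)])
    power = [c.real ** 2 + c.imag ** 2 for c in spectrum[: n // 2]]
    hz_per_bin = sample_rate / n  # the k-th coefficient sits at k * hz_per_bin
    sums = [sum(p for k, p in enumerate(power) if lo <= k * hz_per_bin < hi)
            for lo, hi in bands]
    peak = max(sums)
    return [s / peak if peak > 0 else 0.0 for s in sums]

def sliding_frames(samples, buffer_length, shift):
    """Yield overlapping buffers, advancing by `shift` samples each time."""
    for start in range(0, len(samples) - buffer_length + 1, shift):
        yield samples[start:start + buffer_length]

# Hypothetical input: a 1 kHz tone at 8000 samples/s and three made-up bands.
bands = [(0, 300), (300, 2000), (2000, 4000)]
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(1024)]
for frame in sliding_frames(tone, 256, 128):
    print([round(100 * v) for v in band_levels(frame, 8000, bands)])
```

Multiplying by 100 and rounding gives integer percentages in the style of lowPercent, midPercent, and highPercent. Because each frame is scaled by its own loudest band, one band always reads 100; scale by a fixed reference instead if the absolute level should drive the lights.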
I'm new to audio analysis, but need to perform a (seemingly) simple task. I have a byte array containing a 16 bit recording (single channel) and a sample rate of 44100. How do I perform a quick analysis to get the volume at any given moment? I need to calculate a threshold, so a function to return true if it's above a certain amplitude (volume) and false if not. I thought I could iterate through the byte array and check its value, with 255 being the loudest, but this doesn't seem to work as even when I don't record anything, background noise gets in and some of the array is filled with 255. Any suggestions would be great.
Thanks
As you have 16-bit data, you should expect the signal to vary between -32768 and +32767.
To calculate the volume you can take intervals of, say, 1000 samples and calculate their RMS value: sum the squared sample values, divide by 1000, and take the square root. Check this number against your threshold.
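A minimal sketch of this, assuming the byte array holds little-endian signed 16-bit mono samples (the usual WAV layout, but an assumption here since the question does not say). The function names and the threshold value are made up for illustration.

```python
import math
import struct

def decode_pcm16(data):
    """Interpret bytes as little-endian signed 16-bit samples (assumed layout)."""
    count = len(data) // 2
    return list(struct.unpack("<%dh" % count, data[: count * 2]))

def loud_blocks(data, threshold, block_size=1000):
    """Return one boolean per block: True when the block's RMS exceeds threshold."""
    samples = decode_pcm16(data)
    flags = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        block_rms = math.sqrt(sum(s * s for s in block) / len(block))
        flags.append(block_rms > threshold)
    return flags

# Hypothetical recording: 1000 near-silent samples, then 1000 loud ones.
quiet = struct.pack("<1000h", *([50, -50] * 500))
loud = struct.pack("<1000h",
                   *[int(10000 * math.sin(2 * math.pi * n / 50))
                     for n in range(1000)])
print(loud_blocks(quiet + loud, threshold=1000))
```

Reading the raw bytes one at a time is what makes 255 show up even in silence: a small negative sample such as -1 is stored as the two bytes 0xFF 0xFF, so the bytes must be paired into 16-bit values first.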
Typically one measures the energy of waves using root mean square.
If you want to be more perceptually accurate you can take the time-domain signal through a discrete Fourier transform to a frequency-domain signal, and integrate over the magnitudes with some weighting function (since perceived loudness at the same energy depends on frequency: hearing is most sensitive at roughly 2 to 5 kHz and much less sensitive to low frequencies).
But I don't know audio stuff either so I'm just making stuff up. ☺
I might try applying a standard-deviation sliding-window. OTOH, I would not have assumed that 255 = loudest. It may be, but I'd want to know what encoding is being used. If any compression is present, then I doubt 255 is "loudest."
I am making a pitch detection program using fft. To get the pitch I need to find the lowest frequency that is significantly above the noise floor.
All the results are in an array. Each position is for a frequency. I don't have any idea how to find the peak.
I am programming in C#.
Here is a screenshot of the frequency analysis in Audacity.
Instead of attempting to find the lowest peak, I would look for a fundamental frequency which maximizes the spectral energy captured by its first 5 integer multiples. Note that every peak is an integer multiple of the lowest peak. This is a hack of the cepstrum method. Don't judge :).
N.B. From your plots, I assume a 1024-sample window and a 44.1 kHz sampling rate. This yields a frequency granularity of only 44.1 kHz/1024 = 43 Hz. Given 44.1 kHz audio, I recommend using a longer analysis window of ~50 ms, or 2048 samples. This would yield a finer frequency granularity of ~21 Hz.
Assuming a Matlab vector 'psd' of size 2048 with the PSD values.
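% 50 Hz (Dude) -> 50Hz/44100Hz * 2048 -> ~2 Lower Lim
% 300 Hz (Baby) -> 300Hz/44100Hz * 2048 -> ~14 Upper Lim
lower_lim = 2;
upper_lim = 14;
best_energy = -Inf;
best_fund = lower_lim;
for fund_cand = lower_lim:upper_lim
    % Assumes psd(1) is the DC bin, so FFT bin k sits at index k+1
    i_first_five_multiples = (1:5) * fund_cand + 1;
    sum_energy = sum(psd(i_first_five_multiples));
    if sum_energy > best_energy
        best_energy = sum_energy;
        best_fund = fund_cand;
    end
end
fund_hz = best_fund * 44100 / 2048; % convert the winning bin to Hz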
I would find the frequency which maximizes the sum_energy value.
It would be easier if you had some notion of the absolute values to expect, but I would suggest:
find the lowest (weakest) value first. It is your noise level.
compute the average level, it is your signal strength
define some function to decide the noise threshold. This is the tricky part, it may require some experimentation.
In a bad situation, signal may be only 2 or 3 times the noise level. If the signal is better you can probably use a threshold of 2xnoise.
Edit, after looking at the picture:
You should probably just start at the left and find a local maximum. Looks like you could use 30 dB threshold and a 10-bin window or something.
Finding the lowest peak won't work reliably for estimating pitch, as this frequency is sometimes completely missing, or down in the noise floor. For better reliability, try another algorithm: autocorrelation (AMDF, ASDF lag), cepstrum (FFT log FFT), harmonic product spectrum, state space density, and variations thereof that use neural nets, genetic algorithms or decision matrices to decide between alternative pitch hypotheses (RAPT, YAAPT, et al.).
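As an illustration of the simplest of these alternatives, here is a bare-bones autocorrelation pitch estimator in Python. It is a sketch only: the function name, the search range, and the test signal are assumptions, and it has none of the safeguards against octave errors that estimators such as RAPT or YAAPT add.

```python
import math

def autocorr_pitch(samples, sample_rate, fmin, fmax):
    """Return the pitch in Hz whose lag maximizes the mean autocorrelation."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = int(sample_rate / fmin)
    n = len(samples)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        if lag >= n:
            break
        m = n - lag  # number of overlapping sample pairs at this lag
        score = sum(samples[i] * samples[i + lag] for i in range(m)) / m
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

# Hypothetical test: harmonics 2, 3 and 4 of 100 Hz with NO energy at 100 Hz.
fs = 8000
notes = [sum(math.sin(2 * math.pi * h * 100 * n / fs) for h in (2, 3, 4))
         for n in range(800)]
print(autocorr_pitch(notes, fs, fmin=60.0, fmax=500.0))
```

The test signal has no spectral peak at 100 Hz at all, yet the waveform still repeats every 1/100 of a second, and that repetition is what the autocorrelation picks up. The search range matters: a lag of two periods scores just as well as one, so letting fmin drop to half the true pitch invites octave errors.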
Added:
That said, you could guess a frequency, compute the average and standard deviation of spectral magnitudes for, say, a 2-to-1 frequency range around your guess, and see if there exists a peak significantly above the average (2 sigma?). Rinse and repeat for some number of frequency guesses, and see which one, or the lowest of several, has a peak that stands out the most from the average. Use that peak.
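The guess-and-test idea can be sketched as follows. The reading of "a 2-to-1 frequency range around your guess" as guess/sqrt(2) up to guess*sqrt(2) is an interpretation, and the function name, the bin-based interface, and the toy spectrum are all assumptions made for illustration.

```python
import math

def standout_peak(magnitudes, center_bin, n_sigma=2.0):
    """Look for a peak in the 2-to-1 bin range around center_bin.

    Returns (peak_bin, sigmas_above_mean), or None when nothing stands out.
    """
    lo = max(0, int(center_bin / math.sqrt(2)))
    hi = min(len(magnitudes), int(center_bin * math.sqrt(2)) + 1)
    window = magnitudes[lo:hi]
    if len(window) < 3:
        return None
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std == 0:
        return None  # perfectly flat range, no peak at all
    peak_bin = max(range(lo, hi), key=lambda k: magnitudes[k])
    score = (magnitudes[peak_bin] - mean) / std
    return (peak_bin, score) if score > n_sigma else None

# Toy spectrum: flat noise floor with peaks at bins 20 and 40.
spectrum = [1.0] * 64
spectrum[20] = 30.0
spectrum[40] = 25.0
found = [standout_peak(spectrum, guess) for guess in (10, 20, 40)]
lowest = min(hit[0] for hit in found if hit is not None)
print(found, lowest)
```

Taking the lowest bin among the hits follows the "or the lowest of several" variant; picking the hit with the highest score instead gives the "stands out the most" variant.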