Sound to 3 main frequency bands (low, mid, high) - C#

I have done some research, but I couldn't find exactly what I'm looking for. At the moment, I have to send channel values over a COM port.
For example:
the content of file freqs.ini
low=0-xx khz;
mid=xx-yy khz;
high=yy-zz khz;
Then I will get values by percentage like
the expected values
lowPercent = 10;
midPercent = 77;
highPercent = 53;
So I will be able to send these values over RS-232 and my room will turn into a club :) (I am using this code to drive LED strips). I have found some spectrum analyser projects, but they all have 9 channels, that is, 3*3 combinations from low-low to high-high.
I know how to communicate with the COM port, but how can I get integer values for the 3 frequency ranges I have set before?

I don't know if you still need this, but...
Do you want to know how to get a real-time spectral analysis of sound?
1. Implement a queue to hold a buffer of audio samples.
2. Multiply the buffer by a suitable window function (typically Hamming or Hann), computed by your program as a float array.
3. Take the FFT of the resulting array: there are many algorithms out there for every language. Find the best one for you, use it, and take the squared magnitude of each output coefficient (Real_part^2 + Imaginary_part^2, if the FFT returns the algebraic representation of the coefficients).
4. Sum the coefficients across your bands. To know which coefficient is associated with a given frequency, remember that the k-th coefficient sits at (SampFrequency/BufferLength)*k Hz, so it's easy to find the band boundaries.
5. If you need to normalize to the [0, 1] interval, simply divide each of the 3 resulting band values by the maximum of the three.
6. Pop your buffer queue by a Shift value, where Shift <= BufferLength, and start again.
The number of coefficients coming from the FFT is equal to BufferLength (this follows from the definition of the Discrete Fourier Transform), so the frequency resolution is better when you select a long buffer, but the program runs slower. The light intensity won't change every BufferLength audio frames, but every Shift audio frames, and a high BufferLength/Shift ratio gives you slow light changes. So select parameters that fit your needs, remembering that you just have to turn some lights on and off: make your algorithm fast and lo-fi!
The last thing to do is to discover the frequency bands behind your mixer's EQ knobs; I don't remember whether this information is in mixer handbooks.
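The band-summing and normalization steps above (4 and 5) can be sketched in C#. This is a minimal sketch, not a full implementation: `magSq` is assumed to already hold the squared FFT magnitudes for bins 0..N/2, and the band edges are hypothetical values you would read from freqs.ini.

```csharp
using System;

class BandSplitter
{
    // Sum squared FFT magnitudes into low/mid/high bands and scale to 0-100,
    // normalizing by the loudest of the three bands (step 5 in the answer).
    public static int[] BandPercents(double[] magSq, int sampleRate, int bufferLength,
                                     double lowMaxHz, double midMaxHz)
    {
        double binHz = (double)sampleRate / bufferLength; // bin k sits at k * binHz
        double low = 0, mid = 0, high = 0;
        for (int k = 1; k < magSq.Length; k++)            // skip bin 0 (DC)
        {
            double f = k * binHz;
            if (f < lowMaxHz) low += magSq[k];
            else if (f < midMaxHz) mid += magSq[k];
            else high += magSq[k];
        }
        double max = Math.Max(low, Math.Max(mid, high));
        if (max <= 0) return new[] { 0, 0, 0 };
        return new[] { (int)(100 * low / max), (int)(100 * mid / max), (int)(100 * high / max) };
    }
}
```

The three returned integers are exactly the lowPercent/midPercent/highPercent values the question wants to push over RS-232.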

Related

How to produce a detailed spectrogram from Fourier output?

I am developing a little application in Visual Studio 2010 in C# to draw a spectrogram (frequency "heat map").
I have already done the basic things:
Cut a rectangular windowed array out of the input signal array
Feed that array into FFT, which returns complex values
Store magnitude values in an array (spectrum for that window)
Step the window, and store the new values in other arrays, resulting in a jagged array that holds every step of windowing and their spectra
Draw these into a Graphics object, with a color scale that maps the global min/max values of the heat map to relative cold and hot
The LEFT side of the screenshot shows my application, and on the RIGHT there is a spectrogram for the same input (512 samples long) and same rectangular window with size 32 from a program called "PAST - time series analysis" (https://folk.uio.no/ohammer/past/index.html). My 512 long sample array only consists of integer elements ranging from around 100 to 1400.
(Note: the light-blue bar on the very right of the PAST spectrogram is only there because I accidentally left an unnecessary '0' element at the end of its input array. Otherwise they are the same.)
Link to screenshot: https://drive.google.com/open?id=1UbJ4GyqmS6zaHoYZCLN9c0JhWONlrbe3
But I have encountered a few problems here:
The spectrogram seems very undetailed compared to another one that I made in "PAST - time series analysis" for reference, and that one looks extremely detailed. Why is that?
I know that for e.g. a 32-sample time window the FFT returns 32 elements; the 0th element is not needed here, and the next 32/2 = 16 elements have the magnitude values I need. But this means that the frequency "resolution" of the output for a 32-sample window is 16. That is exactly what my program uses. But the "PAST" program shows a lot more detail. If you look at the narrow lines in the blue background, you can see that they show a nice pattern along the frequency axis, but in my spectrogram that information remains unseen. Why?
In the first (windowSize/2)-wide band of window steps and the last (windowSize/2)-wide band, there are fewer values for the FFT input, thus less output, or just less precision. But in the "PAST" program those parts also seem relatively detailed, not just stretched bars like in mine. How can I improve that?
The 0th element of the FFT output array (the so-called "DC" element) is a huge number, a lot bigger than the sample average, or even the sample sum. Why is that?
Why are my values (e.g. the maximum that you see near the color bar) so huge? That is just a magnitude value from the FFT output. Why are there different values in the PAST program? What correction should I use on the FFT output to get those values?
Please share your ideas, if you know more about this topic. I am very new to this. I only read first about Fourier transform a little more than a week ago.
Thanks in advance!
To get more smoothness in the vertical axis, zero pad your FFT so that there are more (interpolated) frequency bins in the output. For instance, zero pad your 32 data points so that you can use a 256 point or larger FFT.
To get more smoothness in the horizontal axis, overlap your FFT input windows (75% overlap, or more).
For both, use a smooth window function (Hamming or von Hann, et al.), and try wider windows, longer than 32 samples (thus even more overlapped).
To get better coloring, try using a color mapping table, with the input being the log() of the (non zero) magnitudes.
You can also use multiple different FFTs per graph XY point, and decide which to color with based on local properties.
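The zero-padding and overlap bookkeeping described above is small enough to sketch in C#. The 32/256 sizes and 75% overlap are just the example values from the answer, not fixed requirements:

```csharp
using System;

class SpectrogramParams
{
    // Copy a short analysis window into a longer zero-padded buffer so the FFT
    // produces more (interpolated) frequency bins.
    public static double[] ZeroPad(double[] window, int fftSize)
    {
        var padded = new double[fftSize];        // trailing bins stay at 0
        Array.Copy(window, padded, window.Length);
        return padded;
    }

    // Advance the window start by only a fraction of the window length,
    // e.g. overlap = 0.75 steps a 32-sample window forward by 8 samples.
    public static int NextWindowStart(int currentStart, int windowSize, double overlap)
    {
        int hop = (int)(windowSize * (1.0 - overlap));
        return currentStart + Math.Max(1, hop);
    }
}
```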
Hello LimeAndConconut,
Even though I do not know about PAST, I can provide you with some general information about FFT. Here is an answer for each of your points
1- You are right: an FFT performed on 32 elements returns 32 frequencies (the null frequency plus positive and negative components). It means that you already have all the information in your data, and PAST cannot extract more information from the same 32-sized window. That's why I suspect the data is interpolated for plotting, but this is purely visual. Once again, PAST cannot create more information than what is in your data.
2- Once again I agree with you. At the borders you have access to fewer frequency components. You can pick between different strategies: don't show data at the borders, or extend the data with zero-padding or circular padding.
3- The zeroth element of the FFT should be the sum of your 32-element windowed array. You need to check the FFT normalization; have a look at the documentation of your FFT function.
4- Once again, check the FFT normalization. Since the PAST colorbar exhibits negative values, it seems to be plotted on a logarithmic scale. It is common to plot data with high dynamic range on a log scale in order to enhance details.

NAudio Normalize Audio

I am trying to normalize MP3 files with NAudio, but I don't know how to do so.
The first thing I did was convert the MP3 file to PCM:
using (Mp3FileReader fr = new Mp3FileReader(mp3.getPathWithFilename()))
{
    using (WaveStream pcm = WaveFormatConversionStream.CreatePcmStream(fr))
    {
        WaveFileWriter.CreateWaveFile("test.wav", pcm);
    }
}
But what is the next step? Unfortunately, I didn't find anything on the net.
Thanks for your help
I'm new to NAudio, so I don't exactly know how to code this, but I do know that normalization of an audio file requires two passes through the data. The first pass determines the maximum and minimum sample values in the file: you scan each data point and record the max and min (for both channels if stereo). Then, after finding the highest max or lowest min (whichever has the larger absolute value), you calculate that value as a percentage of full scale (the highest or lowest possible value for the bit stream; for 16-bit audio that's 32767 or -32768). You then increase the volume by the difference in percentage.
So, for example, if on your scanning pass you discovered that the highest value in a 16-bit mono file was 29000, you would then scale the volume to 112.989 percent (a factor of 32767/29000), so that the maximum sample is increased from 29000 to 32767 and all other samples are increased proportionally.
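The two-pass scan-and-scale described above might look like this in C#. It works on a plain array of 16-bit samples; however you obtain them (NAudio's readers can give you the raw data), the normalization logic itself is the same:

```csharp
using System;

class Normalizer
{
    // Two-pass peak normalization on 16-bit samples:
    // pass 1 finds the largest absolute sample, pass 2 applies the gain.
    public static short[] Normalize(short[] samples)
    {
        int peak = 0;
        foreach (short s in samples)                 // pass 1: find the peak
            peak = Math.Max(peak, Math.Abs((int)s));
        if (peak == 0) return (short[])samples.Clone();

        double gain = 32767.0 / peak;                // e.g. 32767/29000 ≈ 1.12989
        var result = new short[samples.Length];
        for (int i = 0; i < samples.Length; i++)     // pass 2: scale every sample
            result[i] = (short)Math.Round(samples[i] * gain);
        return result;
    }
}
```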

How to get peak power of sound in Unity C#

After recording a sound in Unity, is it possible to get the peak power of the sound? Or is there any way to calculate the peak power of a sound?
Peak isn't very interesting in sound. If you want something closer to perceived volume of the sound, one pretty good metric is RMS. To get this, you have to do just a bit of math:
Load the sample data using audio.GetOutputData
Sum squares of all the sampled values
Take the square root of sum / amountOfSamples - that's RMS (root-mean-square)
If you want to have a value in dB, you can get it as 20 * log10(rms / reference), where reference stands for the value you want to have at 0 dB. A good reference point is 0.1, for example. Note that the RMS value will always be from 0 to 1, while dB values are a bit wilder - they better approximate human hearing, though. If you want to be really serious, different frequencies are perceived at different volumes - have a look at dBA, for example.
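The math above is small enough to sketch in C#; `samples` is assumed to be the float buffer filled by `GetOutputData`, with values in -1..1, and the 0.1 reference is just the example value from the answer:

```csharp
using System;

class Loudness
{
    // RMS (root-mean-square) of a sample buffer: sum of squares,
    // divided by the sample count, then square-rooted.
    public static double Rms(float[] samples)
    {
        double sumSquares = 0;
        foreach (float s in samples)
            sumSquares += s * s;
        return Math.Sqrt(sumSquares / samples.Length);
    }

    // Convert an RMS value to dB relative to a chosen reference level:
    // the result is 0 dB exactly when rms == reference.
    public static double ToDb(double rms, double reference = 0.1)
    {
        return 20.0 * Math.Log10(rms / reference);
    }
}
```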

Volume from byte array

I'm new to audio analysis, but need to perform a (seemingly) simple task. I have a byte array containing a 16-bit, single-channel recording with a sample rate of 44100. How do I perform a quick analysis to get the volume at any given moment? I need to apply a threshold, so I want a function that returns true if the signal is above a certain amplitude (volume) and false if not. I thought I could iterate through the byte array and check each value, with 255 being the loudest, but this doesn't seem to work: even when I don't record anything, background noise gets in and some of the array is filled with 255. Any suggestions would be great.
Thanks
As you have 16-bit data, you should expect the signal to vary between -32768 and +32767.
To calculate the volume you can take intervals of, say, 1000 samples and calculate their RMS value: sum the squared sample values, divide by 1000, and take the square root. Check this number against your threshold.
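A sketch of that in C#, assuming the byte array holds little-endian 16-bit mono samples (the usual layout for such a recording; `BitConverter.ToInt16` assumes the platform's endianness, which is little-endian on typical hardware):

```csharp
using System;

class VolumeGate
{
    // Interpret a little-endian 16-bit mono byte buffer as samples,
    // compute the RMS over the whole buffer, and compare to a threshold
    // expressed on the -32768..32767 sample scale.
    public static bool IsAboveThreshold(byte[] raw, double threshold)
    {
        int count = raw.Length / 2;                      // two bytes per sample
        double sumSquares = 0;
        for (int i = 0; i < count; i++)
        {
            short sample = BitConverter.ToInt16(raw, i * 2);
            sumSquares += (double)sample * sample;
        }
        double rms = Math.Sqrt(sumSquares / count);
        return rms > threshold;
    }
}
```

This also explains the question's symptom: a single byte of 255 is not "loudest", it is just half of one 16-bit sample, so inspecting bytes individually tells you nothing about amplitude.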
Typically one measures the energy of waves using root mean square.
If you want to be more perceptually accurate you can take the time-domain signal through a discrete fourier transform to a frequency-domain signal, and integrate over the magnitudes with some weighting function (since low-frequency waves are perceptually louder than high-frequency waves at the same energy).
But I don't know audio stuff either so I'm just making stuff up. ☺
I might try applying a standard-deviation sliding-window. OTOH, I would not have assumed that 255 = loudest. It may be, but I'd want to know what encoding is being used. If any compression is present, then I doubt 255 is "loudest."

How can I find a peak in an array?

I am making a pitch detection program using fft. To get the pitch I need to find the lowest frequency that is significantly above the noise floor.
All the results are in an array. Each position is for a frequency. I don't have any idea how to find the peak.
I am programming in C#.
Here is a screenshot of the frequency analysis in audacity.
Instead of attempting to find the lowest peak, I would look for a fundamental frequency which maximizes the spectral energy captured by its first 5 integer multiples. Note that every peak is an integer multiple of the lowest peak. This is a hack of the cepstrum method. Don't judge :).
N.B. From your plots, I assume a 1024-sample window and a 44.1 kHz sampling rate. This yields a frequency granularity of only 44.1 kHz / 1024 = 43 Hz. Given 44.1 kHz audio, I recommend using a longer analysis window of ~50 ms, or 2048 samples. This would yield a finer frequency granularity of ~21 Hz.
Assuming a Matlab vector 'psd' of size 2048 with the PSD values:
% 50 Hz (Dude)  -> 50 Hz / 44100 Hz * 2048  -> ~2  lower limit
% 300 Hz (Baby) -> 300 Hz / 44100 Hz * 2048 -> ~14 upper limit
lower_lim = 2;
upper_lim = 14;
best_energy = -Inf;
best_fund = lower_lim;
for fund_cand = lower_lim:1:upper_lim
    i_first_five_multiples = [1:1:5] * fund_cand;
    sum_energy = sum(psd(i_first_five_multiples));
    if sum_energy > best_energy     % keep the candidate with maximum energy
        best_energy = sum_energy;
        best_fund = fund_cand;
    end
end
I would pick best_fund, the fundamental candidate which maximizes the sum_energy value.
It would be easier if you had some notion of the absolute values to expect, but I would suggest:
find the lowest (weakest) value first. It is your noise level.
compute the average level, it is your signal strength
define some function to decide the noise threshold. This is the tricky part, it may require some experimentation.
In a bad situation, the signal may be only 2 or 3 times the noise level. If the signal is better, you can probably use a threshold of 2× the noise level.
Edit, after looking at the picture:
You should probably just start at the left and find a local maximum. Looks like you could use 30 dB threshold and a 10-bin window or something.
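That left-to-right local-maximum scan might look like this in C#; the threshold and half-window size are tuning knobs to experiment with, not fixed values:

```csharp
using System;

class PeakFinder
{
    // Scan a magnitude spectrum left to right and return the index of the first
    // bin that is above `threshold` and is the maximum within +/- halfWindow bins.
    // Returns -1 if nothing qualifies.
    public static int FirstPeak(double[] spectrum, double threshold, int halfWindow)
    {
        for (int i = 0; i < spectrum.Length; i++)
        {
            if (spectrum[i] < threshold) continue;
            bool isMax = true;
            int lo = Math.Max(0, i - halfWindow);
            int hi = Math.Min(spectrum.Length - 1, i + halfWindow);
            for (int j = lo; j <= hi && isMax; j++)
                if (spectrum[j] > spectrum[i]) isMax = false;
            if (isMax) return i;
        }
        return -1;
    }
}
```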
Finding the lowest peak won't work reliably for estimating pitch, as this frequency is sometimes completely missing or down in the noise floor. For better reliability, try another algorithm: autocorrelation (AMDF, ASDF lag), cepstrum (FFT of log FFT), harmonic product spectrum, state space density, and variations thereof that use neural nets, genetic algorithms or decision matrices to decide between alternative pitch hypotheses (RAPT, YAAPT, et al.).
Added:
That said, you could guess a frequency, compute the average and standard deviation of spectral magnitudes for, say, a 2-to-1 frequency range around your guess, and see if there exists a peak significantly above the average (2 sigma?). Rinse and repeat for some number of frequency guesses, and see which one, or the lowest of several, has a peak that stands out the most from the average. Use that peak.
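A sketch of that "peak significantly above the average" test in C#; the slice of spectrum you pass in and the sigma factor `k` (2, per the suggestion above) are up to you:

```csharp
using System;
using System.Linq;

class PeakSignificance
{
    // Check whether the strongest bin in a slice of the spectrum stands
    // at least k standard deviations above the slice's mean magnitude.
    public static bool HasSignificantPeak(double[] slice, double k)
    {
        double mean = slice.Average();
        double variance = slice.Select(v => (v - mean) * (v - mean)).Average();
        double sigma = Math.Sqrt(variance);
        return slice.Max() > mean + k * sigma;
    }
}
```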
