I have a pcm 16 bit stream and I need to know when the audio pass a specific power.
Do I need the fft for this or I can know it in a simpler way?
Thanks
I think you need FFT.
The following methods are available in a FFT. (Usually)
/* Calculate normal power
NumSamples : Number of sample
pReal : Real coefficient buffer
pImag : Imaginary coefficient buffer
pAmpl : Working buffer to hold amplitude Xps(m) = | X(m)^2 | = Xreal(m)^2 + Ximag(m)^2</param>
*/
public static void NormalPower(UInt32 NumSamples, Double[] pReal, Double[] pImag, Double[] pAmpl)
{
// Calculate amplitude values in the buffer provided
for (UInt32 i = 0; i < NumSamples; i++)
{
pAmpl[i] = pReal[i]*pReal[i] + pImag[i]*pImag[i];
}
}
/* Find Peak frequency in Hz
NumSamples : Number of samples
pAmpl : Current amplitude
samplingRate : Sampling rate in samples/second (Hz)
index : Frequency index
<returns>Peak frequency in Hz</returns>
* */
public static Double PeakFrequency(UInt32 NumSamples, Double[] pAmpl, Double samplingRate, ref UInt32 index)
{
UInt32 N = NumSamples >> 1; // number of positive frequencies. (numSamples/2)
double maxAmpl = -1.0;
double peakFreq = -1.0;
index = 0;
for (UInt32 i = 0; i < N; i++)
{
if ( pAmpl[i] > maxAmpl )
{
maxAmpl = (double)pAmpl[i];
index = i;
peakFreq = (double)(i);
}
}
return samplingRate * peakFreq / (double)(NumSamples);
}
And you can use them like this:
FFT.Compute(_numSamples, RealIn, null, RealOut, ImagOut, false);
FFT.NormalPower(_numSamples / 2, RealOut, ImagOut, AmplOut);
double maxAmpl = (32767.0 * 32767.0); //Max power used for 16 bit audio
int centerFreq = (Rate / 2);
for (int i = 0; i < NUM_FREQUENCY; ++i)
{
if (METER_FREQUENCY[i] > centerFreq)
_meterData[i] = 0;
else
{
var indice = (int)(METER_FREQUENCY[i] * _numSamples / Rate);
var metervalue = (int)(20.0 * Math.Log10(AmplOut[indice] / maxAmpl));
_meterData[i] = metervalue;
}
}
Note: To get the sound intensity in dB the following formula is used:
level = 20 Log (P2/P1)
P1=Max Power
P2=Current Power
Related
I am about to program a visualizer with pretty good results. I have got an array with the size of 1500, with the magnitude of the frequencys in it. Now I want to convert this array in an array with 100 values. For example in the 1st index of the 2nd array should be the average of the first two values in the first array. On the 2nd index of the 2nd array should be the values of index 3-6. But i don't know how to calculate this properly. So how can I convert the first array into the second one?
I have found an answer in the rainmeter source code. Maybe it will now be clearer what I wanted to do here is the c# code:
To get an array with an specific length, log scaled with min. and max. frequencies.
private float[] getFrequencies(int min, int max, int nBands)
{
float[] returnVal = new float[nBands];
double step = (Math.Log(max / min) / nBands) / Math.Log(2.0);
returnVal[0] = (float)(min * Math.Pow(2.0, step / 2.0));
for (int iBand = 1; iBand < nBands; ++iBand)
{
returnVal[iBand] = (float)(returnVal[iBand - 1] * Math.Pow(2.0, step));
}
return returnVal;
}
And to fill the output array:
private double[] getLogArray(double[] data, int nBands, int minFreq, int maxFreq)
{
float[] bandFreq = getFrequencies(minFreq, maxFreq, nBands);
float df = (float)sampleRate / samples;
float scalar = 1.0f / sampleRate;
double[] bandOut = new double[nBands];
int iBin = 0;
int iBand = 0;
float f0 = 0.0f;
while (iBin <= (samples / 2) && iBand < nBands)
{
float fLin1 = ((float)iBin + 0.5f) * df;
float fLog1 = bandFreq[iBand];
float x = (float)data[iBin];
if (fLin1 <= fLog1)
{
bandOut[iBand] += (fLin1 - f0) * x * scalar;
f0 = fLin1;
iBin += 1;
}
else
{
bandOut[iBand] += (fLog1 - f0) * x * scalar;
f0 = fLog1;
iBand += 1;
}
}
return bandOut;
}
Have a nice day and sorry for the late response.
I'm developping a simple program that analyses frequencies of audio files.
Using an fft length of 8192, samplerate of 44100, if I use as input a constant frequency wav file - say 65Hz, 200Hz or 300Hz - the output is a constant graph at that value.
If I use a recording of someone speaking, the frequencies has peaks as high as 4000Hz, with an average at 450+ish on a 90 seconds file.
At first I thought it was because of the recording being stereo sound, but converting it to mono with the exact same bitrate as the test files doesn't change much. (average goes down from 492 to 456 but that's still way too high)
Has anyone got an idea as to what could cause this ?
I think I shouldn't find the highest value but perhaps take either an average or a median value ?
EDIT : using the average of the magnitudes per 8192 bytes buffer and getting the index that's closest to that magnitude messes everything up.
This is the code for the handler of the event the Sample Aggregator fires when it has calculated fft for current buffer
void FftCalculated(object sender, FftEventArgs e)
{
int length = e.Result.Length;
float[] magnitudes = new float[length];
for (int i = 0; i < length / 2; i++)
{
float real = e.Result[i].X;
float imaginary = e.Result[i].Y;
magnitudes[i] = (float)(10 * Math.Log10(Math.Sqrt((real * real) + (imaginary * imaginary))));
}
float max_mag = float.MinValue;
float max_index = -1;
for (int i = 0; i < length / 2; i++)
if (magnitudes[i] > max_mag)
{
max_mag = magnitudes[i];
max_index = i;
}
var currentFrequency = max_index * SAMPLERATE / 8192;
Console.WriteLine("frequency be " + currentFrequency);
}
ADDITION : this is the code that reads and sends the file to the analysing part
using (var rdr = new WaveFileReader(audioFilePath))
{
var newFormat = new WaveFormat(Convert.ToInt32(SAMPLERATE/*44100*/), 16, 1);
byte[] buffer = new byte[8192];
var audioData = new AudioData(); //custom class for project
using (var conversionStream = new WaveFormatConversionStream(newFormat, rdr))
{
// Used to send audio in realtime, it's a timestamps issue for the graphs
// I'm working on fixing this, but it has lower priority so disregard it :p
TimeSpan audioDuration = conversionStream.TotalTime;
long audioLength = conversionStream.Length;
int waitTime = (int)(audioDuration.TotalMilliseconds / audioLength * 8192);
while (conversionStream.Read(buffer, 0, buffer.Length) != 0)
{
audioData.AudioDataBase64 = Utils.Base64Encode(buffer);
Thread.Sleep(waitTime);
SendMessage("AudioData", Utils.StringToAscii(AudioData.GetJSON(audioData)));
}
Console.WriteLine("Reached End of File");
}
}
This is the code that receives the audio data
{
var audioData = new AudioData();
audioData =
AudioData.GetStateFromJSON(Utils.AsciiToString(receivedMessage));
QueueAudio(Utils.Base64Decode(audioData.AudioDataBase64)));
}
followed by
var waveFormat = new WaveFormat(Convert.ToInt32(SAMPLERATE/*44100*/), 16, 1);
_bufferedWaveProvider = new BufferedWaveProvider(waveFormat);
_bufferedWaveProvider.BufferDuration = new TimeSpan(0, 2, 0);
{
void QueueAudio(byte[] data)
{
_bufferedWaveProvider.AddSamples(data, 0, data.Length);
if (_bufferedWaveProvider.BufferedBytes >= fftLength)
{
byte[] buffer = new byte[_bufferedWaveProvider.BufferedBytes];
_bufferedWaveProvider.Read(buffer, 0, _bufferedWaveProvider.BufferedBytes);
for (int index = 0; index < buffer.Length; index += 2)
{
short sample = (short)((buffer[index] | buffer[index + 1] << 8));
float sample32 = (sample) / 32767f;
sampleAggregator.Add(sample32);
}
}
}
}
And then the SampleAggregator fires the event above when it's done with the fft.
I am a newcomer to the sound programming. I have a real-time sound visualizer(http://www.codeproject.com/Articles/20025/Sound-visualizer-in-C). I downloaded it from codeproject.com.
In AudioFrame.cs class there is an array as below:
_fftLeft = FourierTransform.FFTDb(ref _waveLeft);
_fftLeft is a double array. _waveLeft is also a double array. As above they applied
FouriorTransform.cs class's FFTDb function to a _waveLeft array.
Here is FFTDb function:
static public double[] FFTDb(ref double[] x)
{
n = x.Length;
nu = (int)(Math.Log(n) / Math.Log(2));
int n2 = n / 2;
int nu1 = nu - 1;
double[] xre = new double[n];
double[] xim = new double[n];
double[] decibel = new double[n2];
double tr, ti, p, arg, c, s;
for (int i = 0; i < n; i++)
{
xre[i] = x[i];
xim[i] = 0.0f;
}
int k = 0;
for (int l = 1; l <= nu; l++)
{
while (k < n)
{
for (int i = 1; i <= n2; i++)
{
p = BitReverse(k >> nu1);
arg = 2 * (double)Math.PI * p / n;
c = (double)Math.Cos(arg);
s = (double)Math.Sin(arg);
tr = xre[k + n2] * c + xim[k + n2] * s;
ti = xim[k + n2] * c - xre[k + n2] * s;
xre[k + n2] = xre[k] - tr;
xim[k + n2] = xim[k] - ti;
xre[k] += tr;
xim[k] += ti;
k++;
}
k += n2;
}
k = 0;
nu1--;
n2 = n2 / 2;
}
k = 0;
int r;
while (k < n)
{
r = BitReverse(k);
if (r > k)
{
tr = xre[k];
ti = xim[k];
xre[k] = xre[r];
xim[k] = xim[r];
xre[r] = tr;
xim[r] = ti;
}
k++;
}
for (int i = 0; i < n / 2; i++)
decibel[i] = 10.0 * Math.Log10((float)(Math.Sqrt((xre[i] * xre[i]) + (xim[i] * xim[i]))));
return decibel;
}
When I play a music note in a guitar i wanted to know it's frequency in a numerical format. I wrote a foreach loop to know what is the output of a _fftLeft array as below,
foreach (double myarray in _fftLeft)
{
Console.WriteLine(myarray );
}
This output's contain lots of real-time values as below .
41.3672743963389
,43.0176034462662,
35.3677383746087,
42.5968946936404,
42.0600935794783,
36.7521669642071,
41.6356709559342,
41.7189032845742,
41.1002451261724,
40.8035583510188,
45.604366914128,
39.645552593115
I want to know what are those values (frequencies or not)? if the answer is frequencies then why it contains low frequency values? And when I play a guitar note I want to detect a frequency of that particular guitar note.
Based on the posted code, FFTDb first computes the FFT then computes and returns the magnitudes of the frequency spectrum in the logarithmic decibels scale. In other words, the _fftLeft then contains magnitudes for a discreet set of frequencies. The actual values of those frequencies can be computed using the array index and sampling frequency according to this answer.
As an example, if you were plotting the _fftLeft output for a pure sinusoidal tone input you should be able to see a clear spike in the index corresponding to the sinusoidal frequency. For a guitar note however you are likely going to see multiple spikes in magnitude corresponding to the harmonics. To detect the note's frequency aka pitch is a more complicated topic and typically requires the use of one of several pitch detection algorithms.
When I using Math.Exp() in C# I have some questions?This code is about Kernel density estimation, and I don't have any knowledge about kernel density estimation. So I look up some wiki and some paper.
I try to write it by C#. The problem is when "distance" is getting higher the result is become 0. It's confuse me and I cannot find any other way to get the right result.
disExp = Math.Pow(Math.E, -(distance / 2 * Math.Pow(h, 2)));
So, can any one help me to get the solution? Or give me some idea about Kernel density estimation on C#. Sorry for poor English.
Try this
public static double[,] KernelDensityEstimation(double[] data, double sigma, int nsteps)
{
// probability density function (PDF) signal analysis
// Works like ksdensity in mathlab.
// KDE performs kernel density estimation (KDE)on one - dimensional data
// http://en.wikipedia.org/wiki/Kernel_density_estimation
// Input: -data: input data, one-dimensional
// -sigma: bandwidth(sometimes called "h")
// -nsteps: optional number of abscis points.If nsteps is an
// array, the abscis points will be taken directly from it. (default 100)
// Output: -x: equispaced abscis points
// -y: estimates of p(x)
// This function is part of the Kernel Methods Toolbox(KMBOX) for MATLAB.
// http://sourceforge.net/p/kmbox
// Converted to C# code by ksandric
double[,] result = new double[nsteps, 2];
double[] x = new double[nsteps], y = new double[nsteps];
double MAX = Double.MinValue, MIN = Double.MaxValue;
int N = data.Length; // number of data points
// Find MIN MAX values in data
for (int i = 0; i < N; i++)
{
if (MAX < data[i])
{
MAX = data[i];
}
if (MIN > data[i])
{
MIN = data[i];
}
}
// Like MATLAB linspace(MIN, MAX, nsteps);
x[0] = MIN;
for (int i = 1; i < nsteps; i++)
{
x[i] = x[i - 1] + ((MAX - MIN) / nsteps);
}
// kernel density estimation
double c = 1.0 / (Math.Sqrt(2 * Math.PI * sigma * sigma));
for (int i = 0; i < N; i++)
{
for (int j = 0; j < nsteps; j++)
{
y[j] = y[j] + 1.0 / N * c * Math.Exp(-(data[i] - x[j]) * (data[i] - x[j]) / (2 * sigma * sigma));
}
}
// compilation of the X,Y to result. Good for creating plot(x, y)
for (int i = 0; i < nsteps; i++)
{
result[i, 0] = x[i];
result[i, 1] = y[i];
}
return result;
}
kernel density estimation C#
plot
I have run this FFT algorithm on a 440Hz audio file. But I get an unexpected sound frequency: 510Hz.
Is the byteArray containing .wav correctly converted into 2 double arrays (Re & Im parts)? The imaginary array contains only 0.
I assume that the highest sound frequency is the maximum of xRe array: please look at the very end of the run() function? Maybe that is my mistake: is it average or something like that?
What is the problem then?
UPDATED: The biggest sum Re+Im is at index = 0 so I get frequency = 0;
Whole project: contains .wav -> just open and run.
using System;
using System.Net;
using System.IO;
namespace FFT {
/**
* Performs an in-place complex FFT.
*
* Released under the MIT License
*
* Copyright (c) 2010 Gerald T. Beauregard
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to
* deal in the Software without restriction, including without limitation the
* rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
* sell copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
public class FFT2 {
// Element for linked list in which we store the
// input/output data. We use a linked list because
// for sequential access it's faster than array index.
class FFTElement {
public double re = 0.0; // Real component
public double im = 0.0; // Imaginary component
public FFTElement next; // Next element in linked list
public uint revTgt; // Target position post bit-reversal
}
private static int sampleRate;
private uint m_logN = 0; // log2 of FFT size
private uint m_N = 0; // FFT size
private FFTElement[] m_X; // Vector of linked list elements
/**
*
*/
public FFT2() {
}
/**
* Initialize class to perform FFT of specified size.
*
* #param logN Log2 of FFT length. e.g. for 512 pt FFT, logN = 9.
*/
public void init(uint logN) {
m_logN = logN;
m_N = (uint)(1 << (int)m_logN);
// Allocate elements for linked list of complex numbers.
m_X = new FFTElement[m_N];
for (uint k = 0; k < m_N; k++)
m_X[k] = new FFTElement();
// Set up "next" pointers.
for (uint k = 0; k < m_N - 1; k++)
m_X[k].next = m_X[k + 1];
// Specify target for bit reversal re-ordering.
for (uint k = 0; k < m_N; k++)
m_X[k].revTgt = BitReverse(k, logN);
}
/**
* Performs in-place complex FFT.
*
* #param xRe Real part of input/output
* #param xIm Imaginary part of input/output
* #param inverse If true, do an inverse FFT
*/
public void run(double[] xRe, double[] xIm, bool inverse = false) {
uint numFlies = m_N >> 1; // Number of butterflies per sub-FFT
uint span = m_N >> 1; // Width of the butterfly
uint spacing = m_N; // Distance between start of sub-FFTs
uint wIndexStep = 1; // Increment for twiddle table index
// Copy data into linked complex number objects
// If it's an IFFT, we divide by N while we're at it
FFTElement x = m_X[0];
uint k = 0;
double scale = inverse ? 1.0 / m_N : 1.0;
while (x != null) {
x.re = scale * xRe[k];
x.im = scale * xIm[k];
x = x.next;
k++;
}
// For each stage of the FFT
for (uint stage = 0; stage < m_logN; stage++) {
// Compute a multiplier factor for the "twiddle factors".
// The twiddle factors are complex unit vectors spaced at
// regular angular intervals. The angle by which the twiddle
// factor advances depends on the FFT stage. In many FFT
// implementations the twiddle factors are cached, but because
// array lookup is relatively slow in C#, it's just
// as fast to compute them on the fly.
double wAngleInc = wIndexStep * 2.0 * Math.PI / m_N;
if (inverse == false)
wAngleInc *= -1;
double wMulRe = Math.Cos(wAngleInc);
double wMulIm = Math.Sin(wAngleInc);
for (uint start = 0; start < m_N; start += spacing) {
FFTElement xTop = m_X[start];
FFTElement xBot = m_X[start + span];
double wRe = 1.0;
double wIm = 0.0;
// For each butterfly in this stage
for (uint flyCount = 0; flyCount < numFlies; ++flyCount) {
// Get the top & bottom values
double xTopRe = xTop.re;
double xTopIm = xTop.im;
double xBotRe = xBot.re;
double xBotIm = xBot.im;
// Top branch of butterfly has addition
xTop.re = xTopRe + xBotRe;
xTop.im = xTopIm + xBotIm;
// Bottom branch of butterly has subtraction,
// followed by multiplication by twiddle factor
xBotRe = xTopRe - xBotRe;
xBotIm = xTopIm - xBotIm;
xBot.re = xBotRe * wRe - xBotIm * wIm;
xBot.im = xBotRe * wIm + xBotIm * wRe;
// Advance butterfly to next top & bottom positions
xTop = xTop.next;
xBot = xBot.next;
// Update the twiddle factor, via complex multiply
// by unit vector with the appropriate angle
// (wRe + j wIm) = (wRe + j wIm) x (wMulRe + j wMulIm)
double tRe = wRe;
wRe = wRe * wMulRe - wIm * wMulIm;
wIm = tRe * wMulIm + wIm * wMulRe;
}
}
numFlies >>= 1; // Divide by 2 by right shift
span >>= 1;
spacing >>= 1;
wIndexStep <<= 1; // Multiply by 2 by left shift
}
// The algorithm leaves the result in a scrambled order.
// Unscramble while copying values from the complex
// linked list elements back to the input/output vectors.
x = m_X[0];
while (x != null) {
uint target = x.revTgt;
xRe[target] = x.re;
xIm[target] = x.im;
x = x.next;
}
//looking for max IS THIS IS FREQUENCY
double max = 0, index = 0;
for (int i = 0; i < xRe.Length; i++) {
if (xRe[i] + xIm[i] > max) {
max = xRe[i]*xRe[i] + xIm[i]*xIm[i];
index = i;
}
}
max = Math.Sqrt(max);
/* if the peak is at bin index i then the corresponding
frequency will be i * Fs / N whe Fs is the sample rate in Hz and N is the FFT size.*/
//DONT KNOW WHY THE BIGGEST VALUE IS IN THE BEGINNING
Console.WriteLine("max "+ max+" index " + index + " m_logN" + m_logN + " " + xRe[0]);
max = index * sampleRate / m_logN;
Console.WriteLine("max " + max);
}
/**
* Do bit reversal of specified number of places of an int
* For example, 1101 bit-reversed is 1011
*
* #param x Number to be bit-reverse.
* #param numBits Number of bits in the number.
*/
private uint BitReverse(
uint x,
uint numBits) {
uint y = 0;
for (uint i = 0; i < numBits; i++) {
y <<= 1;
y |= x & 0x0001;
x >>= 1;
}
return y;
}
public static void Main(String[] args) {
// BinaryReader reader = new BinaryReader(System.IO.File.OpenRead(#"C:\Users\Duke\Desktop\e.wav"));
BinaryReader reader = new BinaryReader(File.Open(#"440.wav", FileMode.Open));
//Read the wave file header from the buffer.
int chunkID = reader.ReadInt32();
int fileSize = reader.ReadInt32();
int riffType = reader.ReadInt32();
int fmtID = reader.ReadInt32();
int fmtSize = reader.ReadInt32();
int fmtCode = reader.ReadInt16();
int channels = reader.ReadInt16();
sampleRate = reader.ReadInt32();
int fmtAvgBPS = reader.ReadInt32();
int fmtBlockAlign = reader.ReadInt16();
int bitDepth = reader.ReadInt16();
if (fmtSize == 18) {
// Read any extra values
int fmtExtraSize = reader.ReadInt16();
reader.ReadBytes(fmtExtraSize);
}
int dataID = reader.ReadInt32();
int dataSize = reader.ReadInt32();
// Store the audio data of the wave file to a byte array.
byte[] byteArray = reader.ReadBytes(dataSize);
/* for (int i = 0; i < byteArray.Length; i++) {
Console.Write(byteArray[i] + " ");
}*/
byte[] data = byteArray;
double[] arrRe = new double[data.Length];
for (int i = 0; i < arrRe.Length; i++) {
arrRe[i] = data[i] / 32768.0;
}
double[] arrI = new double[data.Length];
for (int i = 0; i < arrRe.Length; i++) {
arrI[i] = 0;
}
/**
* Initialize class to perform FFT of specified size.
*
* #param logN Log2 of FFT length. e.g. for 512 pt FFT, logN = 9.
*/
Console.WriteLine();
FFT2 fft2 = new FFT2();
uint logN = (uint)Math.Log(data.Length, 2);
fft2.init(logN);
fft2.run(arrRe, arrI);
// After this you have to split that byte array for each channel (Left,Right)
// Wav supports many channels, so you have to read channel from header
Console.ReadLine();
}
}
}
There are a few things that you need to address:
you're not applying a window function prior to the FFT - this will result in spectral leakage in the general case and you may get misleading results, particularly when looking for peaks, as there will be "smearing" of the spectrum.
when looking for peaks you should be looking at the magnitude of FFT output bins, not the individual real and imaginary parts - magnitude = sqrt(re^2 +im^2) (although you don't need to worry about the sqrt if you're just looking for peaks).
having identified a peak you need to convert the bin index into a frequency - if the peak is at bin index i then the corresponding frequency will be i * Fs / N where Fs is the sample rate in Hz and N is the FFT size.
for a real-to-complex FFT you can ignore the second N / 2 output bins as they are just the complex conjugate mirror image of the first N / 2 bins
(See also this answer for fuller explanations of the above.)