NAudio WAV file frequency to decibels - C#

I'm using NAudio to generate a WAV file. The WAV file contains environmental noise (detected and recorded via the microphone).
I need to process this file to show the average loudness (dB) in different frequency bands.
I have read a lot about 1/3-octave band analysis, where the frequency bands are centred at 31 Hz, 62 Hz, 125 Hz, 250 Hz, 500 Hz and so on, and an average loudness can be computed for each band.
This is exactly what I want to do, but HOW to achieve it is confusing.
What I have done so far (following the NAudio tutorial) is read a WAV file and process it. Here is the code:
private void RenderFile()
{
    using (WaveFileReader reader = new WaveFileReader(this.voiceRecorderState.ActiveFile))
    {
        this.samplesPerSecond = reader.WaveFormat.SampleRate;
        SampleAggregator.NotificationCount = reader.WaveFormat.SampleRate / 10;
        // Sample rate is 44100
        byte[] buffer = new byte[1024];
        WaveBuffer waveBuffer = new WaveBuffer(buffer);
        waveBuffer.ByteBufferCount = buffer.Length;
        int bytesRead;
        do
        {
            bytesRead = reader.Read(waveBuffer, 0, buffer.Length);
            int samples = bytesRead / 2;
            double sum = 0;
            for (int sample = 0; sample < samples; sample++)
            {
                if (bytesRead > 0)
                {
                    sampleAggregator.Add(waveBuffer.ShortBuffer[sample] / 32768f);
                    double sample1 = waveBuffer.ShortBuffer[sample] / 32768.0;
                    sum += (sample1 * sample1);
                }
            }
            double rms = Math.Sqrt(sum / (SampleAggregator.NotificationCount));
            double decibel = 20 * Math.Log10(rms);
            if (calculatedBCount == 0)
            {
                dBList.Add(decibel);
                // System.Diagnostics.Debug.WriteLine(decibel.ToString() + " in dB");
            }
        } while (bytesRead > 0);
        int totalSamples = (int)reader.Length / 2;
        TotalWaveFormSamples = totalSamples / sampleAggregator.NotificationCount;
        calculatedBCount++;
        SelectAll();
        // System.Diagnostics.Debug.WriteLine("Average Noise: " + avg.ToString() + " dB");
    }
    audioPlayer.LoadFile(this.voiceRecorderState.ActiveFile);
}
public int Read(byte[] buffer, int offset, int count)
{
    if (waveBuffer == null || waveBuffer.MaxSize < count)
    {
        waveBuffer = new WaveBuffer(count);
    }
    int bytesRead = source.Read(waveBuffer, 0, count);
    if (bytesRead > 0) bytesRead = count;
    int frames = bytesRead / sizeof(float); // MRH: was count
    float pitch = pitchDetector.DetectPitch(waveBuffer.FloatBuffer, frames);
    PitchList.Add(pitch);
    return frames * 4;
}
Using a 5-second WAV file and the two methods above, I get a list of pitches and a list of decibel values.
The decibel list contains 484 values, such as:
-56.19945639
-55.13139952
-55.06947441
-56.70789076
-57.24140093
-55.98546603
-55.67407176
-55.53060998
-55.98480268
-54.85796943
-57.00735818
-55.64980974
-57.07235475
The PitchList contains 62 values, including:
75.36621
247.631836
129.199219
75.36621
96.8994141
96.8994141
86.13281
75.36621
129.199219
107.666016
How can I use these results to identify the average loudness at 31 Hz, 62 Hz, 125 Hz, 250 Hz and so on?
Am I doing something wrong, or perhaps everything wrong?

Please correct me if I am wrong, but I'm afraid you cannot convert Hz to dB, because there is no direct relation between them.
Hz is a measure of frequency and dB is a measure of amplitude, a bit like converting kilograms to metres.
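What you can compute, though, is a level in dB for each frequency band: run an FFT over a block of samples, group the bins into octave (or 1/3-octave) bands, and convert each band's RMS magnitude to dB. A minimal sketch, assuming magnitudes, sampleRate and fftLength come from your own FFT step (e.g. NAudio.Dsp.FastFourierTransform); note the levels are relative to digital full scale, not calibrated sound pressure:
static void PrintOctaveBandLevels(double[] magnitudes, int sampleRate, int fftLength)
{
    // Bin i corresponds to frequency i * sampleRate / fftLength.
    double[] centres = { 31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000 };
    double binWidth = (double)sampleRate / fftLength;
    foreach (double fc in centres)
    {
        // Octave band edges; for 1/3-octave bands use fc / 2^(1/6) and fc * 2^(1/6).
        int loBin = (int)Math.Max(1, fc / Math.Sqrt(2) / binWidth);
        int hiBin = (int)Math.Min(fftLength / 2 - 1, fc * Math.Sqrt(2) / binWidth);
        double sumSquares = 0;
        for (int i = loBin; i <= hiBin; i++)
            sumSquares += magnitudes[i] * magnitudes[i];
        double rms = Math.Sqrt(sumSquares / (hiBin - loBin + 1));
        // dB relative to full scale; the epsilon avoids Log10(0).
        Console.WriteLine($"{fc,6} Hz band: {20 * Math.Log10(rms + 1e-12):F1} dB");
    }
}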

How to find out the format of bytes? NAudio

What I'm trying to build: receive data from the microphone (IWaveIn) => lower the amplitude of the sound, i.e. lower the volume (this is the problem) => play it through the speakers (IWaveProvider).
The problem is: whenever I multiply a sample by any x != 1.0f, I get very noisy output.
I think it could be the byte format, but I do not know how to check it. Any help or suggestion will be appreciated.
Count = 17640; Offset = 0;
public int Read(byte[] buffer, int offset, int count)
{
    int read = bufferedWaveProvider.Read(buffer, offset, count);
    /*
    waveIn.WaveFormat.Channels;      // 2
    waveIn.WaveFormat.BlockAlign;    // 4
    waveIn.WaveFormat.BitsPerSample; // 16
    waveIn.WaveFormat.SampleRate;    // 44100
    */
    for (int i = 0; i < read / 4; i++)
    {
        int firstByte = i * 4;
        float sample = BitConverter.ToSingle(buffer, firstByte);
        sample = sample * 1.0f;
        byte[] bytes = BitConverter.GetBytes(sample);
        buffer[firstByte + 0] = bytes[0];
        buffer[firstByte + 1] = bytes[1];
        buffer[firstByte + 2] = bytes[2];
        buffer[firstByte + 3] = bytes[3];
    }
    return read;
}

private void OnDataAvailable(object sender, WaveInEventArgs e)
{
    bufferedWaveProvider.AddSamples(e.Buffer, 0, e.BytesRecorded);
}
You're receiving audio in the format specified in WaveIn.WaveFormat. Your comment shows 16 bits per sample, which means you are receiving the audio as signed 16-bit samples, so you should use BitConverter.ToInt16 rather than BitConverter.ToSingle.
But there are easier ways of accomplishing this. If you call ToSampleProvider() on your BufferedWaveProvider then you can pass that into a VolumeSampleProvider which will let you adjust the volume directly without needing to unpack samples yourself.
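A minimal sketch of that approach (bufferedWaveProvider and waveOut are assumed to exist as in your code; waveOut.Init(ISampleProvider) is NAudio's extension overload):
using NAudio.Wave;
using NAudio.Wave.SampleProviders;

// Wrap the buffered provider in a sample provider plus a volume stage;
// no manual byte unpacking needed.
var volumeProvider = new VolumeSampleProvider(bufferedWaveProvider.ToSampleProvider())
{
    Volume = 0.5f // half amplitude; 1.0f leaves the signal untouched
};
waveOut.Init(volumeProvider);
waveOut.Play();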

NAudio frequency analyser giving inconsistent results

I'm developing a simple program that analyses the frequencies of audio files.
Using an FFT length of 8192 and a sample rate of 44100, if I use a constant-frequency WAV file as input - say 65 Hz, 200 Hz or 300 Hz - the output is a constant graph at that value.
If I use a recording of someone speaking, the frequencies have peaks as high as 4000 Hz, with an average around 450 Hz on a 90-second file.
At first I thought it was because the recording was in stereo, but converting it to mono with the exact same bitrate as the test files doesn't change much (the average goes down from 492 to 456, but that's still way too high).
Has anyone got an idea as to what could cause this?
I think I shouldn't take the highest value, but perhaps an average or a median value instead? (See the sketch after the handler code below.)
EDIT: using the average of the magnitudes per 8192-byte buffer and picking the index closest to that magnitude messes everything up.
This is the handler for the event the SampleAggregator fires when it has calculated the FFT for the current buffer:
void FftCalculated(object sender, FftEventArgs e)
{
    int length = e.Result.Length;
    float[] magnitudes = new float[length];
    for (int i = 0; i < length / 2; i++)
    {
        float real = e.Result[i].X;
        float imaginary = e.Result[i].Y;
        magnitudes[i] = (float)(10 * Math.Log10(Math.Sqrt((real * real) + (imaginary * imaginary))));
    }
    float max_mag = float.MinValue;
    float max_index = -1;
    for (int i = 0; i < length / 2; i++)
    {
        if (magnitudes[i] > max_mag)
        {
            max_mag = magnitudes[i];
            max_index = i;
        }
    }
    var currentFrequency = max_index * SAMPLERATE / 8192;
    Console.WriteLine("frequency be " + currentFrequency);
}
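On the average-vs-peak idea: averaging the linear magnitudes over a band gives a level for that band, not a frequency, which is why mapping the average back to "the closest bin" (as in the EDIT) produces garbage. A sketch of a band average over the handler's data (the band limits here are arbitrary examples):
// Average linear (not log) magnitudes between two example frequencies.
int loBin = (int)(80.0 * 8192 / SAMPLERATE);   // e.g. 80 Hz
int hiBin = (int)(1000.0 * 8192 / SAMPLERATE); // e.g. 1000 Hz
double sum = 0;
for (int i = loBin; i <= hiBin; i++)
{
    float re = e.Result[i].X;
    float im = e.Result[i].Y;
    sum += Math.Sqrt(re * re + im * im);
}
double bandLevel = sum / (hiBin - loBin + 1);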
ADDITION: this is the code that reads the file and sends it to the analysing part:
using (var rdr = new WaveFileReader(audioFilePath))
{
    var newFormat = new WaveFormat(Convert.ToInt32(SAMPLERATE /*44100*/), 16, 1);
    byte[] buffer = new byte[8192];
    var audioData = new AudioData(); // custom class for project
    using (var conversionStream = new WaveFormatConversionStream(newFormat, rdr))
    {
        // Used to send audio in realtime, it's a timestamps issue for the graphs
        // I'm working on fixing this, but it has lower priority so disregard it :p
        TimeSpan audioDuration = conversionStream.TotalTime;
        long audioLength = conversionStream.Length;
        int waitTime = (int)(audioDuration.TotalMilliseconds / audioLength * 8192);
        while (conversionStream.Read(buffer, 0, buffer.Length) != 0)
        {
            audioData.AudioDataBase64 = Utils.Base64Encode(buffer);
            Thread.Sleep(waitTime);
            SendMessage("AudioData", Utils.StringToAscii(AudioData.GetJSON(audioData)));
        }
        Console.WriteLine("Reached End of File");
    }
}
This is the code that receives the audio data:
{
    var audioData = new AudioData();
    audioData = AudioData.GetStateFromJSON(Utils.AsciiToString(receivedMessage));
    QueueAudio(Utils.Base64Decode(audioData.AudioDataBase64));
}
followed by:
var waveFormat = new WaveFormat(Convert.ToInt32(SAMPLERATE /*44100*/), 16, 1);
_bufferedWaveProvider = new BufferedWaveProvider(waveFormat);
_bufferedWaveProvider.BufferDuration = new TimeSpan(0, 2, 0);

void QueueAudio(byte[] data)
{
    _bufferedWaveProvider.AddSamples(data, 0, data.Length);
    if (_bufferedWaveProvider.BufferedBytes >= fftLength)
    {
        byte[] buffer = new byte[_bufferedWaveProvider.BufferedBytes];
        _bufferedWaveProvider.Read(buffer, 0, _bufferedWaveProvider.BufferedBytes);
        for (int index = 0; index < buffer.Length; index += 2)
        {
            short sample = (short)(buffer[index] | buffer[index + 1] << 8);
            float sample32 = sample / 32767f;
            sampleAggregator.Add(sample32);
        }
    }
}
And then the SampleAggregator fires the event above when it is done with the FFT.
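The SampleAggregator itself isn't shown; it follows the pattern from the NAudio WPF demo projects. A stripped-down sketch of that pattern (the details here are assumptions, check your own copy):
using System;
using NAudio.Dsp; // Complex, FastFourierTransform

public class FftEventArgs : EventArgs
{
    public Complex[] Result { get; private set; }
    public FftEventArgs(Complex[] result) { Result = result; }
}

public class SampleAggregator
{
    public event EventHandler<FftEventArgs> FftCalculated;

    private readonly Complex[] fftBuffer;
    private readonly int fftLength; // must be a power of two
    private readonly int m;         // log2(fftLength)
    private int fftPos;

    public SampleAggregator(int fftLength)
    {
        this.fftLength = fftLength;
        this.m = (int)Math.Log(fftLength, 2.0);
        this.fftBuffer = new Complex[fftLength];
    }

    public void Add(float sample)
    {
        // Apply a window as each sample is collected, then FFT a full buffer.
        fftBuffer[fftPos].X = (float)(sample * FastFourierTransform.HammingWindow(fftPos, fftLength));
        fftBuffer[fftPos].Y = 0;
        if (++fftPos >= fftLength)
        {
            fftPos = 0;
            FastFourierTransform.FFT(true, m, fftBuffer);
            FftCalculated?.Invoke(this, new FftEventArgs(fftBuffer));
        }
    }
}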

What am I doing wrong when parsing a wav file?

I'm trying to parse a WAV file. I'm not sure whether there can be multiple data chunks in a WAV file, but I originally assumed there was only one, since the WAV format description I was reading only mentioned one.
But I noticed that the subchunk2size was very small (like 26), while the WAV file being parsed was around 36 MB with a sample rate of 44100.
So I tried to parse it assuming there were multiple chunks, but after the first chunk there was no subchunk2id to be found.
To go chunk by chunk, I was using the code below:
int chunkSize = System.BitConverter.ToInt32(strm, 40);
int widx = 44; // wav data starts at the 44th byte
// strm is a byte array of the wav file
while (widx < strm.Length)
{
    widx += chunkSize;
    if (widx < 1000)
    {
        // log "data" or "100 97 116 97" for the subchunkid
        // This is only getting printed the 1st time though. All prints after that are garbage
        Debug.Log(strm[widx] + " " + strm[widx + 1] + " " + strm[widx + 2] + " " + strm[widx + 3]);
    }
    if (widx + 8 < strm.Length)
    {
        widx += 4;
        chunkSize = System.BitConverter.ToInt32(strm, widx);
        widx += 4;
    }
    else
    {
        widx += 8;
    }
}
A typical .wav file has three chunks, and each chunk starts with a 4-byte ID followed by a 4-byte size.
The first chunk is the "RIFF" chunk. It contains the file size (4 bytes) and the name of the format (4 bytes, usually "WAVE").
The next chunk is the "fmt " chunk (the space in the chunk name is important). It contains the audio format (2 bytes), the number of channels (2 bytes), the sample rate (4 bytes), the byte rate (4 bytes), the block align (2 bytes) and the bits per sample (2 bytes).
The third and last chunk is the data chunk, which holds the actual sample amplitudes. It starts with the data size (4 bytes), which is the number of bytes of sample data.
You can find further explanations of the properties of a .wav file here.
From this knowledge I have already created the following class:
public sealed class WaveFile
{
    // privates
    private int fileSize;
    private string format;
    private int fmtChunkSize;
    private int audioFormat;
    private int numChannels;
    private int sampleRate;
    private int byteRate;
    private int blockAlign;
    private int bitsPerSample;
    private int dataSize;
    private int[][] data; // One array per channel

    // publics
    public int FileSize => fileSize;
    public string Format => format;
    public int FmtChunkSize => fmtChunkSize;
    public int AudioFormat => audioFormat;
    public int NumChannels => numChannels;
    public int SampleRate => sampleRate;
    public int ByteRate => byteRate;
    public int BitsPerSample => bitsPerSample;
    public int DataSize => dataSize;
    public int[][] Data => data;

    public WaveFile(string path)
    {
        FileStream fs = File.OpenRead(path);
        LoadChunk(fs); // read RIFF chunk
        LoadChunk(fs); // read fmt chunk
        LoadChunk(fs); // read data chunk
        fs.Close();
    }

    private void LoadChunk(FileStream fs)
    {
        ASCIIEncoding Encoder = new ASCIIEncoding();
        byte[] bChunkID = new byte[4];
        fs.Read(bChunkID, 0, 4);
        string sChunkID = Encoder.GetString(bChunkID);
        byte[] ChunkSize = new byte[4];
        fs.Read(ChunkSize, 0, 4);
        if (sChunkID.Equals("RIFF"))
        {
            fileSize = BitConverter.ToInt32(ChunkSize, 0);
            byte[] Format = new byte[4];
            fs.Read(Format, 0, 4);
            this.format = Encoder.GetString(Format);
        }
        if (sChunkID.Equals("fmt "))
        {
            fmtChunkSize = BitConverter.ToInt32(ChunkSize, 0);
            byte[] audioFormat = new byte[2];
            fs.Read(audioFormat, 0, 2);
            this.audioFormat = BitConverter.ToInt16(audioFormat, 0);
            byte[] numChannels = new byte[2];
            fs.Read(numChannels, 0, 2);
            this.numChannels = BitConverter.ToInt16(numChannels, 0);
            byte[] sampleRate = new byte[4];
            fs.Read(sampleRate, 0, 4);
            this.sampleRate = BitConverter.ToInt32(sampleRate, 0);
            byte[] byteRate = new byte[4];
            fs.Read(byteRate, 0, 4);
            this.byteRate = BitConverter.ToInt32(byteRate, 0);
            byte[] blockAlign = new byte[2];
            fs.Read(blockAlign, 0, 2);
            this.blockAlign = BitConverter.ToInt16(blockAlign, 0);
            byte[] bitsPerSample = new byte[2];
            fs.Read(bitsPerSample, 0, 2);
            this.bitsPerSample = BitConverter.ToInt16(bitsPerSample, 0);
        }
        if (sChunkID.Equals("data"))
        {
            dataSize = BitConverter.ToInt32(ChunkSize, 0);
            data = new int[this.numChannels][];
            byte[] temp = new byte[dataSize];
            for (int i = 0; i < this.numChannels; i++)
            {
                data[i] = new int[this.dataSize / (numChannels * bitsPerSample / 8)];
            }
            for (int i = 0; i < data[0].Length; i++)
            {
                for (int j = 0; j < numChannels; j++)
                {
                    if (fs.Read(temp, 0, blockAlign / numChannels) > 0)
                    {
                        // 2 bytes per sample -> 16-bit value, 4 bytes -> 32-bit value
                        if (blockAlign / numChannels == 2)
                        { data[j][i] = BitConverter.ToInt16(temp, 0); }
                        else
                        { data[j][i] = BitConverter.ToInt32(temp, 0); }
                    }
                }
            }
        }
    }
}
Needed using-directives:
using System;
using System.IO;
using System.Text;
This class reads the chunks byte by byte and sets the corresponding properties. You just have to instantiate it, and it will expose all the properties of your selected wave file, as in the usage sketch below.
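Example usage (the path is a placeholder):
var wave = new WaveFile(@"C:\temp\test.wav");
Console.WriteLine(wave.NumChannels + " channel(s) at " + wave.SampleRate + " Hz, " + wave.BitsPerSample + "-bit");
int firstLeftSample = wave.Data[0][0]; // first sample of the first channel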
In the reference you added, I don't see any mention of the chunk size being repeated for each data chunk...
Try something like this:
int chunkSize = System.BitConverter.ToInt32(strm, 40);
int widx = 44; // wav data starts at the 44th byte
// strm is a byte array of the wav file
while (widx < strm.Length)
{
    if (widx < 1000)
    {
        // log "data" or "100 97 116 97" for the subchunkid
        // This is only getting printed the 1st time though. All prints after that are garbage
        Debug.Log(strm[widx] + " " + strm[widx + 1] + " " + strm[widx + 2] + " " + strm[widx + 3]);
    }
    widx += chunkSize;
}
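More generally, a robust parser shouldn't assume the data chunk starts at byte 44 at all: WAV files can contain extra chunks (e.g. LIST or fact) before data. A generic RIFF walk reads each chunk's ID and size and skips what it doesn't need; a sketch over the same strm array:
// Skip the 12-byte RIFF header, then read 4-byte chunk IDs and 4-byte sizes
// until the "data" chunk is found.
int pos = 12;
while (pos + 8 <= strm.Length)
{
    string chunkId = System.Text.Encoding.ASCII.GetString(strm, pos, 4);
    int size = System.BitConverter.ToInt32(strm, pos + 4);
    if (chunkId == "data")
    {
        // samples start at pos + 8 and run for "size" bytes
        break;
    }
    pos += 8 + size + (size & 1); // chunk bodies are padded to an even length
}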

Splitting audio into left and right channels into separate files

I'm having trouble separating the channel buffers into new files.
Here is the code for extracting each channel's buffer:
int samplesDesired = 10000;
byte[] buffer = new byte[samplesDesired * 4];
short[] left = new short[samplesDesired];
short[] right = new short[samplesDesired];
using (WaveFileReader pcm = new WaveFileReader(filePath))
{
    int bytesRead = pcm.Read(buffer, 0, 10000);
    int index = 0;
    for (int i = 0; i < bytesRead / 4; i++)
    {
        left[i] = BitConverter.ToInt16(buffer, index);
        index += 2;
        right[i] = BitConverter.ToInt16(buffer, index);
        index += 2;
    }
}
And here is how I try to create a file from the gathered buffers:
using (var leftChannelFile = new WaveFileWriter("test.wav", new WaveFormat()))
{
    leftChannelFile.WriteSamples(left, 0, left.Length);
}
The problem is, when I try to play the resulting file, it is 0 seconds long and 19.5 KB in size. Any idea why that is happening?
Either set the header of the resulting file to indicate that it is one channel (mono), as in the sketch below,
OR
add a second, empty channel to your result files (all 0 bytes).
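In NAudio terms, the first option simply means passing a mono format to the writer instead of the parameterless WaveFormat(), which defaults to 44.1 kHz 16-bit stereo. A sketch (44100 here assumes the source file's sample rate; use the reader's actual WaveFormat.SampleRate):
// One channel, so the written shorts are interpreted as a single mono stream.
using (var leftChannelFile = new WaveFileWriter("left.wav", new WaveFormat(44100, 16, 1)))
{
    leftChannelFile.WriteSamples(left, 0, left.Length);
}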

FFT which frequencies are in which bins?

I would like to see how strongly certain frequencies, specifically low bass at 20-60 Hz, are present in a piece of audio. I have the audio as a byte array; I convert it to an array of shorts, then into complex numbers via (short[i] / (double)short.MaxValue, 0). Then I pass this to the FFT from AForge.
The audio is mono with a sample rate of 44100 Hz. I understand I can only put chunks through the FFT whose length is a power of two, so 4096 for example. What I don't understand is which frequencies end up in which output bins.
If I take 4096 samples from audio at a 44100 Hz sample rate, does this mean I am only taking a few milliseconds' worth of audio, or only getting some of the frequencies that are present?
I add the output of the FFT to an array. My understanding is that, since I am taking 4096 samples, bin 0 would contain 0 * 44100 / 4096 = 0 Hz, bin 1 would hold 1 * 44100 / 4096 = 10.7666015625 Hz, and so on. Is this correct, or am I doing something fundamentally wrong here?
My goal is to average the magnitudes between, say, 20 and 60 Hz, so that for a song with very low, heavy bass this number would be higher than for, say, a soft piano piece with very little bass.
Here is my code.
OpenFileDialog file = new OpenFileDialog();
file.ShowDialog();
WaveFileReader reader = new WaveFileReader(file.FileName);
byte[] data = new byte[reader.Length];
reader.Read(data, 0, data.Length);
samepleRate = reader.WaveFormat.SampleRate;
bitDepth = reader.WaveFormat.BitsPerSample;
channels = reader.WaveFormat.Channels;
Console.WriteLine("audio has " + channels + " channels, a sample rate of " + samepleRate + " and bitdepth of " + bitDepth + ".");
short[] shorts = data.Select(b => (short)b).ToArray();
int size = 4096;
int window = 44100 * 10;
int y = 0;
Complex[] complexData = new Complex[size];
for (int i = window; i < window + size; i++)
{
    Complex tmp = new Complex(shorts[i] / (double)short.MaxValue, 0);
    complexData[y] = tmp;
    y++;
}
FourierTransform.FFT(complexData, FourierTransform.Direction.Forward);
double[] arr = new double[complexData.Length];
// print out sample of conversion
for (int i = 0; i < complexData.Length; i++)
{
    arr[i] = complexData[i].Magnitude;
}
Console.Write("complete, ");
return arr;
Edit: changed to FFT from DFT.
Here's a modified version of your code. Note the comments starting with "***".
OpenFileDialog file = new OpenFileDialog();
file.ShowDialog();
WaveFileReader reader = new WaveFileReader(file.FileName);
byte[] data = new byte[reader.Length];
reader.Read(data, 0, data.Length);
samepleRate = reader.WaveFormat.SampleRate;
bitDepth = reader.WaveFormat.BitsPerSample;
channels = reader.WaveFormat.Channels;
Console.WriteLine("audio has " + channels + " channels, a sample rate of " + samepleRate + " and bitdepth of " + bitDepth + ".");

// *** NAudio "thinks" in floats
float[] floats = new float[data.Length / sizeof(float)];
Buffer.BlockCopy(data, 0, floats, 0, data.Length);

int size = 4096;
// *** You don't have to fill the FFT buffer to get valid results.
// More noisy & smaller "magnitudes", but better freq. res.
int inputSamples = samepleRate / 100; // 10ms... adjust as needed
int offset = samepleRate * 10 * channels;
int y = 0;
Complex[] complexData = new Complex[size];

// *** get a "scaling" curve to make both ends of the sample region 0
// but still allow full amplitude in the middle of the region.
float[] window = CalcWindowFunction(inputSamples);
for (int i = 0; i < inputSamples; i++)
{
    // *** "floats" is stored as LRLRLR interleaved data for stereo audio
    complexData[y] = new Complex(floats[i * channels + offset] * window[i], 0);
    y++;
}

// make sure the back portion of the buffer is set to all 0's
while (y < size)
{
    complexData[y] = new Complex(0, 0);
    y++;
}

// *** Consider using a DCT here instead... It returns less "noisy" results
FourierTransform.FFT(complexData, FourierTransform.Direction.Forward);

double[] arr = new double[complexData.Length];
// print out sample of conversion
for (int i = 0; i < complexData.Length; i++)
{
    // *** I assume we don't care about phase???
    arr[i] = complexData[i].Magnitude;
}
Console.Write("complete, ");
return arr;
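Note that CalcWindowFunction isn't defined in the snippet above; a minimal stand-in using a Hann window (one common choice, an assumption here) could be:
private static float[] CalcWindowFunction(int length)
{
    // Hann window: zero at both ends, one in the middle.
    var window = new float[length];
    for (int n = 0; n < length; n++)
        window[n] = 0.5f * (1f - (float)Math.Cos(2 * Math.PI * n / (length - 1)));
    return window;
}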
Once you get the results, and assuming a 44100 Hz sample rate and size = 4096, elements 2 - 4 should be the values you are looking for. There's a way to convert them to dB, but I don't remember it offhand.
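(For magnitudes, the usual conversion is 20 times the base-10 logarithm of the magnitude relative to a reference, full scale here:
// dB relative to full scale; the epsilon avoids Log10(0).
double dB = 20 * Math.Log10(complexData[i].Magnitude + 1e-12);
)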
Good luck!
