I'm writing an educational music app for my students.
I need to pass audio input to pitchAnalyzer. It works fine at a 44100 Hz sample rate but gives wrong results at 8000 Hz.
I have read a lot of examples showing how this can be done by reading from and writing to a file, but I don't have a file to read and don't need to write audio.
I already convert the 16-bit byte array that arrives with the WaveInOnDataAvailable event to IEEE float (the format the pitchAnalyzer algorithm requires):
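void WaveInOnDataAvailable(object sender, WaveInEventArgs waveInEventArgs)
{
    var buffer = waveInEventArgs.Buffer;
    // Only the first BytesRecorded bytes of the buffer hold valid audio.
    int bytesRecorded = waveInEventArgs.BytesRecorded;
    float[] samples = new float[bytesRecorded / 2];
    for (int n = 0; n + 1 < bytesRecorded; n += 2)
    {
        // 16-bit PCM sample scaled to the range [-1, 1).
        samples[n / 2] = BitConverter.ToInt16(buffer, n) / 32768f;
    }
}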
WaveIn setup:
waveIn = new WaveIn();
waveIn.DeviceNumber = comboBx_RecordingDevices.SelectedIndex;
waveIn.DataAvailable += WaveInOnDataAvailable;
waveIn.StartRecording();
So how can this be resampled directly into a buffer without writing it to a file?
Is there a way to change the WaveIn input sample rate? The property looks like it can't be set, but perhaps there is another way?
Thanks a lot in advance
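For what it's worth, here is a minimal sketch of resampling a float buffer purely in memory, with no file involved. It is not from the original post: `LinearResampler` and `Resample` are hypothetical names, the method uses plain linear interpolation, and it applies no anti-aliasing filter, so downsampling (for example 44100 to 8000) can alias. A library resampler would normally be the better choice; this only illustrates that the conversion can happen buffer to buffer.

```csharp
using System;

public static class LinearResampler
{
    // Hypothetical helper (not part of NAudio): resamples mono float samples
    // from sourceRate to targetRate by linear interpolation.
    public static float[] Resample(float[] input, int sourceRate, int targetRate)
    {
        if (input.Length == 0) return new float[0];

        int outputLength = (int)((long)input.Length * targetRate / sourceRate);
        float[] output = new float[outputLength];
        double step = (double)sourceRate / targetRate; // input samples per output sample

        for (int i = 0; i < outputLength; i++)
        {
            double position = i * step;
            int index = Math.Min((int)position, input.Length - 1);
            double fraction = position - index;

            float current = input[index];
            // At the very end there is no next sample, so hold the last value.
            float next = index + 1 < input.Length ? input[index + 1] : current;

            output[i] = (float)(current + (next - current) * fraction);
        }
        return output;
    }
}
```

Inside the DataAvailable handler, the `samples` array could be passed through `LinearResampler.Resample(samples, sourceRate, targetRate)` before it goes to the pitch analyzer.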
In C#, I am trying to save off PCM data to a WAV file. Using NAudio, I create a WaveFileWriter.
this.fileWriter = new WaveFileWriter(@"c:\temp\test.wav", new WaveFormat(this.audioParameters.SampleRate, this.channels));
I am capturing the PCM packet in a float array and write the samples.
var arraySize = noOfFrames * this.channels;
var buffer = new float[arraySize];
Marshal.Copy(data, buffer, 0, arraySize);
this.fileWriter?.WriteSamples(buffer, 0, buffer.Length);
The output audio file has the proper length, but the audio sounds horrible. I can tell it is the same audio, but it is not right.
// Non-interlaced 32-bit float format
// ChannelLayout = LayoutStereo
// FramesPerBuffer = 1024
// SampleRate = 44100
// channels = 2
In NAudio, how do I create a WAV file from a PCM stream with the above information?
You are putting 32-bit floating-point audio samples into a 16-bit WAV file: new WaveFormat(sampleRate, channels) describes 16-bit PCM. Use WaveFormat.CreateIeeeFloatWaveFormat(sampleRate, channels) for the WaveFormat of the WaveFileWriter.
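If a 16-bit file is what you actually want, the alternative is to keep the 16-bit WaveFormat and convert the samples yourself before writing. The following is only a sketch of that conversion under the assumption that the floats are normalized to the range -1.0 to 1.0; `PcmConverter` and `FloatToInt16` are hypothetical names, not NAudio APIs.

```csharp
using System;

public static class PcmConverter
{
    // Hypothetical helper: converts normalized float samples (-1.0 .. 1.0)
    // to 16-bit PCM values, clamping anything outside that range.
    public static short[] FloatToInt16(float[] samples)
    {
        short[] pcm = new short[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            pcm[i] = (short)Math.Round(clamped * short.MaxValue);
        }
        return pcm;
    }
}
```

Clamping matters here because a float sample slightly above 1.0 would otherwise overflow the 16-bit range and produce a loud click.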
I am using the IBM Watson Unity SDK.
There are some examples on the web of how to send a file to IBM Watson,
but no exact examples of how to stream a long file split into parts. So here is what I want to do:
I have a long audio file (about 1-3 min) and want to send it to Watson to recognize speech.
IBM Watson accepts only files under 5 MB, but my file is larger, so I need to split it and send it in parts.
Here is my code:
private void OnAudioLoaded (AudioClip clip)
{
Debug.Log ("Audio was loaded and starting to stream...");
_chunksCount = 0;
float[] clipData = new float[(int)(clip.length * CHUNK_SIZE)];
clip.GetData (clipData, 1);
try {
_speechToText.StartListening (OnRecognize);
for (int i = 0; i < Math.Ceiling (clip.length / SECONDS_TO_SPLIT); i++) {
Debug.Log ("Iteration of recognition #" + i);
_chunksCount++;
// creating array of floats from clip array
float[] chunkData = new float[SECONDS_TO_SPLIT * (int)CHUNK_SIZE];
Array.Copy (clipData, i * SECONDS_TO_SPLIT * (int)CHUNK_SIZE, chunkData, 0, clipData.Length - i * SECONDS_TO_SPLIT * CHUNK_SIZE < SECONDS_TO_SPLIT * CHUNK_SIZE ? (int)(clipData.Length - i * SECONDS_TO_SPLIT * CHUNK_SIZE) : SECONDS_TO_SPLIT * (int)CHUNK_SIZE);
// creating audioclip from floats array
AudioClip chunk = AudioClip.Create ("ch", clip.frequency * SECONDS_TO_SPLIT, clip.channels, clip.frequency, false);
chunk.SetData (chunkData, 0);
AudioData audioData = new AudioData (chunk, chunk.samples);
// sending recognition request
_speechToText.OnListen (audioData);
}
} catch (OutOfMemoryException e) {
DialogBoxes.CallErrorBox ("Audio Recognition Error", e.Message);
}
}
The problem is:
On the line _speechToText.StartListening (OnRecognize); I assign a callback function, OnRecognize, that should be called when something is recognized, but it is never called.
The file I am testing with has been recognized on the online website, so it is definitely OK.
Any suggestions?
It turned out that the data chunks were too small for Watson to recognize, so my solution to this specific problem was to send longer audio chunks, several seconds up to about half a minute long, and the recognition then worked properly.
The longer the audio chunk I sent, the better the result I received, but I still had to stay under 5 MB.
This solution is very old, but it may help someone running into the same problem.
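As a rough illustration of "as long as possible but under the limit", the sketch below splits a sample array into the largest chunks that fit a byte budget. `ChunkPlanner` and `Split` are hypothetical names, and the bytes-per-sample figure is an assumption you would have to match to however the SDK actually encodes the audio before sending it.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkPlanner
{
    // Hypothetical helper: splits samples into consecutive chunks of at most
    // maxBytes each, assuming bytesPerSample bytes per sample once encoded.
    public static List<float[]> Split(float[] samples, int bytesPerSample, int maxBytes)
    {
        int maxSamples = maxBytes / bytesPerSample;
        if (maxSamples <= 0)
            throw new ArgumentException("maxBytes is smaller than one sample");

        var chunks = new List<float[]>();
        for (int offset = 0; offset < samples.Length; offset += maxSamples)
        {
            // The last chunk may be shorter than the others.
            int count = Math.Min(maxSamples, samples.Length - offset);
            float[] chunk = new float[count];
            Array.Copy(samples, offset, chunk, 0, count);
            chunks.Add(chunk);
        }
        return chunks;
    }
}
```

For example, with 16-bit audio (2 bytes per sample) and a budget a little below 5 MB, each chunk comes out as long as the limit allows instead of only a few seconds.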
I am using the NAudio library to capture microphone input, but I have run into a problem.
I am using code (which I have modified slightly) from an NAudio sample app.
The code generates a WAV file from the mic input and renders it as a waveform. Here is the code for that.
private void RenderFile()
{
SampleAggregator.RaiseRestart();
using (WaveFileReader reader = new WaveFileReader(this.voiceRecorderState.ActiveFile))
{
this.samplesPerSecond = reader.WaveFormat.SampleRate;
SampleAggregator.NotificationCount = reader.WaveFormat.SampleRate/10;
//Sample rate is 44100
byte[] buffer = new byte[1024];
WaveBuffer waveBuffer = new WaveBuffer(buffer);
waveBuffer.ByteBufferCount = buffer.Length;
int bytesRead;
do
{
bytesRead = reader.Read(waveBuffer, 0, buffer.Length);
int samples = bytesRead / 2;
double sum = 0;
for (int sample = 0; sample < samples; sample++)
{
if (bytesRead > 0)
{
sampleAggregator.Add(waveBuffer.ShortBuffer[sample] / 32768f);
double sample1 = waveBuffer.ShortBuffer[sample] / 32768.0;
sum += (sample1 * sample1);
}
}
double rms = Math.Sqrt(sum / (SampleAggregator.NotificationCount));
var decibel = 20 * Math.Log10(rms);
System.Diagnostics.Debug.WriteLine(decibel.ToString() + " in dB");
} while (bytesRead > 0);
int totalSamples = (int)reader.Length / 2;
TotalWaveFormSamples = totalSamples / sampleAggregator.NotificationCount;
SelectAll();
}
audioPlayer.LoadFile(this.voiceRecorderState.ActiveFile);
}
Below is a small excerpt of the output for a 2-second WAV file with no sound, only mic noise.
-54.089102453893 in dB
-51.9171950072361 in dB
-53.3478098666891 in dB
-53.1845794096928 in dB
-53.8851764055102 in dB
-57.5541358628342 in dB
-54.0121140454216 in dB
-55.5204248291508 in dB
-54.9012326746571 in dB
-53.6831017096011 in dB
-52.8728852678309 in dB
-55.7021600863786 in dB
As we can see, the dB level hovers around -55 when there is no input sound, only silence. If I record myself saying "Hello" into the mic in a normal tone, the dB value goes to -20 or so. I read somewhere that average human speech is around 20 dB and that -3 dB to -6 dB is the ZERO value range for a mic.
Question: Am I calculating the dB value correctly? (I used a formula proposed here by someone else.) Why is the dB value always negative? Am I missing a crucial concept or mechanism?
I searched the NAudio documentation at CodePlex and didn't find an answer. In my observation, the documentation there needs to be more explanatory than just a bunch of Q&A [no offense NAudio :)]
If I understood the formula correctly, the actual value you're calculating is dBm, and that's absolutely OK, since dB is just a unit for measuring amplification and can't be used for measuring signal strength/amplitude (i.e. you can say "I amplified the signal by 3 dB", but you can't say "my signal strength is 6 dB").
The negative values are there just because of the logarithmic conversion part of the formula (converting watts/milliwatts to dB) and because the signals you're dealing with are relatively weak.
So in conclusion, it looks like you've done everything right.
Hope it helps.
EDIT: BTW, as you can see, there really is a ~23-25 dBm difference between silence and human speech.
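To make the negative numbers concrete, here is a small sketch of the same RMS-to-decibel calculation on normalized samples. `LevelMeter` and `RmsDb` are hypothetical names. Because the samples are divided by 32768 and so never exceed 1.0, the level is expressed relative to digital full scale (often written dBFS): a full-scale signal gives 0 and anything quieter gives a negative value. Note one deliberate difference from the question's code: this sketch divides by the number of samples actually summed, whereas the question divides by NotificationCount, which will shift the result if the two counts differ.

```csharp
using System;

public static class LevelMeter
{
    // Hypothetical helper: RMS level of the first `count` normalized samples,
    // in decibels relative to full scale (1.0). Returns negative infinity
    // for an empty or all-zero window.
    public static double RmsDb(float[] samples, int count)
    {
        if (count <= 0) return double.NegativeInfinity;

        double sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += (double)samples[i] * samples[i];
        }

        // Divide by the number of samples that were actually summed.
        double rms = Math.Sqrt(sum / count);
        return 20 * Math.Log10(rms);
    }
}
```

A window of samples all at amplitude 0.1 comes out at about -20, and a window at full scale comes out at 0, which is why speech recorded below full scale always reads negative.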
Right now I have an audio file (2 channels, 44.1 kHz sample rate, 16-bit sample size, WAV). I would like to pass it into this method, but I am not sure how to convert the WAV file to a byte array.
/// <summary>
/// Process 16 bit sample
/// </summary>
/// <param name="wave"></param>
public void Process(ref byte[] wave)
{
_waveLeft = new double[wave.Length / 4];
_waveRight = new double[wave.Length / 4];
if (_isTest == false)
{
// Split out channels from sample
int h = 0;
for (int i = 0; i < wave.Length; i += 4)
{
_waveLeft[h] = (double)BitConverter.ToInt16(wave, i);
_waveRight[h] = (double)BitConverter.ToInt16(wave, i + 2);
h++;
}
}
else
{
// Generate artificial sample for testing
_signalGenerator = new SignalGenerator();
_signalGenerator.SetWaveform("Sine");
_signalGenerator.SetSamplingRate(44100);
_signalGenerator.SetSamples(16384);
_signalGenerator.SetFrequency(5000);
_signalGenerator.SetAmplitude(32768);
_waveLeft = _signalGenerator.GenerateSignal();
_waveRight = _signalGenerator.GenerateSignal();
}
// Generate frequency domain data in decibels
_fftLeft = FourierTransform.FFTDb(ref _waveLeft);
_fftRight = FourierTransform.FFTDb(ref _waveRight);
}
Edit: Hi, sorry for the confusion. I'm new to audio signal processing, so my explanation of what I want may have been wrong. For this method to work correctly, I believe I need to pass in only the byte array of the data chunk of the WAV file. The end result would be to apply an FFT to it, as shown in the code, and transform it into a spectrogram. Thanks.
You need:
using System.IO;
and this code to get the byte array:
byte[] data = File.ReadAllBytes(PathToFile);
where PathToFile is the location (as a string) of the .wav file. Note that this returns the whole file, including the header.
Edit:
Right now i have an audio file (2 Channels, 44.1kHz Sample Rate, 16bit Sample size, WAV) I would like to pass it into this method but i am not sure of any way to convert the WAV file to a byte array.
He asks for a function to get the byte array from the .wav file; he didn't say anything about getting only the part of the byte array that contains the audio data.
So Downvoting a correct answer is..
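Since the question's edit asks for only the data chunk, here is a hedged sketch of pulling it out of the bytes returned by File.ReadAllBytes. `WavData` and `ExtractDataChunk` are hypothetical names. The sketch assumes a plain RIFF/WAVE file and a little-endian machine (BitConverter uses the platform's byte order), and it does not validate the format chunk.

```csharp
using System;
using System.Text;

public static class WavData
{
    // Hypothetical helper: returns only the sample bytes from the "data"
    // chunk of a RIFF/WAVE byte array, skipping the header and other chunks.
    public static byte[] ExtractDataChunk(byte[] wav)
    {
        if (wav.Length < 12 ||
            Encoding.ASCII.GetString(wav, 0, 4) != "RIFF" ||
            Encoding.ASCII.GetString(wav, 8, 4) != "WAVE")
        {
            throw new ArgumentException("Not a RIFF/WAVE file");
        }

        int pos = 12; // first chunk starts right after the RIFF header
        while (pos + 8 <= wav.Length)
        {
            string id = Encoding.ASCII.GetString(wav, pos, 4);
            int size = BitConverter.ToInt32(wav, pos + 4);
            int available = wav.Length - pos - 8;
            if (size < 0) break; // corrupt size field

            if (id == "data")
            {
                // Some writers store a size larger than the bytes present.
                int count = Math.Min(size, available);
                byte[] data = new byte[count];
                Array.Copy(wav, pos + 8, data, 0, count);
                return data;
            }

            if (size > available) break; // chunk runs past the end of the file
            pos += 8 + size + (size % 2); // chunks are padded to an even length
        }
        throw new ArgumentException("No data chunk found");
    }
}
```

The result of `WavData.ExtractDataChunk(File.ReadAllBytes(PathToFile))` could then be passed to Process, which expects interleaved 16-bit stereo samples.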
We are using the NAudio stack, written in C#, and trying to capture audio in exclusive mode as PCM at 8 kHz and 16 bits per sample.
In the following function:
private void InitializeCaptureDevice()
{
if (initialized)
return;
long requestedDuration = REFTIMES_PER_MILLISEC * 100;
if (!audioClient.IsFormatSupported(AudioClientShareMode.Shared, WaveFormat) &&
(!audioClient.IsFormatSupported(AudioClientShareMode.Exclusive, WaveFormat)))
{
throw new ArgumentException("Unsupported Wave Format");
}
var streamFlags = GetAudioClientStreamFlags();
audioClient.Initialize(AudioClientShareMode.Shared,
streamFlags,
requestedDuration,
requestedDuration,
this.waveFormat,
Guid.Empty);
int bufferFrameCount = audioClient.BufferSize;
this.bytesPerFrame = this.waveFormat.Channels * this.waveFormat.BitsPerSample / 8;
this.recordBuffer = new byte[bufferFrameCount * bytesPerFrame];
Debug.WriteLine(string.Format("record buffer size = {0}", this.recordBuffer.Length));
initialized = true;
}
Before calling this function, we configured the WaveFormat to (8000, 1) and requested a period of 100 ms.
We expected the system to allocate 1600 bytes for the buffer and to use an interval of 100 ms, as requested.
But we noticed that the following occurred:
1. The system set audioClient.BufferSize to 4800 and allocated "this.recordBuffer" as an array of 9600 bytes (which means a buffer of 600 ms, not 100 ms).
2. The thread goes to sleep and then gets 2400 samples (4800 bytes), not frames of 1600 bytes as expected.
Any idea what is going on there?
You say you are capturing audio in exclusive mode, but in the example code you call the Initialize method with AudioClientShareMode.Shared. It strikes me as very unlikely that shared mode will let you work at 8 kHz. Unlike the wave... APIs, WASAPI does no resampling of playback or capture for you, so the sound card itself must be operating at the sample rate you specify.
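The arithmetic behind the numbers in the question can be sketched as below. `BufferMath` and its methods are hypothetical names. The 48000 Hz figure in the usage note is only an illustrative device rate; the question does not say what rate the device was actually running at.

```csharp
using System;

public static class BufferMath
{
    // Bytes needed to hold `milliseconds` of PCM audio in the given format.
    public static int BytesForDuration(int sampleRate, int channels, int bitsPerSample, int milliseconds)
    {
        int bytesPerFrame = channels * bitsPerSample / 8;
        int frames = (int)((long)sampleRate * milliseconds / 1000);
        return frames * bytesPerFrame;
    }

    // Duration in milliseconds represented by `frames` at `sampleRate`.
    public static double DurationMs(int frames, int sampleRate)
    {
        return 1000.0 * frames / sampleRate;
    }
}
```

`BufferMath.BytesForDuration(8000, 1, 16, 100)` gives the expected 1600 bytes, and `BufferMath.DurationMs(4800, 8000)` gives the 600 ms quoted in the question. For comparison, the same 4800 frames would cover exactly 100 ms if the stream were really running at 48000 Hz, which would be consistent with the answer's point that the requested rate is not necessarily the rate in use.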