How to stream STT file to IBM Watson (Unity)? - c#

I am using the IBM Watson Unity SDK.
There are some examples on the web of how to send a file to IBM Watson,
but no concrete examples of how to stream a long file split into parts. So here is what I want to do:
I have a long audio file (about 1-3 minutes) and want to send it to Watson to recognize the speech.
IBM Watson only accepts files under 5 MB, but my file is larger, so I need to split it and send it in parts.
Here is my code:
private void OnAudioLoaded(AudioClip clip)
{
    Debug.Log("Audio was loaded and starting to stream...");
    _chunksCount = 0;
    float[] clipData = new float[(int)(clip.length * CHUNK_SIZE)];
    clip.GetData(clipData, 1);
    try
    {
        _speechToText.StartListening(OnRecognize);
        for (int i = 0; i < Math.Ceiling(clip.length / SECONDS_TO_SPLIT); i++)
        {
            Debug.Log("Iteration of recognition #" + i);
            _chunksCount++;
            // Copy the next chunk of samples out of the full clip array.
            float[] chunkData = new float[SECONDS_TO_SPLIT * (int)CHUNK_SIZE];
            int samplesLeft = (int)(clipData.Length - i * SECONDS_TO_SPLIT * CHUNK_SIZE);
            int samplesToCopy = Math.Min(samplesLeft, SECONDS_TO_SPLIT * (int)CHUNK_SIZE);
            Array.Copy(clipData, i * SECONDS_TO_SPLIT * (int)CHUNK_SIZE, chunkData, 0, samplesToCopy);
            // Create an AudioClip from the copied samples.
            AudioClip chunk = AudioClip.Create("ch", clip.frequency * SECONDS_TO_SPLIT, clip.channels, clip.frequency, false);
            chunk.SetData(chunkData, 0);
            AudioData audioData = new AudioData(chunk, chunk.samples);
            // Send the recognition request.
            _speechToText.OnListen(audioData);
        }
    }
    catch (OutOfMemoryException e)
    {
        DialogBoxes.CallErrorBox("Audio Recognition Error", e.Message);
    }
}
The problem: on the line _speechToText.StartListening(OnRecognize); I assign a callback, OnRecognize, that should be called when something is recognized, but it is never called.
The file I am testing with was recognized fine by an online service, so the audio itself is definitely OK.
Any suggestions?

It turned out that the data chunks were too small for Watson to recognize, so my solution to this specific problem was to send longer audio chunks, about half a minute each, and then the recognition worked properly.
The longer the audio chunk I sent, the better the result I received, but I still had to stay under 5 MB.
This solution is very old, but it may help someone running into the same problem.
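For illustration, here is a minimal sketch of the longer-chunk approach, reusing _speechToText, OnRecognize, and AudioData from the question; the 30-second chunk length and the loop details are illustrative assumptions, not the exact original code:

private const int SECONDS_TO_SPLIT = 30; // longer chunks recognized much better

private void OnAudioLoaded(AudioClip clip)
{
    // Pull all samples (all channels interleaved) out of the clip.
    float[] clipData = new float[clip.samples * clip.channels];
    clip.GetData(clipData, 0);

    int samplesPerChunk = clip.frequency * SECONDS_TO_SPLIT * clip.channels;
    _speechToText.StartListening(OnRecognize);
    for (int offset = 0; offset < clipData.Length; offset += samplesPerChunk)
    {
        int count = Math.Min(samplesPerChunk, clipData.Length - offset);
        float[] chunkData = new float[count];
        Array.Copy(clipData, offset, chunkData, 0, count);

        // Wrap the samples in a fresh AudioClip and hand it to Watson.
        AudioClip chunk = AudioClip.Create("chunk", count / clip.channels,
                                           clip.channels, clip.frequency, false);
        chunk.SetData(chunkData, 0);
        _speechToText.OnListen(new AudioData(chunk, chunk.samples));
    }
}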

Related

How to record an input device with more than 2 channels to mp3 format

I am building recording software that records all devices connected to the PC into mp3 format.
Here is my code:
IWaveIn _captureInstance = inputDevice.DataFlow == DataFlow.Render ?
    new WasapiLoopbackCapture(inputDevice) : new WasapiCapture(inputDevice);
var waveFormatToUse = _captureInstance.WaveFormat;
var sampleRateToUse = waveFormatToUse.SampleRate;
var channelsToUse = waveFormatToUse.Channels;
if (sampleRateToUse > 48000) // LameMP3FileWriter doesn't support a rate above 48000 Hz
{
    sampleRateToUse = 48000;
}
else if (sampleRateToUse < 8000) // LameMP3FileWriter doesn't support a rate below 8000 Hz
{
    sampleRateToUse = 8000;
}
if (channelsToUse > 2) // LameMP3FileWriter doesn't support more than 2 channels
{
    channelsToUse = 2;
}
waveFormatToUse = WaveFormat.CreateCustomFormat(_captureInstance.WaveFormat.Encoding,
                                                sampleRateToUse,
                                                channelsToUse,
                                                _captureInstance.WaveFormat.AverageBytesPerSecond,
                                                _captureInstance.WaveFormat.BlockAlign,
                                                _captureInstance.WaveFormat.BitsPerSample);
_mp3FileWriter = new LameMP3FileWriter(_currentStream, waveFormatToUse, 32);
This code works properly, except when a connected device (including virtual devices such as SteelSeries Sonar) has more than 2 channels.
In those cases the recordings contain nothing but noise.
How can I solve this issue? I'm not required to use LameMP3FileWriter; I just need mp3 or any other format with good compression. If possible, I'd also like to avoid saving intermediate files to disk (all processing in memory), producing only the final audio file.
My recording code:
// When the capturer receives audio, write the buffer into the output file
_captureInstance.DataAvailable += (s, a) =>
{
    lock (_writerLock)
    {
        // Write the buffer into the writer instance's file
        _mp3FileWriter?.Write(a.Buffer, 0, a.BytesRecorded);
    }
};
// When the capturer stops, dispose of the capturer and writer instances
_captureInstance.RecordingStopped += (s, a) =>
{
    lock (_writerLock)
    {
        _mp3FileWriter?.Dispose();
    }
    _captureInstance?.Dispose();
};
// Start audio recording
_captureInstance.StartRecording();
If LAME doesn't support more than 2 channels, you can't use this encoder for your purpose. Have you tried the Fraunhofer surround MP3 encoder?
Link: https://download.cnet.com/mp3-surround-encoder/3000-2140_4-165541.html
Also, here's a nice article discussing how to convert between most audio formats (with C# code samples): https://www.codeproject.com/articles/501521/how-to-convert-between-most-audio-formats-in-net
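Alternatively, if you want to stay with LAME, you could downmix the extra channels to stereo yourself before writing. A minimal sketch, assuming the capture delivers interleaved 32-bit IEEE-float samples (the usual WASAPI shared-mode format; check _captureInstance.WaveFormat.Encoding first) and that _mp3FileWriter is created with a matching 2-channel format:

// Sketch: naively downmix an interleaved 32-bit float capture buffer to stereo
// so it can be fed to a 2-channel LameMP3FileWriter. Assumes IEEE-float samples.
private static byte[] DownmixToStereo(byte[] buffer, int bytesRecorded, int channels)
{
    const int bytesPerSample = 4; // 32-bit float
    int frames = bytesRecorded / (bytesPerSample * channels);
    byte[] stereo = new byte[frames * bytesPerSample * 2];
    for (int f = 0; f < frames; f++)
    {
        float left = 0f, right = 0f;
        for (int ch = 0; ch < channels; ch++)
        {
            float sample = BitConverter.ToSingle(buffer, (f * channels + ch) * bytesPerSample);
            if (ch % 2 == 0) left += sample; else right += sample; // even -> L, odd -> R
        }
        int perSide = (channels + 1) / 2;
        BitConverter.GetBytes(left / perSide).CopyTo(stereo, f * 8);
        BitConverter.GetBytes(right / perSide).CopyTo(stereo, f * 8 + 4);
    }
    return stereo;
}

Inside the DataAvailable handler you would then write the downmixed buffer instead of a.Buffer, and create _mp3FileWriter with a 2-channel format such as WaveFormat.CreateIeeeFloatWaveFormat(sampleRateToUse, 2); all processing stays in memory.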

Resample NAudio WaveIn microphone input from 8000 to 44100 Hz

I'm writing an educational music app for my students.
I need to pass audio input to a pitch analyzer. It works fine at a 44100 Hz sample rate but shows wrong results at 8000 Hz.
I've read a lot of examples of how this can be done by reading from and writing to files, but I don't have a file to read and don't need to write audio.
I have already converted the 16-bit byte array that arrives with the WaveInOnDataAvailable event to IEEE float (the format the pitch analyzer algorithm requires):
void WaveInOnDataAvailable(object sender, WaveInEventArgs waveInEventArgs)
{
    var buffer = waveInEventArgs.Buffer;
    float[] samples = new float[buffer.Length / 2];
    for (int n = 0; n < buffer.Length; n += 2)
    {
        samples[n / 2] = BitConverter.ToInt16(buffer, n) / 32768f;
    }
}
WaveIn setup:
waveIn = new WaveIn();
waveIn.DeviceNumber = comboBx_RecordingDevices.SelectedIndex;
waveIn.DataAvailable += WaveInOnDataAvailable;
waveIn.StartRecording();
So how can this be resampled directly from the buffer, without writing it to a file?
Is there a way to change the WaveIn input sample rate? The property appears not to be settable, but perhaps there is another way?
Thanks a lot in advance
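One way to do this entirely in memory is NAudio's buffering and resampling providers; here is a minimal sketch (assuming an NAudio version that includes BufferedWaveProvider and WdlResamplingSampleProvider):

using NAudio.Wave;
using NAudio.Wave.SampleProviders;

// Buffer raw WaveIn data and read it back resampled to 44100 Hz, no files involved.
var buffered = new BufferedWaveProvider(waveIn.WaveFormat);
var resampler = new WdlResamplingSampleProvider(buffered.ToSampleProvider(), 44100);

waveIn.DataAvailable += (s, a) =>
{
    buffered.AddSamples(a.Buffer, 0, a.BytesRecorded);
    // Read back 44100 Hz floats, ready for the pitch analyzer.
    float[] resampled = new float[4096];
    int read = resampler.Read(resampled, 0, resampled.Length);
    // Hand resampled[0..read] to the analyzer here.
};

Depending on the NAudio version, waveIn.WaveFormat can also be assigned before StartRecording() (e.g. new WaveFormat(44100, 1)), which may avoid resampling altogether if the driver supports that rate.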

C# - Microphone noise detection

I am using the NAudio library to capture microphone input, but I have run into a problem.
I am using code (which I have modified slightly) from an NAudio sample app.
The code generates a WAV file based on mic input and renders it as a waveform. Here is the code for that:
private void RenderFile()
{
    SampleAggregator.RaiseRestart();
    using (WaveFileReader reader = new WaveFileReader(this.voiceRecorderState.ActiveFile))
    {
        this.samplesPerSecond = reader.WaveFormat.SampleRate;
        SampleAggregator.NotificationCount = reader.WaveFormat.SampleRate / 10;
        // Sample rate is 44100
        byte[] buffer = new byte[1024];
        WaveBuffer waveBuffer = new WaveBuffer(buffer);
        waveBuffer.ByteBufferCount = buffer.Length;
        int bytesRead;
        do
        {
            bytesRead = reader.Read(waveBuffer, 0, buffer.Length);
            int samples = bytesRead / 2;
            double sum = 0;
            for (int sample = 0; sample < samples; sample++)
            {
                if (bytesRead > 0)
                {
                    sampleAggregator.Add(waveBuffer.ShortBuffer[sample] / 32768f);
                    double sample1 = waveBuffer.ShortBuffer[sample] / 32768.0;
                    sum += (sample1 * sample1);
                }
            }
            double rms = Math.Sqrt(sum / (SampleAggregator.NotificationCount));
            var decibel = 20 * Math.Log10(rms);
            System.Diagnostics.Debug.WriteLine(decibel.ToString() + " in dB");
        } while (bytesRead > 0);
        int totalSamples = (int)reader.Length / 2;
        TotalWaveFormSamples = totalSamples / sampleAggregator.NotificationCount;
        SelectAll();
    }
    audioPlayer.LoadFile(this.voiceRecorderState.ActiveFile);
}
Below is a small excerpt of the result for a 2-second WAV file with no sound, only mic noise:
-54.089102453893 in dB
-51.9171950072361 in dB
-53.3478098666891 in dB
-53.1845794096928 in dB
-53.8851764055102 in dB
-57.5541358628342 in dB
-54.0121140454216 in dB
-55.5204248291508 in dB
-54.9012326746571 in dB
-53.6831017096011 in dB
-52.8728852678309 in dB
-55.7021600863786 in dB
As you can see, the dB level hovers around -55 when there is no input sound, only silence. If I say "Hello" into the mic in a normal tone, the dB value goes to around -20. I read somewhere that average human speech is around 20 dB and that -3 dB to -6 dB is the zero-level range for a mic.
Question: Am I calculating the dB value correctly? (I used a formula proposed here by someone else.) Why is the dB value always negative? Am I missing a crucial concept or mechanism?
I searched the NAudio documentation at CodePlex and didn't find an answer. In my observation, the documentation there needs to be more explanatory than just a bunch of Q&A [no offense, NAudio :)].
If I understood the formula correctly, the value you're actually calculating is dBFS (decibels relative to full scale), and that's absolutely fine. dB by itself is a relative unit and can't express an absolute signal strength without a reference level (i.e. you can say "I amplified the signal by 3 dB", but "my signal strength is 6 dB" only makes sense against a reference, which here is the maximum sample value of 1.0).
The negative values are there simply because of the logarithmic conversion part of the formula and because the signals you're dealing with are well below full scale.
So in conclusion, it looks like you've done everything right.
Hope it helps.
EDIT: BTW, as you can see, there really is a ~23-25 dB difference between silence and human speech.
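As a side note, here is a minimal sketch of the RMS calculation with the division taken over the actual number of samples in the block, rather than NotificationCount (which only matches when the block happens to be exactly that long; this is an assumption about what the original code intended):

// RMS level of a block of normalized samples, expressed in dBFS.
// 0 dBFS is full scale; silence tends toward large negative values.
static double BlockLevelDbfs(short[] samples, int count)
{
    double sum = 0;
    for (int i = 0; i < count; i++)
    {
        double s = samples[i] / 32768.0; // normalize 16-bit PCM to [-1, 1)
        sum += s * s;
    }
    double rms = Math.Sqrt(sum / count);
    return 20 * Math.Log10(rms); // negative infinity for pure digital silence
}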

Cannot convert Wav to Flac, c#, 1 Error

I'm using this WavReader class in my code. I'm getting this error:
ERROR: Ensure that samples are integers (e.g. not floating-point numbers)
if (format.wFormatTag != 1) // 1 = PCM 2 = Float
    throw new ApplicationException("Format tag " + format.wFormatTag + " is not supported!");
All I'm trying to do is convert a WAV file to FLAC so I can feed it to the Google Speech API. I can do the first step (recording WAV files) and the third step (converting FLAC to text using the Google Speech API). I am stuck on the second step: converting the WAV file to FLAC.
Here is my code for the second step, where I'm getting stuck:
public void WAV_to_FLAC_converter()
{
    string inputFile = "inputFile.wav";
    //string outputFile = Path.Combine("flac", Path.ChangeExtension(input, ".flac"));
    string outputFile = "outputFile.flac";
    if (!File.Exists(inputFile))
        throw new ApplicationException("Input file " + inputFile + " cannot be found!");
    var stream = File.OpenRead(@"C:\inputFile.wav");
    WavReader wav = new WavReader(stream);
    using (var flacStream = File.Create(outputFile))
    {
        FlacWriter flac = new FlacWriter(flacStream, wav.BitDepth, wav.Channels, wav.SampleRate);
        // Buffer for 1 second's worth of audio data
        byte[] buffer = new byte[wav.Bitrate / 8];
        int bytesRead; // **I GET THE ABOVE ERROR HERE.**
        do
        {
            bytesRead = wav.InputStream.Read(buffer, 0, buffer.Length);
            flac.Write(buffer, 0, bytesRead);
        } while (bytesRead > 0);
        flac.Dispose();
        flac = null;
    }
}
Apparently there is something wrong with the input WAV file I am giving the function. I think it is saying that the stream I created contains floating-point samples instead of integers. But what am I supposed to do? I didn't modify the WAV file; it's just a WAV file. How can I convert a WAV file from floating-point to integer samples? I don't know how to fix this.
I've tested your code with a random wave file and it worked perfectly.
Then I downloaded a stereo 32-bit float wave sample from here, and I got the same error as you:
ERROR: Ensure that samples are integers (e.g. not floating-point numbers)
Then I debugged the code, and the following exception was thrown:
// Ensure that samples are 16 or 24-bit
if (format.wBitsPerSample != 16 && format.wBitsPerSample != 24)
    throw new ApplicationException(format.wBitsPerSample + " bits per sample is not supported by FLAC!");
I'm afraid the WavReader class simply does not support 32-bit float wave samples, nor does the FlacWriter.
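If your source WAV really does contain float samples, one option (a sketch, assuming NAudio is available, as in the other questions here) is to convert it to 16-bit PCM first so that WavReader and FlacWriter can handle it:

using NAudio.Wave;
using NAudio.Wave.SampleProviders;

// Sketch: rewrite a 32-bit float WAV as 16-bit PCM with NAudio.
static void ConvertTo16BitPcm(string inputPath, string outputPath)
{
    using (var reader = new AudioFileReader(inputPath)) // reads samples as float
    {
        // SampleToWaveProvider16 rescales the floats to 16-bit PCM samples.
        var pcm16 = new SampleToWaveProvider16(reader);
        WaveFileWriter.CreateWaveFile(outputPath, pcm16);
    }
}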
UPDATE: I got your project working now. You have to rename your libFlac.dll to LibFlac.dll in your debug folder; then there should be no more problems loading the library. What I got after that was a PInvokeStackImbalance exception. If you get it too, you can follow the instructions from the post here, or simply turn off throwing of this exception type under Debug->Exceptions->Managed Debugging Assistants->PInvokeStackImbalance.

Broken MIDI File Output

I'm currently trying to implement my own single-track MIDI file output. It turns an 8x8 grid of colours, stored in multiple frames, into a MIDI file that can be imported into a digital audio workstation and played through a Novation Launchpad. Some more context here.
I've managed to output a file that programs recognize as MIDI, but the resulting file does not play, and it doesn't match files generated from the same frame data. I've been doing comparisons by recording my program's live MIDI messages through a dedicated MIDI program and exporting a MIDI file from that, then comparing my generated file to that properly generated file in a hex editor. Things are correct as far as the headers, but that seems to be it.
I've pored over multiple renditions of the MIDI specification and existing Stack Overflow questions with no complete solution.
Here is my code, based on what I have researched. I can't help but feel I'm missing something simple. I'm avoiding existing MIDI libraries, as I only need this one MIDI function to work (and I want the learning experience of doing it from scratch). Any guidance would be very helpful.
/// <summary>
/// Outputs a MIDI file based on frames for the Novation Launchpad.
/// </summary>
/// <param name="filename"></param>
/// <param name="frameData"></param>
/// <param name="bpm"></param>
/// <param name="ppq"></param>
public static void WriteMidi(string filename, List<FrameData> frameData, int bpm, int ppq) {
    decimal totalLength = 0;
    using (FileStream stream = new FileStream(filename, FileMode.Create, FileAccess.Write)) {
        // Output the MIDI file header ("MThd")
        stream.WriteByte(77);
        stream.WriteByte(84);
        stream.WriteByte(104);
        stream.WriteByte(100);
        for (int i = 0; i < 3; i++) {
            stream.WriteByte(0);
        }
        stream.WriteByte(6);
        // Set the track mode
        byte[] trackMode = BitConverter.GetBytes(Convert.ToInt16(0));
        stream.Write(trackMode, 0, trackMode.Length);
        // Set the track amount
        byte[] trackAmount = BitConverter.GetBytes(Convert.ToInt16(1));
        stream.Write(trackAmount, 0, trackAmount.Length);
        // Set the delta time
        byte[] deltaTime = BitConverter.GetBytes(Convert.ToInt16(60000 / (bpm * ppq)));
        stream.Write(deltaTime, 0, deltaTime.Length);
        // Output the track header ("MTrk")
        stream.WriteByte(77);
        stream.WriteByte(84);
        stream.WriteByte(114);
        stream.WriteByte(107);
        for (int i = 0; i < 3; i++) {
            stream.WriteByte(0);
        }
        stream.WriteByte(12);
        // Get our total byte length for this track. All colour arrays are the same length in the FrameData class.
        byte[] bytes = BitConverter.GetBytes(frameData.Count * frameData[0].Colours.Count * 6);
        // Write our byte length to the MIDI file.
        stream.Write(bytes, 0, bytes.Length);
        // Cycle through the frames and output the necessary MIDI.
        foreach (FrameData frame in frameData) {
            // Calculate our relative delta for this frame. Frames are originally stored in milliseconds.
            byte[] delta = BitConverter.GetBytes((double) frame.TimeStamp / 60000 / (bpm * ppq));
            for (int i = 0; i < frame.Colours.Count; i++) {
                // Output the delta length to the MIDI file.
                stream.Write(delta, 0, delta.Length);
                // Get the respective MIDI note based on the colour array index.
                byte note = (byte) NoteIdentifier.GetIntFromNote(NoteIdentifier.GetNoteFromPosition(i));
                // Check if the current colour signals a MIDI off event.
                if (!CheckEqualColor(frame.Colours[i], Color.Black) && !CheckEqualColor(frame.Colours[i], Color.Gray) && !CheckEqualColor(frame.Colours[i], Color.Purple)) {
                    // Signal a MIDI on event.
                    stream.WriteByte(144);
                    // Write the current note.
                    stream.WriteByte(note);
                    // Check the colour and write the respective velocity.
                    if (CheckEqualColor(frame.Colours[i], Color.Red)) {
                        stream.WriteByte(7);
                    } else if (CheckEqualColor(frame.Colours[i], Color.Orange)) {
                        stream.WriteByte(83);
                    } else if (CheckEqualColor(frame.Colours[i], Color.Green) || CheckEqualColor(frame.Colours[i], Color.Aqua) || CheckEqualColor(frame.Colours[i], Color.Blue)) {
                        stream.WriteByte(124);
                    } else if (CheckEqualColor(frame.Colours[i], Color.Yellow)) {
                        stream.WriteByte(127);
                    }
                } else {
                    // Calculate the delta that the frame had.
                    byte[] offDelta = BitConverter.GetBytes((double) (frameData[frame.Index - 1].TimeStamp / 60000 / (bpm * ppq)));
                    // Write the delta to the MIDI file.
                    stream.Write(offDelta, 0, offDelta.Length);
                    // Signal a MIDI off event.
                    stream.WriteByte(128);
                    // Write the current note.
                    stream.WriteByte(note);
                    // No need to set our velocity to anything.
                    stream.WriteByte(0);
                }
            }
        }
    }
}
BitConverter.GetBytes returns the bytes in the native byte order, but MIDI files use big-endian values. If you're running on x86 or ARM, you must reverse the bytes.
The third value in the file header is not called "delta time"; it is the number of ticks per quarter note, which you already have as ppq.
The length of the track is not 12; you must write the actual length.
Due to the variable-length encoding of delta times (see below), this is usually not possible before collecting all bytes of the track.
You need to write a tempo meta event that specifies the number of microseconds per quarter note.
A delta time is not an absolute time; it specifies the interval starting from the time of the previous event.
A delta time specifies a number of ticks; your calculation is wrong.
Use TimeStamp * bpm * ppq / 60000.
Delta times are not stored as a double floating-point number but as a variable-length quantity; the specification has example code for encoding it.
The last event of the track must be an end-of-track meta event.
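For reference, here is a sketch of that variable-length encoding: seven data bits per byte, with the continuation bit 0x80 set on every byte except the last, most significant group first (the same big-endian ordering applies to the header fields mentioned above):

using System.Collections.Generic;

// Encode a MIDI variable-length quantity (e.g. a delta time in ticks).
static byte[] EncodeVariableLength(uint value)
{
    var bytes = new List<byte> { (byte)(value & 0x7F) };
    value >>= 7;
    while (value > 0)
    {
        // Earlier (more significant) bytes carry the continuation bit.
        bytes.Add((byte)((value & 0x7F) | 0x80));
        value >>= 7;
    }
    bytes.Reverse(); // most significant group first
    return bytes.ToArray();
}

For example, 0x4000 encodes as 81 80 00, matching the table in the MIDI file specification.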
Another approach would be to use one of the .NET MIDI libraries to write the MIDI file. You just have to convert your frames into MIDI objects and pass them to the library to save; the library will take care of all the MIDI details.
You could try MIDI.NET or the C# Midi Toolkit. I'm not sure whether NAudio can write MIDI files, or at what abstraction level that would be...
Here is more info on the MIDI File Format specification:
http://www.blitter.com/~russtopia/MIDI/~jglatt/tech/midifile.htm
Hope it helps,
Marc
