Resample loopback capture - C#

I successfully captured sound from WASAPI using the following code:
IWaveIn waveIn = new WasapiLoopbackCapture();
waveIn.DataAvailable += OnDataReceivedFromWaveOut;
What I need to do now is resample the in-memory data to PCM with a sample rate of 8000 Hz, 16 bits per sample, mono.
I can't use ACMStream to resample it, because the recorded audio is 32 bits per sample.
I have tried this code to convert the samples from 32-bit to 16-bit, but all I get every time is just blank audio.
byte[] newArray16Bit = new byte[e.BytesRecorded / 2];
short two;
float value;
for (int i = 0, j = 0; i < e.BytesRecorded; i += 4, j += 2)
{
value = (BitConverter.ToSingle(e.Buffer, i));
two = (short)(value * short.MaxValue);
newArray16Bit[j] = (byte)(two & 0xFF);
newArray16Bit[j + 1] = (byte)((two >> 8) & 0xFF);
}
source = newArray16Bit;

I use this routine to resample on the fly from WASAPI IeeeFloat (48 kHz stereo) to the format I need in my app, which is 16 kHz, 16-bit, 1 channel. My formats are fixed, so I'm hardcoding the conversions I need, but it can be adapted as needed.
private void ResampleWasapi(object sender, WaveInEventArgs e)
{
//the result of downsampling
var resampled = new byte[e.BytesRecorded / 12];
var indexResampled = 0;
//a variable to flag the mod 3-ness of the current sample
var arity = -1;
var runningSamples = new short[3];
for(var offset = 0; offset < e.BytesRecorded; offset += 8)
{
var float1 = BitConverter.ToSingle(e.Buffer, offset);
var float2 = BitConverter.ToSingle(e.Buffer, offset + 4);
//simple average to collapse 2 channels into 1
float mono = (float)((double)float1 + (double)float2) / 2;
//convert (-1, 1) range float to short
short sixteenbit = (short)(mono * 32767);
//the input is 48000Hz and the output is 16000Hz, so we need 1/3rd of the data points
//so save up 3 running samples and then mix and write to the file
arity = (arity + 1) % 3;
//record the value
runningSamples[arity] = sixteenbit;
//if we've hit the third one
if (arity == 2)
{
//simple average of the 3 and put in the 0th position
runningSamples[0] = (short)(((int)runningSamples[0] + (int)runningSamples[1] + (int)runningSamples[2]) / 3);
//copy that short (2 bytes) into the result array at the current location
Buffer.BlockCopy(runningSamples, 0, resampled, indexResampled, 2);
//for next pass
indexResampled += 2;
}
}
//and tell listeners that we've got data
RaiseDataEvent(resampled, resampled.Length);
}
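Note that averaging groups of 3 samples is plain decimation with only a crude box filter, so high-frequency content can alias into the result; NAudio's WdlResamplingSampleProvider does filtered resampling if that matters for your application. For the 8 kHz target in the question itself, the same idea works with groups of 6 (48000 / 8000 = 6). A minimal sketch that generalizes the routine above to any integer decimation factor, assuming the same 48 kHz stereo IEEE-float input:
private static byte[] DownsampleFloatStereoTo16BitMono(byte[] buffer, int bytesRecorded, int decimation)
{
//each input frame is 8 bytes (two 32-bit floats); each output sample is 2 bytes
var output = new byte[bytesRecorded / (8 * decimation) * 2];
var outIndex = 0;
double accumulator = 0;
var count = 0;
for (var offset = 0; offset + 8 <= bytesRecorded; offset += 8)
{
var left = BitConverter.ToSingle(buffer, offset);
var right = BitConverter.ToSingle(buffer, offset + 4);
//simple average to collapse 2 channels into 1
accumulator += (left + right) / 2.0;
if (++count == decimation)
{
//average the group and convert the (-1, 1) float to a 16-bit sample
var sixteenBit = (short)(accumulator / decimation * short.MaxValue);
output[outIndex++] = (byte)(sixteenBit & 0xFF);
output[outIndex++] = (byte)((sixteenBit >> 8) & 0xFF);
accumulator = 0;
count = 0;
}
}
return output;
}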

Related

C# signed fixed point to floating point conversion

I have a temperature sensor returning 2 bytes.
The temperature is defined as follows: a sign bit followed by 15 magnitude bits weighted from 2^6 down to 2^-8 (sign-magnitude fixed point; see the worked example in the code below).
What is the best way in C# to convert these 2 bytes to a float?
My solution is the following, but I don't like the powers of 2 and the for loop:
static void Main(string[] args)
{
byte[] sensorData = new byte[] { 0b11000010, 0b10000001 }; //(-1) * (2^(6) + 2^(1) + 2^(-1) + 2^(-8)) = -66.50390625
Console.WriteLine(ByteArrayToTemp(sensorData));
}
static double ByteArrayToTemp(byte[] data)
{
// Convert byte array to short to be able to shift it
if (BitConverter.IsLittleEndian)
Array.Reverse(data);
Int16 dataInt16 = BitConverter.ToInt16(data, 0);
double temp = 0;
for (int i = 0; i < 15; i++)
{
//We take the LSB of the data and multiply it by the corresponding second power (from -8 to 6)
//Then we shift the data for the next loop
temp += (dataInt16 & 0x01) * Math.Pow(2, -8 + i);
dataInt16 >>= 1;
}
if ((dataInt16 & 0x01) == 1) temp *= -1; //Sign bit
return temp;
}
This might be slightly more efficient, but I can't see it making much difference:
static double ByteArrayToTemp(byte[] data)
{
if (BitConverter.IsLittleEndian)
Array.Reverse(data);
ushort bits = BitConverter.ToUInt16(data, 0);
double scale = 1 << 6;
double result = 0;
for (int i = 0, bit = 1 << 14; i < 15; ++i, bit >>= 1, scale /= 2)
{
if ((bits & bit) != 0)
result += scale;
}
if ((bits & 0x8000) != 0)
result = -result;
return result;
}
Actually, you can avoid the loop here: the value is plain sign-magnitude fixed point, so the magnitude is just the low 15 bits scaled by 2^-8.
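A loop-free sketch, assuming the same big-endian byte order as in the question:
static double ByteArrayToTemp(byte[] data)
{
//compose the big-endian bytes directly to avoid reversing the array
ushort bits = (ushort)((data[0] << 8) | data[1]);
//the magnitude is the low 15 bits with the binary point 8 places from the right
double magnitude = (bits & 0x7FFF) / 256.0;
return (bits & 0x8000) != 0 ? -magnitude : magnitude;
}
For the example bytes { 0b11000010, 0b10000001 } this returns -66.50390625, matching the expected value.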

How to append/insert bytes? [duplicate]

I want to add a string in the middle of an image metadata block, under a specific marker. I have to do it at the byte level, since .NET has no support for custom metadata fields.
The block is built like 1C 02 XX YY YY ZZ ZZ ZZ ... where XX is the ID of the field I need to append to, YY YY is its size and ZZ is the data.
I imagine it should be more or less possible to read all the image data up to this marker (1C 02 XX), then increase the size bytes (YY YY), add data at the end of ZZ and then add the rest of the original file? Is this correct?
How should I go about it? It needs to work as fast as possible with 4-5 MB JPEG files.
In general there is no way to speed up this operation. You have to read at least the portion that needs to be moved and write it again in the updated file. Creating a new file and copying the content into it may be faster if you can parallelize the read and write operations.
Note: in your particular case it may not be possible to just insert content in the middle of the file, as most file formats are not designed with such modifications in mind. Often there are offsets to portions of the file that become invalid when you shift part of the file. Specifying what file format you're trying to work with may help other people to provide better approaches.
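For illustration, here is a minimal sketch of that copy-into-a-new-file approach (the method name and parameters are illustrative, not from the question):
static void InsertBytes(string sourcePath, string destPath, long offset, byte[] insert)
{
using (var src = File.OpenRead(sourcePath))
using (var dst = File.Create(destPath))
{
var buffer = new byte[4096];
long remaining = offset;
int read;
//copy everything before the insertion point
while (remaining > 0 && (read = src.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining))) > 0)
{
dst.Write(buffer, 0, read);
remaining -= read;
}
//write the new bytes, then the rest of the original file
dst.Write(insert, 0, insert.Length);
while ((read = src.Read(buffer, 0, buffer.Length)) > 0)
dst.Write(buffer, 0, read);
}
}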
Solved the problem with this code:
List<byte> dataNew = new List<byte>();
byte[] data = File.ReadAllBytes(jpegFilePath);
int j = 0;
for (int i = 1; i < data.Length; i++)
{
if (data[i - 1] == (byte)0x1C) // 1C IPTC
{
if (data[i] == (byte)0x02) // 02 IPTC
{
if (data[i + 1] == (byte)fileByte) // IPTC field_number, i.e. 0x78 = IPTC_120
{
j = i;
break;
}
}
}
}
for (int i = 0; i < j + 2; i++) // add data from file before this field
dataNew.Add(data[i]);
int countOld = (data[j + 2] & 255) << 8 | (data[j + 3] & 255); // curr field length
int countNew = valueToAdd.Length; // new string length
int newfullSize = countOld + countNew; // sum
byte[] newSize = BitConverter.GetBytes((Int16)newfullSize); // Int16 on 2 bytes (to use 2 bytes as size)
Array.Reverse(newSize); // changes order 10 00 to 00 10
for (int i = 0; i < newSize.Length; i++) // add changed size
dataNew.Add(newSize[i]);
for (int i = j + 4; i < j + 4 + countOld; i++) // add old field value
dataNew.Add(data[i]);
byte[] newString = ASCIIEncoding.ASCII.GetBytes(valueToAdd);
for (int i = 0; i < newString.Length; i++) // append with new field value
dataNew.Add(newString[i]);
for (int i = j + 4 + countOld; i < data.Length; i++) // add rest of the file (it starts right after the old field value)
dataNew.Add(data[i]);
byte[] finalArray = dataNew.ToArray();
File.WriteAllBytes(Path.Combine(Path.GetDirectoryName(jpegFilePath), "newfile.jpg"), finalArray);
Here is an easy and quite fast solution. It moves all bytes after the given offset to their new position according to the given extraBytes, so you can insert your data.
public void ExpandFile(FileStream stream, long offset, int extraBytes)
{
// http://stackoverflow.com/questions/3033771/file-io-with-streams-best-memory-buffer-size
const int SIZE = 4096;
var buffer = new byte[SIZE];
var length = stream.Length;
// Expand file
stream.SetLength(length + extraBytes);
var pos = length;
int to_read;
while (pos > offset)
{
to_read = pos - SIZE >= offset ? SIZE : (int)(pos - offset);
pos -= to_read;
stream.Position = pos;
stream.Read(buffer, 0, to_read);
stream.Position = pos + extraBytes;
stream.Write(buffer, 0, to_read);
}
}
Need to be checked, though...
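For completeness, a hypothetical usage of ExpandFile, reusing jpegFilePath and valueToAdd from the accepted answer above (offset is wherever the new bytes should go):
using (var stream = new FileStream(jpegFilePath, FileMode.Open, FileAccess.ReadWrite))
{
byte[] insert = ASCIIEncoding.ASCII.GetBytes(valueToAdd);
//make room at offset, then overwrite the gap with the new bytes
ExpandFile(stream, offset, insert.Length);
stream.Position = offset;
stream.Write(insert, 0, insert.Length);
}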

NAudio frequency analyser giving inconsistent results

I'm developing a simple program that analyses the frequencies of audio files.
Using an FFT length of 8192 and a sample rate of 44100, if I use as input a constant-frequency wav file - say 65 Hz, 200 Hz or 300 Hz - the output is a constant graph at that value.
If I use a recording of someone speaking, the detected frequency has peaks as high as 4000 Hz, with an average around 450 Hz on a 90-second file.
At first I thought it was because the recording was in stereo, but converting it to mono with the exact same bitrate as the test files doesn't change much. (The average goes down from 492 to 456, but that's still way too high.)
Has anyone got an idea as to what could cause this?
I think I shouldn't take the highest value, but perhaps an average or a median value instead?
EDIT: using the average of the magnitudes per 8192-byte buffer and getting the index closest to that magnitude messes everything up.
This is the code for the handler of the event the SampleAggregator fires when it has calculated the FFT for the current buffer:
void FftCalculated(object sender, FftEventArgs e)
{
int length = e.Result.Length;
float[] magnitudes = new float[length / 2]; //only the first half of the FFT output is meaningful for real input
for (int i = 0; i < length / 2; i++)
{
float real = e.Result[i].X;
float imaginary = e.Result[i].Y;
magnitudes[i] = (float)(10 * Math.Log10(Math.Sqrt((real * real) + (imaginary * imaginary)))); //10*log10 of an amplitude; conventional dB would use 20*log10
}
float max_mag = float.MinValue;
int max_index = -1;
for (int i = 0; i < length / 2; i++)
if (magnitudes[i] > max_mag)
{
max_mag = magnitudes[i];
max_index = i;
}
var currentFrequency = max_index * (double)SAMPLERATE / 8192; //bin index -> Hz
Console.WriteLine("frequency be " + currentFrequency);
}
ADDITION: this is the code that reads the file and sends it to the analysing part
using (var rdr = new WaveFileReader(audioFilePath))
{
var newFormat = new WaveFormat(Convert.ToInt32(SAMPLERATE/*44100*/), 16, 1);
byte[] buffer = new byte[8192];
var audioData = new AudioData(); //custom class for project
using (var conversionStream = new WaveFormatConversionStream(newFormat, rdr))
{
// Used to send audio in realtime, it's a timestamps issue for the graphs
// I'm working on fixing this, but it has lower priority so disregard it :p
TimeSpan audioDuration = conversionStream.TotalTime;
long audioLength = conversionStream.Length;
int waitTime = (int)(audioDuration.TotalMilliseconds / audioLength * 8192);
while (conversionStream.Read(buffer, 0, buffer.Length) != 0)
{
audioData.AudioDataBase64 = Utils.Base64Encode(buffer);
Thread.Sleep(waitTime);
SendMessage("AudioData", Utils.StringToAscii(AudioData.GetJSON(audioData)));
}
Console.WriteLine("Reached End of File");
}
}
This is the code that receives the audio data
{
var audioData = AudioData.GetStateFromJSON(Utils.AsciiToString(receivedMessage));
QueueAudio(Utils.Base64Decode(audioData.AudioDataBase64));
}
followed by
var waveFormat = new WaveFormat(Convert.ToInt32(SAMPLERATE/*44100*/), 16, 1);
_bufferedWaveProvider = new BufferedWaveProvider(waveFormat);
_bufferedWaveProvider.BufferDuration = new TimeSpan(0, 2, 0);
void QueueAudio(byte[] data)
{
_bufferedWaveProvider.AddSamples(data, 0, data.Length);
if (_bufferedWaveProvider.BufferedBytes >= fftLength)
{
byte[] buffer = new byte[_bufferedWaveProvider.BufferedBytes];
_bufferedWaveProvider.Read(buffer, 0, _bufferedWaveProvider.BufferedBytes);
for (int index = 0; index < buffer.Length; index += 2)
{
short sample = (short)((buffer[index] | buffer[index + 1] << 8));
float sample32 = (sample) / 32767f;
sampleAggregator.Add(sample32);
}
}
}
And then the SampleAggregator fires the event above when it's done with the fft.

FFT which frequencies are in which bins?

I would like to see how certain frequencies, specifically low bass at 20 - 60 Hz, are present in a piece of audio. I have the audio as a byte array; I convert it to an array of shorts, then into complex numbers via (short[i]/(double)short.MaxValue, 0). Then I pass this to the FFT from AForge.
The audio is mono with a sample rate of 44100. I understand I can only put chunks through the FFT at powers of 2, so 4096 for example. I don't understand what frequencies will be in the output bins.
If I am taking 4096 samples from audio at a 44100 sample rate, does this mean I am taking about 93 milliseconds' worth of audio? Or only getting some of the frequencies that will be present?
I add the output of the FFT to an array; my understanding is that as I am taking 4096 samples, bin 0 would contain 0*44100/4096 = 0 Hz, bin 1 would hold 1*44100/4096 = 10.7666015625 Hz, and so on. Is this correct? Or am I doing something fundamentally wrong here?
My goal is to average the magnitudes between say 20 - 60 Hz, so that for a song with very low, heavy bass this number would be higher than for a soft piano piece with very little bass.
Here is my code.
OpenFileDialog file = new OpenFileDialog();
file.ShowDialog();
WaveFileReader reader = new WaveFileReader(file.FileName);
byte[] data = new byte[reader.Length];
reader.Read(data, 0, data.Length);
samepleRate = reader.WaveFormat.SampleRate;
bitDepth = reader.WaveFormat.BitsPerSample;
channels = reader.WaveFormat.Channels;
Console.WriteLine("audio has " + channels + " channels, a sample rate of " + samepleRate + " and bitdepth of " + bitDepth + ".");
short[] shorts = data.Select(b => (short)b).ToArray();
int size = 4096;
int window = 44100 * 10;
int y = 0;
Complex[] complexData = new Complex[size];
for (int i = window; i < window + size; i++)
{
Complex tmp = new Complex(shorts[i]/(double)short.MaxValue, 0);
complexData[y] = tmp;
y++;
}
FourierTransform.FFT(complexData, FourierTransform.Direction.Forward);
double[] arr = new double[complexData.Length];
//print out sample of conversion
for (int i = 0; i < complexData.Length; i++)
{
arr[i] = complexData[i].Magnitude;
}
Console.Write("complete, ");
return arr;
Edit: changed to FFT from DFT.
Here's a modified version of your code. Note the comments starting with "***".
OpenFileDialog file = new OpenFileDialog();
file.ShowDialog();
WaveFileReader reader = new WaveFileReader(file.FileName);
byte[] data = new byte[reader.Length];
reader.Read(data, 0, data.Length);
samepleRate = reader.WaveFormat.SampleRate;
bitDepth = reader.WaveFormat.BitsPerSample;
channels = reader.WaveFormat.Channels;
Console.WriteLine("audio has " + channels + " channels, a sample rate of " + samepleRate + " and bitdepth of " + bitDepth + ".");
// *** NAudio "thinks" in floats
float[] floats = new float[data.Length / sizeof(float)];
Buffer.BlockCopy(data, 0, floats, 0, data.Length);
int size = 4096;
// *** You don't have to fill the FFT buffer to get valid results. More noisy & smaller "magnitudes", but better freq. res.
int inputSamples = samepleRate / 100; // 10ms... adjust as needed
int offset = samepleRate * 10 * channels;
int y = 0;
Complex[] complexData = new Complex[size];
// *** get a "scaling" curve to make both ends of sample region 0 but still allow full amplitude in the middle of the region.
float[] window = CalcWindowFunction(inputSamples);
for (int i = 0; i < inputSamples; i++)
{
// *** "floats" is stored as LRLRLR interleaved data for stereo audio
complexData[y] = new Complex(floats[i * channels + offset] * window[i], 0);
y++;
}
// make sure the back portion of the buffer is set to all 0's
while (y < size)
{
complexData[y] = new Complex(0, 0);
y++;
}
// *** Consider using a DCT here instead... It returns less "noisy" results
FourierTransform.FFT(complexData, FourierTransform.Direction.Forward);
double[] arr = new double[complexData.Length];
//print out sample of conversion
for (int i = 0; i < complexData.Length; i++)
{
// *** I assume we don't care about phase???
arr[i] = complexData[i].Magnitude;
}
Console.Write("complete, ");
return arr;
Once you get the results, and assuming a 44100 Hz sample rate and size = 4096, each bin is 44100/4096 ≈ 10.77 Hz wide, so elements 2 through 5 cover the 20 - 60 Hz range you are interested in. To convert a magnitude to dB, use 20 * log10(magnitude).
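A short sketch of the 20 - 60 Hz averaging, assuming arr holds the magnitudes returned above:
double binWidth = 44100.0 / 4096; // ≈ 10.77 Hz per bin
int lowBin = (int)Math.Ceiling(20 / binWidth); // first bin at or above 20 Hz -> 2
int highBin = (int)Math.Floor(60 / binWidth); // last bin at or below 60 Hz -> 5
double bassAverage = 0;
for (int bin = lowBin; bin <= highBin; bin++)
bassAverage += arr[bin]; // use 20 * Math.Log10(arr[bin]) here for a dB average instead
bassAverage /= highBin - lowBin + 1;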
Good luck!

NAudio - Convert 32-bit wav to 16-bit wav

I've been trying to convert a 32-bit stereo wav to a 16-bit mono wav. I use NAudio to capture the sound, and I thought that using just the two most significant bytes of each 4-byte sample would work.
Here is the DataAvailable implementation:
void _waveIn_DataAvailable(object sender, WaveInEventArgs e)
{
byte[] newArray = new byte[e.BytesRecorded / 2];
short two;
for (int i = 0, j = 0; i < e.BytesRecorded; i = i + 4, j = j + 2)
{
two = (short)BitConverter.ToInt16(e.Buffer, i + 2);
newArray[j] = (byte)(two & 0xFF);
newArray[j + 1] = (byte)((two >> 8) & 0xFF);
}
//do something with the new array:
}
Any help would be greatly appreciated!
I finally found the solution. I just had to read each 4-byte sample as a float, multiply it by 32767 and cast it to short:
void _waveIn_DataAvailable(object sender, WaveInEventArgs e)
{
byte[] newArray16Bit = new byte[e.BytesRecorded / 2];
short two;
float value;
for (int i = 0, j = 0; i < e.BytesRecorded; i += 4, j += 2)
{
value = (BitConverter.ToSingle(e.Buffer, i));
two = (short)(value * short.MaxValue);
newArray16Bit[j] = (byte)(two & 0xFF);
newArray16Bit[j + 1] = (byte)((two >> 8) & 0xFF);
}
}
An unsigned 32-bit sample can be as high as 4,294,967,295 and an unsigned 16-bit sample can be as high as 65,535, so you'll have to scale the 32-bit sample down to fit into the 16-bit range. (Note that WASAPI capture actually delivers 32-bit IEEE float samples in the -1.0 to 1.0 range, which is why the answer above multiplies by short.MaxValue instead.) For integer samples, you're doing something like this...
SixteenBitSample = ( ThirtyTwoBitSample / 4294967295 ) * 65535;
EDIT:
For the stereo to mono portion, if the two channels have the same data, just dump one of them; otherwise, add the waveforms together, and if the sum falls outside the 16-bit sample range (-32,768 to 32,767), scale it down similarly to the above equation.
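For example, a simple clamped average for mixing a pair of 16-bit samples down to mono (a sketch, not from the original answer):
static short MixToMono(short left, short right)
{
//do the arithmetic in int so the sum cannot overflow
int mono = (left + right) / 2;
//the average of two shorts already fits in range, but clamp for safety if you use a straight sum instead
return (short)Math.Max(short.MinValue, Math.Min(short.MaxValue, mono));
}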
Hope that helps.
