I want to use NAudio to extract the peak values of wav audio files like the audiowaveform library provides. Which is an array of short values like:
{
"version": 2,
"channels": 2,
"sample_rate": 48000,
"samples_per_pixel": 512,
"bits": 8,
"length": 3,
"data": [-65,63,-66,64,-40,41,-39,45,-55,43,-55,44]
}
https://github.com/bbc/audiowaveform/blob/master/doc/DataFormat.md
I saw that NAudio has a built-in WaveformRenderer that outputs png images, but I just need the raw peak data. Built-in NAudio classes like MaxPeakProvider (which is initialized with waveReader.ToSampleProvider()) etc works with floats, I need them as short values. Is it possible?
Edit after #Blindy's response
I have this piece of code, converting peak data to shorts, but it produces slightly wrong values when comparing with the output of audiowaveform. What might be the problem? I guess there is a data type conversion or rounding error?
var waveReader = new WaveFileReader(openFileDialog1.FileName);
var wave = new WaveChannel32(waveReader);
var peakProvider = new MaxPeakProvider();
int bytesPerSample = (waveReader.WaveFormat.BitsPerSample / 8);
var samples = waveReader.Length / bytesPerSample;
int pixelsPerPeak = 1;
int spacerPixels = 0;
var samplesPerPixel = 32;
var stepSize = pixelsPerPeak + spacerPixels;
peakProvider.Init(waveReader.ToSampleProvider(), samplesPerPixel * stepSize);
List<short> resultValuesAsShort = new List<short>();
List<float> resultValuesAsFloat= new List<float>();
PeakInfo currentPeak = null;
for (int i = 0; i < samples / samplesPerPixel; i++)
{
currentPeak = peakProvider.GetNextPeak();
resultValuesAsShort.Add((short)(currentPeak.Min * short.MaxValue));
resultValuesAsShort.Add((short)(currentPeak.Max * short.MaxValue));
//resultValuesAsFloat.Add(currentPeak.Min);
//resultValuesAsFloat.Add(currentPeak.Max);
}
Here are the results comparison:
Edit2: Examining further, I noticed that it is generating quite different results for the latter values than audiowaveform library and I don't have any idea for the reason:
MaxPeakProvider [...] works with floats, I need them as short values
NAudio in general works with floats, it's just how it was designed. To get the 2-byte short representation, multiply each float by short.MaxValue (the floats are in the [-1..1] range).
Related
I got a big byte array (around 50kb) and i need to extract numeric values from it. Every three bytes are representing one value.
What i tried is to work with LINQs skip & take but it's really slow regarding the large size of the array.
This is my very slow routine:
List<int> ints = new List<int>();
for (int i = 0; i <= fullFile.Count(); i+=3)
{
ints.Add(BitConverter.ToInt16(fullFile.Skip(i).Take(i + 3).ToArray(), 0));
}
I think i got a wrong approach to this.
Your code
First of all, ToInt16 only uses two bytes. So your third byte will be discarded.
You can't use ToInt32 as it would include one extra byte.
Let's review this:
fullFile.Skip(i).Take(i + 3).ToArray()
..and take a careful look at Take(i + 3). It says that you want to copy a larger and larger buffer. For instance, when i is on index 32000 you copy 32003 bytes into your new buffer.
That's why the code is quite slow.
The code is also slow since you allocate a lot of byte buffers which will need to be garbage collected. 65535 extra buffers of growing size which would have to be garbage collected.
You could also have done like this:
List<int> ints = new List<int>();
var workBuffer = new byte[4];
for (int i = 0; i <= fullFile.Length; i += 3)
{
// Copy the three bytes into the beginning of the temp buffer
Buffer.BlockCopy(fullFile, i, workBuffer, 0, 3);
// Now we can use ToInt32 as the last byte always is zero
var value = BitConverter.ToInt32(workBuffer, 0);
ints.Add(value);
}
Quite easy to understand, but not the fastest code.
A better solution
So the most efficient way is to do the conversion by yourself (bit shifting).
Something like:
List<int> ints = new List<int>();
for (int i = 0; i <= fullFile.Length; i += 3)
{
// This code assume little endianess
var value = (fullFile[i + 2] << 16)
+ (fullFile[i + 1] << 8)
+ fullFile[i];
ints.Add(value);
}
This code do not allocate anything extra (except the ints), and should be quite fast.
You can read more about Shift operators in MSDN. And about endianess
I'm programming an C# emulator and decided to output the PCM using CScore.
When the sample size (for each channel) is one byte, the sound outputs correctly, but when I increase the sample size to 16 bits, the sound is very very noisy.
A related question to that problem is how those 2 bytes are interpreted (are they signed? high byte first?)
This is roughly what I'm doing:
First I generate the samples as such
public void GenerateSamples(int sampleCount)
{
while(sampleCount > 0)
{
--sampleCount;
for(int c = 0; c < _numChannels; ++c)
{
_buffer[_sampleIndex++] = _outputValue;
}
// The amount of ticks in a sample
_tickCounter -= APU.MinimumTickThreshold;
if(_tickCounter < 0)
{
_tickCounter = _tickThreshold;
_up = !_up;
// Replicating signed behaviour
_outputValue = (short)(_up ? 32767 : -32768);
}
}
}
This will generate a simple square wave with the frequency determined by the _tickThreshold. If the _buffer is a byte array, the sound is correct.
I want to output it with shorts because it will enable me to use signed samples and simply add multiple channels in order to mix them.
This is how I'm outputting the sound.
for(int i = 0; i < sampleCount; ++i)
{
for(int c = 0; c < _numChannels; ++c)
{
short sample = _channel.Buffer[_channelSampleIndex++];
// Outputting the samples the other way around doesn't output
// sound for me
_buffer[_sampleIndex++] = (byte)sample;
_buffer[_sampleIndex++] = (byte)(sample >> 8);
}
}
The WaveFormat I'm using is determined like this:
_waveFormat = new WaveFormat(_apu.SampleRate, // 44000
_apu.SampleSize * 8, // 16
_apu.NumChannels); // 2
I'm pretty sure there is something obvious I'm missing, but I've been debugging this for a while and don't seem to pinpoint where the problem is.
Thanks
Walk of shame here.
The problem was that I wasn't taking in account that now I needed to generate half the amount of samples (CScore asked for amount of bytes, not samples).
In my example, I had to divide the sampleCount variable by the sampleSize to generate the correct amount of sound.
The noise came because I wasn't synchronizing the extra samples with the next Read call from CScore (I'm generating sound on the fly, instead of pre-buffering it. This way I have no delay introduced because of extra samples).
I found out about the problem looking at this: SampleToPcm16.cs
Assuming my WAV file contains 16 bit PCM, How can I read wav file as double array:
using (WaveFileReader reader = new WaveFileReader("myfile.wav"))
{
Assert.AreEqual(16, reader.WaveFormat.BitsPerSample, "Only works with 16 bit audio");
byte[] bytesBuffer = new byte[reader.Length];
int read = reader.Read(bytesBuffer, 0, buffer.Length);
// HOW TO GET AS double ARRAY
}
Just use the ToSampleProvider extension method on your WaveFileReader, and the Read method will take a float[] with the samples converted to floating point. Alternatively use AudioFileReader instead of WaveFileReader and again you can access a version of the Read method that fills a float[]
16-bit PCM is an signed-integer encoding. Presuming you want doubles between 0 and 1, you simply read each sample as an 16-bit signed integer, and then divide by (double)32768.0;
var floatSamples = new double[read/2];
for(int sampleIndex = 0; sampleIndex < read/2; sampleIndex++) {
var intSampleValue = BitConverter.ToInt16(bytesBuffer, sampleIndex*2);
floatSamples[sampleIndex] = intSampleValue/32768.0;
}
Note that stereo channels are interleaved (left sample, right sample, left sample, right sample).
There's some good info on the format here: http://blog.bjornroche.com/2013/05/the-abcs-of-pcm-uncompressed-digital.html
From a SDK I get images that have the pixel format BGR packed, i.e. BGRBGRBGR. For another application, I need to convert this format to RGB planar RRRGGGBBB.
I am using C# .NET 4.5 32bit and the data is in byte arrays which have the same size.
Right now I am iterating through the array source and assigning the BGR values to there appropriate places in the target array, but that takes too long (180ms for a 1,3megapixel image). The processor the code runs at has access to MMX, SSE, SSE2, SSE3, SSSE3.
Is there a way to speed up the conversion?
edit: Here is the conversion I am using:
// the array with the BGRBGRBGR pixel data
byte[] source;
// the array with the RRRGGGBBB pixel data
byte[] result;
// the amount of pixels in one channel, width*height
int imageSize;
for (int i = 0; i < source.Length; i += 3)
{
result[i/3] = source[i + 2]; // R
result[i/3 + imageSize] = source[i + 1]; // G
result[i/3 + imageSize * 2] = source[i]; // B
}
edit: I tried splitting the access to the source array into three loops, one for each channel, but it didn't really help. So I'm open to suggestions.
for (int i = 0; i < source.Length; i += 3)
{
result[i/3] = source[i + 2]; // R
}
for (int i = 0; i < source.Length; i += 3)
{
result[i/3 + imageSize] = source[i + 1]; // G
}
for (int i = 0; i < source.Length; i += 3)
{
result[i/3 + imageSize * 2] = source[i]; // B
}
Bump because the question is still unanswered. Any advise is very appreciated!
You could try to use SSE3's PSHUFB - Packed Shuffle Bytes instruction. Make sure you are using aligned memory read/writes. You will have to do something tricky to deal with the last dangling B value in each XMMWORD-sized block. Might be tough to get it right but should be a huge speedup. You could also look for library code. I'm guessing you will need to make a C or C++ DLL and use P/Invoke, but maybe there is a way to use SSE instructions from C# that I don't know about.
edit - this question is for a slightly different problem, ARGB to BGR, but the techniques used are similar to what you need.
Basler company has a SDK for their cameras, called Basler Pylon, working in Windows and Linux.
This SDK has APIs for C++, C# and more.
It has an image conversion class PixelDataConverter, which seems to be what you need.
I'm writing a program which uses OpenCv neural networks module along with C# and OpenCvSharp library. It must recognise the face of user, so in order to train the network, i need a set of samples. The problem is how to convert a sample image into array suitable for training. What i've got is 200x200 BitMap image, and network with 40000 input neurons, 200 hidden neurons and one output:
CvMat layerSizes = Cv.CreateMat(3, 1, MatrixType.S32C1);
layerSizes[0, 0] = 40000;
layerSizes[1, 0] = 200;
layerSizes[2, 0] = 1;
Network = new CvANN_MLP(layerSizes,MLPActivationFunc.SigmoidSym,0.6,1);
So then I'm trying to convert BitMap image into CvMat array:
private void getTrainingMat(int cell_count, CvMat trainMAt, CvMat responses)
{
CvMat res = Cv.CreateMat(cell_count, 10, MatrixType.F32C1);//10 is a number of samples
responses = Cv.CreateMat(10, 1, MatrixType.F32C1);//array of supposed outputs
int counter = 0;
foreach (Bitmap b in trainSet)
{
IplImage img = BitmapConverter.ToIplImage(b);
Mat imgMat = new Mat(img);
for (int i=0;i<imgMat.Height;i++)
{
for (int j = 0; j < imgMat.Width; j++)
{
int val =imgMat.Get<int>(i, j);
res[counter, 0] = imgMat.Get<int>(i, j);
}
responses[i, 0] = 1;
}
trainMAt = res;
}
}
And then, when trying to train it, I've got this exception:
input training data should be a floating-point matrix withthe number of rows equal to the number of training samples and the number of columns equal to the size of 0-th (input) layer
Code for training:
trainMAt = Cv.CreateMat(inp_layer_size, 10, MatrixType.F32C1);
responses = Cv.CreateMat(inp_layer_size, 1, MatrixType.F32C1);
getTrainingMat(inp_layer_size, trainMAt, responses);
Network.Train(trainMAt, responses, new CvMat(),null, Parameters);
I'm new to OpenCV and I think I did something wrong in converting because of lack of understanding CvMat structure. Where is my error and is there any other way of transforming the bitmap?
With the number of rows equal to the number of training samples
That's 10 samples.
and the number of columns equal to the size of 0-th (input) layer
That's inp_layer_size.
trainMAt = Cv.CreateMat(10, inp_layer_size, MatrixType.F32C1);
responses = Cv.CreateMat(10, 1, MatrixType.F32C1); // 10 labels for 10 samples
I primarily do C++, so forgive me if I'm misunderstanding, but your pixel loop will need adapting in addition.
Your inner loop looks broken, as you assign to val, but never use it, and also never increment your counter.
In addition, in your outer loop assigning trainMAt = res; for every image doesn't seem like a very good idea.
I am certain you will get it to operate correctly, just keep in mind the fact that the goal is to flatten each image into a single row, so you end up with 10 rows and inp_layer_size columns.