Using the System.IO.BinaryReader class from the .NET mscorlib assembly, I ran a loop that dumped each byte value from a .wav file into an Excel spreadsheet. For simplicity's sake, I recorded a two-second 4 kHz signal from a signal generator into my software sequencer and saved it as a monaural wave file. The software I sequence music with shows a resolution of 1 ms, which is 44.1 samples (assuming a 44.1 kHz sample rate). What I find curious is that the data extracted via the ReadInt16() method (starting at position 44 in the .wav file) shows varied numbers, with integers switching signs seemingly at random, whereas the visual sine wave within the sequencer is completely uniform with respect to amplitude and frequency. With 16-bit resolution, I determined that for each sample the first byte was frequency resolution and the second amplitude. Is that correct?
Question: How can I intelligently interpret the integers pulled from the wave file for the ultimate purpose of determining rhythmic beats?
Many thanks...........Mickey
For a WAV file with 16 bits per sample, it is not the case that the first byte of the sample is frequency resolution and the second byte is amplitude. Both bytes together indicate the sample's amplitude at that specific point in time. The two bytes are interpreted as a single signed (little-endian) 16-bit integer, so the values range from -32768 to +32767.
I do not know how your sequencer works or what it is displaying. From your description, it sounds as if your sequencer is using an FFT to convert the audio from the time domain (which is what a WAV file stores) to the frequency domain (a graph with frequency along the x-axis and amplitude along the y-axis). A WAV file does not contain frequency information directly.
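If it helps, here is a minimal C# sketch of reading the samples yourself and turning them into a loudness envelope, which is a common starting point for beat detection. It assumes a canonical 16-bit mono PCM file whose data chunk starts at byte 44 (as in your file), and "test.wav" is just a placeholder path; a robust reader would parse the RIFF chunks rather than hard-coding that offset.

using System;
using System.IO;

class WavEnvelope
{
    static void Main()
    {
        // Assumption: 16-bit mono PCM with the data chunk starting at byte 44.
        using (var reader = new BinaryReader(File.OpenRead("test.wav")))
        {
            reader.BaseStream.Seek(44, SeekOrigin.Begin);

            const int windowSize = 441; // roughly 10 ms at 44.1 kHz
            var window = new short[windowSize];
            int count;

            while ((count = ReadWindow(reader, window)) > 0)
            {
                // The RMS of each window gives a loudness estimate over time;
                // peaks in this envelope are candidates for rhythmic beats.
                double sumSquares = 0;
                for (int i = 0; i < count; i++)
                    sumSquares += (double)window[i] * window[i];

                Console.WriteLine(Math.Sqrt(sumSquares / count));
            }
        }
    }

    static int ReadWindow(BinaryReader reader, short[] window)
    {
        int i = 0;
        while (i < window.Length &&
               reader.BaseStream.Position + 1 < reader.BaseStream.Length)
        {
            window[i++] = reader.ReadInt16();
        }
        return i;
    }
}

For your 4 kHz test tone the envelope should come out roughly constant, which matches the uniform sine wave your sequencer displays.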
I am trying to normalize MP3 files with NAudio, but I don't know how to do so.
The first thing I did was convert the MP3 file to PCM:
using (Mp3FileReader fr = new Mp3FileReader(mp3.getPathWithFilename())) {
    using (WaveStream pcm = WaveFormatConversionStream.CreatePcmStream(fr)) {
        WaveFileWriter.CreateWaveFile("test.wav", pcm);
    }
}
But what is the next step? Unfortunately I didn't find anything on the net.
Thanks for your help
I'm new to NAudio, so I don't exactly know how to code this, but I do know that normalization of an audio file requires two passes through the data. The first pass is to determine the maximum and minimum data values contained in the file, so you scan each data point and record the max and min (for both channels if stereo). Then, having found the highest max or lowest min (whichever has the larger absolute value), you calculate that value as a percentage of full scale (the largest possible value for the bit depth; for 16-bit audio that's 32767 or -32768). You then scale the whole file up so that the loudest sample reaches full scale.
So, for example, if on your scanning pass you discovered that the highest value in a 16-bit mono file was 29000, you would scale every sample to 112.989 percent of its original value (32767 / 29000), so that the maximum sample increases from 29000 to 32767 and all other samples increase proportionally.
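In NAudio terms, a hedged sketch of those two passes might look like the following. It uses AudioFileReader, which exposes the audio as floats in the range -1.0 to 1.0 and can open MP3 files directly; "input.mp3" and "normalized.wav" are placeholder paths.

using System;
using NAudio.Wave;

class Normalizer
{
    static void Main()
    {
        // Pass 1: find the largest absolute sample value (floats in -1.0 .. 1.0).
        float max = 0;
        using (var reader = new AudioFileReader("input.mp3"))
        {
            var buffer = new float[reader.WaveFormat.SampleRate * reader.WaveFormat.Channels];
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                    max = Math.Max(max, Math.Abs(buffer[i]));
            }
        }

        if (max == 0)
            return; // completely silent file, nothing to normalize

        // Pass 2: scale everything so the loudest sample hits full scale,
        // then write the result out as a 16-bit WAV.
        using (var reader = new AudioFileReader("input.mp3"))
        {
            reader.Volume = 1.0f / max;
            WaveFileWriter.CreateWaveFile16("normalized.wav", reader);
        }
    }
}

If you need MP3 output rather than WAV, you would then re-encode the normalized WAV afterwards (for example with an external encoder such as LAME).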
I have 2 sample WAV files.
I would like to combine them into one output wav like this:
Play the first WAV, wait x seconds, play the second WAV, and save the result as a new WAV file.
I'm not particularly attached to the wav format, so happy to use another if necessary.
From my research it looks like I would need to convert the WAVs to PCM, then create a new output buffer and write the first file to the output buffer, then somehow create a space for the x seconds, and then write the second PCM to it.
How would I go about doing this?
First of all, you'll need to understand what you are dealing with:
WAV is a type of RIFF which encodes the sound waves as PCM.
Essentially, PCM means that discrete values of the wave are stored at a certain sample rate (typically 44.1 kHz)
Each sample may contain information about one or more channels (typically 2)
The values of each sample are stored as a fixed-size integer or float (typically a 16-bit integer).
These attributes are stored in the WAV header
To combine two separate WAV files you need to read the headers of both files. If you are lucky they will have the same ByteRate (== sample rate * channel count * bits per sample / 8); in that case you simply concatenate the second file minus its header to the end of the first, and add the length of the second to the 'length' field of the first.
In any other case I advise you to use a library that handles the re-encoding for you.
If you have the time and muse, you could do the recoding yourself.
If you don't want to bother with this stuff at all, try using a complete program (e.g. sox) that does what you need.
Btw: silence is 0 values if the samples are signed, and half of the max value if they are unsigned (unsigned samples are typically only found in 8-bit files).
So to get 4 seconds of silence you need n = 4 * sample rate * channel count * (bits per sample) / 8 zero bytes.
Trivia: You could use any constant value instead of 0 for silence
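Here's a rough C# sketch of the lucky case, using NAudio's WaveFileReader/WaveFileWriter so you don't have to patch the RIFF header lengths by hand. "first.wav", "second.wav" and "combined.wav" are placeholder paths, and a 4-second gap is assumed.

using System;
using NAudio.Wave;

class CombineWavs
{
    static void Main()
    {
        using (var first = new WaveFileReader("first.wav"))
        using (var second = new WaveFileReader("second.wav"))
        {
            // Only works if both files share the same format; otherwise re-encode one first.
            if (first.WaveFormat.SampleRate != second.WaveFormat.SampleRate ||
                first.WaveFormat.Channels != second.WaveFormat.Channels ||
                first.WaveFormat.BitsPerSample != second.WaveFormat.BitsPerSample)
            {
                throw new InvalidOperationException("WAV formats differ - re-encode one file first.");
            }

            using (var writer = new WaveFileWriter("combined.wav", first.WaveFormat))
            {
                CopyAll(first, writer);

                // 4 seconds of silence: AverageBytesPerSecond = sampleRate * channels * bits/8,
                // so this many zero bytes is exactly 4 seconds of signed-PCM silence.
                var silence = new byte[first.WaveFormat.AverageBytesPerSecond * 4];
                writer.Write(silence, 0, silence.Length);

                CopyAll(second, writer);
            }
        }
    }

    static void CopyAll(WaveFileReader reader, WaveFileWriter writer)
    {
        var buffer = new byte[reader.WaveFormat.AverageBytesPerSecond];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            writer.Write(buffer, 0, read);
    }
}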
How can I identify a silence packet in a byte array (the buffer provided by WaveInEventArgs) using NAudio? Basically I am trying to loop through the array and check for 0 values. Is that correct?
I'm not sure what you mean by "packet", but finding silence is usually a matter of looking for consecutive samples having an absolute value less than a "threshold" amount. 0.00006 is -84.437 dB, so silence detection can be done on most audio with that value (though you should feel free to adjust that threshold to fit your audio). Depending on exactly what you are doing, you'll want to see a sequence of anywhere from 440 to 48000 "silent" samples before deciding it's a silent "packet".
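One thing to keep in mind is that the WaveInEventArgs buffer is raw bytes, so for 16-bit capture you need to combine each pair of bytes into a signed sample before comparing. A hedged sketch of that check, assuming 16-bit PCM capture and using the -84 dB threshold mentioned above (tune it for your audio):

using System;
using NAudio.Wave;

static class SilenceDetector
{
    // Returns true if every 16-bit sample in the recorded buffer is below the threshold.
    // Assumes the WaveIn device is capturing 16-bit PCM; a threshold of 0.00006 of
    // full scale is roughly -84 dBFS.
    public static bool IsSilent(WaveInEventArgs e, float threshold = 0.00006f)
    {
        short limit = (short)(threshold * short.MaxValue);
        for (int i = 0; i + 1 < e.BytesRecorded; i += 2)
        {
            short sample = BitConverter.ToInt16(e.Buffer, i);
            if (Math.Abs((int)sample) > limit)
                return false;
        }
        return true;
    }
}

You would call IsSilent from your DataAvailable handler, and only treat the input as silent once several consecutive buffers come back true.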
I'm using Kinect to extract audio and classify its features, but I have a question. On http://msdn.microsoft.com/en-us/library/hh855698.aspx it says the audio.Start method "Opens an audio data stream (16-bit PCM format, sampled at 16 kHz) and starts capturing audio data streamed out of a sensor." The problem is that I don't know how PCM data is represented, and I don't know whether the method returns true PCM values or not, because using the SDK examples I get values like 200, 56, 17, and I thought audio values were more like -3*10^-5.
So does anyone know how I can get the true PCM values? Or am I doing something wrong?
Thanks
I wouldn't expect any particular values. 16-bit PCM means it's a series of 16-bit integers, so -3*10^-5 (-0.00003) isn't representable.
I would guess it's encoded with 16-bit signed integers (like a WAV file) which have a range of -32768 to 32767. If you're being very quiet the values will probably be close to 0. If you make a lot of noise you will see some higher values too.
Check out this diagram (from Wikipedia's article on PCM) which shows a sine wave encoded as PCM using 4-bit unsigned integers, which have a range of 0 to 15.
See how that 4-bit sine wave oscillates around 7? That's its equilibrium. If it were a signed 4-bit integer (which has a range of -8 to 7) it would have the same shape, but its equilibrium would be 0: the values would be shifted by -8 so it would oscillate around 0.
You can measure the distance from the equilibrium to the highest or lowest points of the sine wave to get its amplitude, or, broadly, its volume (which is why, if you're quiet, you will mostly see values near 0 in your signed 16-bit data). This is probably the easiest sort of feature detection you can do. You can find plenty of good explanations on the web about this, for example http://scienceaid.co.uk/physics/waves/sound.html.
You could save it to a file and play it back with something like Audacity if you're not sure. Fiddle with the input settings and you'll soon figure out the format.
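If you want to do exactly that, a minimal sketch of dumping the raw stream to a WAV file with NAudio is below. It assumes the sensor hands you a plain Stream of 16 kHz, 16-bit, mono PCM bytes (as the documentation describes); audioStream, the duration and "kinect_dump.wav" are placeholders.

using System.IO;
using NAudio.Wave;

static class DumpToWav
{
    // Writes raw 16 kHz / 16-bit / mono PCM bytes from a stream to a WAV file
    // that can be opened in Audacity or any audio editor for inspection.
    public static void Dump(Stream audioStream, int seconds)
    {
        var format = new WaveFormat(16000, 16, 1);
        using (var writer = new WaveFileWriter("kinect_dump.wav", format))
        {
            var buffer = new byte[format.AverageBytesPerSecond];
            int totalBytes = format.AverageBytesPerSecond * seconds;
            int written = 0, read;
            while (written < totalBytes &&
                   (read = audioStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                writer.Write(buffer, 0, read);
                written += read;
            }
        }
    }
}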
I'm new to audio analysis, but need to perform a (seemingly) simple task. I have a byte array containing a 16-bit recording (single channel) at a sample rate of 44100. How do I perform a quick analysis to get the volume at any given moment? I need to calculate a threshold, so a function to return true if it's above a certain amplitude (volume) and false if not. I thought I could iterate through the byte array and check each value, with 255 being the loudest, but this doesn't seem to work: even when I don't record anything, background noise gets in and some of the array is filled with 255. Any suggestions would be great.
Thanks
As you have 16-bit data, you should expect the signal to vary between -32768 and +32767.
To calculate the volume you can take intervals of, say, 1000 samples and calculate their RMS value: sum the squared sample values, divide by 1000, and take the square root. Check this number against your threshold.
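A hedged sketch of that calculation in C#, assuming the byte array holds 16-bit little-endian mono samples (so two bytes per sample); the block length and threshold are values you would tune for your own recording:

using System;

static class VolumeMeter
{
    // Computes the RMS of one block of 16-bit little-endian mono samples and
    // compares it against a threshold expressed on the 0..32767 sample scale.
    public static bool IsAboveThreshold(byte[] data, int offset, int sampleCount, double threshold)
    {
        double sumSquares = 0;
        for (int i = 0; i < sampleCount; i++)
        {
            short sample = BitConverter.ToInt16(data, offset + i * 2);
            sumSquares += (double)sample * sample;
        }
        double rms = Math.Sqrt(sumSquares / sampleCount);
        return rms > threshold;
    }
}

Looking at individual bytes is what's misleading you: a byte of 255 is just the low or high half of some larger 16-bit sample, not "maximum loudness", which is why it shows up even in quiet recordings.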
Typically one measures the energy of waves using root mean square.
If you want to be more perceptually accurate you can take the time-domain signal through a discrete Fourier transform to a frequency-domain signal, and integrate over the magnitudes with some weighting function (since, at the same energy, low-frequency waves sound quieter to the ear than mid-frequency waves).
But I don't know audio stuff either so I'm just making stuff up. ☺
I might try applying a standard-deviation sliding-window. OTOH, I would not have assumed that 255 = loudest. It may be, but I'd want to know what encoding is being used. If any compression is present, then I doubt 255 is "loudest."