NAudio Normalize Audio - c#

I am trying to normalize MP3 files with NAudio, but I don't know how to do it.
The first thing I did was convert the MP3 file to PCM:
using (Mp3FileReader fr = new Mp3FileReader(mp3.getPathWithFilename())) {
    using (WaveStream pcm = WaveFormatConversionStream.CreatePcmStream(fr)) {
        WaveFileWriter.CreateWaveFile("test.wav", pcm);
    }
}
But what is the next step? Unfortunately, I didn't find anything on the net.
Thanks for your help

I'm new to NAudio, so I don't know exactly how to code this, but I do know that normalizing an audio file requires two passes through the data. The first pass finds the peak: scan every sample (both channels if the file is stereo) and record the maximum and minimum values. Take whichever of the two has the larger absolute value and compare it with full scale, the largest magnitude the sample format can hold (for 16-bit audio the range is -32768 to 32767). The second pass multiplies every sample by the ratio of full scale to that peak.
So for example, if your scanning pass found that the highest value in a 16-bit mono file was 29000, you would multiply every sample by 32767 / 29000 = 1.12989..., an increase of about 13 percent, so that the maximum sample goes from 29000 to 32767 and all other samples are scaled accordingly.
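The two passes can be sketched roughly as follows. This is only an illustration on an in-memory array of 16-bit samples, not NAudio code: the class and method names (PeakNormalizer, FindPeak, Normalize) are made up for this example, and it assumes the PCM data has already been read into a short[] (interleaved channels are fine, since one gain is applied to everything).

```csharp
using System;

public static class PeakNormalizer
{
    // Pass 1: find the largest absolute sample value.
    public static int FindPeak(short[] samples)
    {
        int peak = 0;
        foreach (short s in samples)
        {
            int abs = Math.Abs((int)s); // widen to int so -32768 does not overflow
            if (abs > peak) peak = abs;
        }
        return peak;
    }

    // Pass 2: scale every sample so that the peak lands on full scale (32767).
    public static short[] Normalize(short[] samples)
    {
        int peak = FindPeak(samples);
        if (peak == 0) return (short[])samples.Clone(); // pure silence: nothing to scale

        double gain = 32767.0 / peak;
        var result = new short[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            double scaled = Math.Round(samples[i] * gain);
            // Clamp as a guard against rounding past the 16-bit range.
            result[i] = (short)Math.Max(short.MinValue, Math.Min(short.MaxValue, scaled));
        }
        return result;
    }
}
```

With the numbers from the answer, a peak of 29000 gives a gain of about 1.13, so 29000 becomes 32767 and a sample of -2900 becomes -3277.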

Related

How to get peak power of sound in Unity C#

After recording a sound in Unity, is it possible to get the peak power of the sound? Or is there any way to calculate the peak power of a sound?
Peak isn't very interesting in sound. If you want something closer to perceived volume of the sound, one pretty good metric is RMS. To get this, you have to do just a bit of math:
1. Load the sample data using audio.GetOutputData
2. Sum the squares of all the sampled values
3. Take the square root of sum / amountOfSamples - that's RMS (root-mean-square)
If you want to have a value in dB, you can get it as 20 * log10(rms / reference), where reference stands for the value you want to have at 0 dB. A good reference point is 0.1, for example. Note that the RMS value will always be from 0 to 1, while dB values are a bit wilder - they better approximate human hearing, though. If you want to be really serious, different frequencies are perceived at different volumes - have a look at dBA, for example.
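As a rough sketch of that math, assuming the samples are already in a float[] with values between -1 and 1 (for example, an array filled by the GetOutputData call mentioned above). The Loudness class and its method names are invented for this example.

```csharp
using System;

public static class Loudness
{
    // Root-mean-square of samples that are assumed to lie in [-1, 1].
    public static double Rms(float[] samples)
    {
        if (samples.Length == 0) return 0.0;
        double sumOfSquares = 0.0;
        foreach (float s in samples) sumOfSquares += (double)s * s;
        return Math.Sqrt(sumOfSquares / samples.Length);
    }

    // Level in dB relative to `reference` (0.1 is the reference suggested above).
    public static double ToDecibels(double rms, double reference = 0.1)
    {
        // For rms == 0 this returns negative infinity, which is a fair answer for silence.
        return 20.0 * Math.Log10(rms / reference);
    }
}
```

An RMS equal to the reference comes out as 0 dB, and every tenfold increase in RMS adds 20 dB.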

Kinect Audio PCM Values

I'm using Kinect to extract audio and classify its features, but I have a question. On http://msdn.microsoft.com/en-us/library/hh855698.aspx it says the audio.start method "opens an audio data stream (16-bit PCM format, sampled at 16 kHz) and starts capturing audio data streamed out of a sensor". The problem is that I don't know how PCM data is represented, and I don't know whether the method returns true PCM values or not. Using the SDK examples I get values like 200, 56, 17, and I thought audio values were more like -3*10^-5.
So does anyone know how I can get the true PCM values? Or am I doing something wrong?
Thanks
I wouldn't expect any particular values. 16-bit PCM means it's a series of 16-bit integers, so -3*10^-5 (-0.00003) isn't representable.
I would guess it's encoded with 16-bit signed integers (like a WAV file) which have a range of -32768 to 32767. If you're being very quiet the values will probably be close to 0. If you make a lot of noise you will see some higher values too.
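If values like 200, 56, 17 come from reading the stream one byte at a time (an assumption; the question does not say), each pair of bytes has to be combined into one sample first. A minimal sketch, assuming little-endian byte order as in WAV files; PcmDecoder and ToSamples are hypothetical names, not part of the Kinect SDK:

```csharp
using System;

public static class PcmDecoder
{
    // Combine each pair of bytes into one signed 16-bit sample.
    // Assumes little-endian order (low byte first); a trailing odd byte is ignored.
    public static short[] ToSamples(byte[] buffer)
    {
        var samples = new short[buffer.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            int low = buffer[2 * i];
            int high = buffer[2 * i + 1];
            // unchecked: values above 32767 wrap around to negative samples.
            samples[i] = unchecked((short)(low | (high << 8)));
        }
        return samples;
    }
}
```

For instance, the byte pair (200, 0) decodes to the sample 200, while (56, 255) decodes to -200, so individual bytes can look unrelated to the waveform.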
Check out the diagram in Wikipedia's article on PCM, which shows a sine wave encoded as PCM using 4-bit unsigned integers, which have a range of 0 to 15.
See how that 4-bit sine wave oscillates around 7? That's its equilibrium. If it was a signed 4-bit integer (which has a range of -8 to 7) it would have the same shape, but the values would be shifted down by 8, so it would oscillate around roughly 0.
You can measure the distance from the equilibrium to the highest or lowest points of the sine wave to get its amplitude, or broadly, its volume (which is why if you're quiet you will mostly see values near 0 in your signed 16-bit data). This is probably the easiest sort of feature detection you can do. You can find plenty of good explanations on the web about this, for example http://scienceaid.co.uk/physics/waves/sound.html.
You could save it to a file and play it back with something like Audacity if you're not sure. Fiddle with the input settings and you'll soon figure out the format.

Sound to 3 main frequencies(low, mid, high)

I have done some research, but I couldn't find exactly what I am looking for. At the moment, I have to send channel values over a COM port.
For example:
the content of file freqs.ini
low=0-xx khz;
mid=xx-yy khz;
high=yy-zz khz;
Then I will get values as percentages, like
the expected values
lowPercent = 10;
midPercent = 77;
highPercent = 53;
So, I will be able to send these values by rs232 and my room will turn into club :) (I am using this code to illuminate LED strips). I have found some spectrum analyser projects but they all have 9 channels, that is, 3*3 combinations from low-low to high-high.
I know how to communicate with the COM port, but how can I get integer values for the 3 frequency ranges I have set before?
I don't know if you still need this, but...
Do you want to know how to get a real-time spectral analysis of sound?
1. Implement a queue that holds a buffer of audio samples.
2. Multiply the buffer by a suitable window function (typically Hamming or Hann), calculated by your program as a float array.
3. Do an FFT of the resulting array: there are many algorithms out there for every language, so find the best one for you, use it, and take the squared modulus of each output coefficient (Real_part^2 + Imaginary_part^2, if the FFT returns the coefficients in real/imaginary form).
4. Sum the coefficients across your bands: to know which coefficient belongs to which frequency, you just have to know that the k-th coefficient is at (SampFrequency/BufferLength)*k Hz, so it's easy to find the band boundaries.
5. If you need to normalize to the [0, 1] interval, you only have to divide each of the 3 resulting band values by the maximum of the three.
6. Pop Shift samples from your buffer queue, where Shift <= BufferLength, and start again.
The number of coefficients coming from the FFT algorithm is equal to BufferLength (this follows from the definition of the Discrete Fourier Transform), so the frequency resolution is better when you select a long buffer, but the program runs slower. The light intensity won't change every BufferLength audio frames, but every Shift audio frames, and a high ratio between the two gives you slow light changes. So you must select parameters that fit your needs, remembering that you only have to turn some lights on and off: make your algorithm fast and lo-fi!
The last thing to do is to find out the frequency bands of your mixer's EQ knobs. I don't remember whether this information is in mixers' handbooks.
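The windowing, transform, band-summing and normalizing steps might look roughly like this for a single buffer. To keep the sketch short and self-contained it uses a naive O(N^2) DFT instead of a real FFT, so it is only suitable for small buffers; swap in an FFT library for real-time use. BandAnalyzer, Analyze, lowMaxHz and midMaxHz are names invented for this example, and the queue handling is left out.

```csharp
using System;

public static class BandAnalyzer
{
    // Returns the low/mid/high band energies of one buffer, scaled so the largest is 1.
    // lowMaxHz and midMaxHz are the upper edges of the low and mid bands.
    public static double[] Analyze(float[] buffer, int sampleRate, double lowMaxHz, double midMaxHz)
    {
        int n = buffer.Length;
        if (n < 2) throw new ArgumentException("Buffer needs at least 2 samples.");

        // Multiply the buffer by a Hann window.
        var windowed = new double[n];
        for (int i = 0; i < n; i++)
        {
            double w = 0.5 * (1.0 - Math.Cos(2.0 * Math.PI * i / (n - 1)));
            windowed[i] = buffer[i] * w;
        }

        // Squared magnitude of each coefficient, summed per band.
        // Bin 0 (DC) is skipped, and only bins up to n/2 are used because the
        // upper half of the spectrum of a real signal mirrors the lower half.
        var bands = new double[3];
        for (int k = 1; k <= n / 2; k++)
        {
            double re = 0.0, im = 0.0;
            for (int i = 0; i < n; i++)
            {
                double angle = 2.0 * Math.PI * k * i / n;
                re += windowed[i] * Math.Cos(angle);
                im -= windowed[i] * Math.Sin(angle);
            }
            double power = re * re + im * im;
            double frequency = (double)sampleRate / n * k; // k-th coefficient in Hz
            int band = frequency < lowMaxHz ? 0 : (frequency < midMaxHz ? 1 : 2);
            bands[band] += power;
        }

        // Normalize to [0, 1] by dividing by the largest of the three bands.
        double max = Math.Max(bands[0], Math.Max(bands[1], bands[2]));
        if (max > 0.0)
            for (int b = 0; b < 3; b++) bands[b] /= max;
        return bands;
    }
}
```

Multiplying the three results by 100 and rounding would give integer percentages of the kind the question asks for. Note that with this normalization the strongest band is always 100.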

Audio Beat Detection in C#

Using the System.IO BinaryReader class from the .NET mscorlib assembly, I ran a loop that dumped each byte value from a .wav file into an Excel spreadsheet. For simplicity's sake, I recorded a two-second 4K signal from a signal generator into a software sequencer and saved it as a monaural wave file. The software I sequence music with shows a resolution of 1 ms, which is 44.1 samples (assuming a 44.1K sample rate). What I find curious is that the data extracted via the ReadInt16() method (starting at position 44 in the .wav file) shows varied numbers, with integers switching signs seemingly at random, whereas the visual sine wave within the sequencer is completely uniform with respect to amplitude and frequency. With 16-bit resolution, I determined that for each sample the first byte was frequency resolution and the second amplitude. Is that correct?
Question: How can I intelligently interpret the integers pulled from wave file for the ultimate purpose of determining rhythmic beats?
Many thanks...........Mickey
For a WAV file with 16 bits per sample, it is not the case that the first byte of the sample is frequency resolution and the second byte is amplitude. Both bytes together indicate the sample's amplitude at that specific point in time. The two bytes are interpreted as a 2-byte integer, so the values will range from -32768 to +32767.
I do not know how your sequencer works or what it is displaying. From your description, it sounds as if your sequencer is using FFT to convert the audio from time-domain (which is what a WAV file is) to frequency-domain (which is a graph with frequency along the x-axis and frequency amplitude along the y-axis). A WAV file does not contain frequency information.
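To see why time-domain samples of a perfectly uniform tone still look varied and keep changing sign, here is a small sketch that synthesizes the kind of 16-bit values a mono WAV file would hold. It assumes the "4K" signal is a 4 kHz sine sampled at 44.1 kHz; ToneDemo and its methods are hypothetical names used only for illustration.

```csharp
using System;

public static class ToneDemo
{
    // Generates 16-bit samples of a sine tone, the same kind of values a mono WAV file stores.
    public static short[] Sine(double frequencyHz, int sampleRate, int count, double amplitude = 0.8)
    {
        var samples = new short[count];
        for (int n = 0; n < count; n++)
        {
            double value = amplitude * Math.Sin(2.0 * Math.PI * frequencyHz * n / sampleRate);
            samples[n] = (short)Math.Round(value * short.MaxValue);
        }
        return samples;
    }

    // Counts how often consecutive samples switch between negative and non-negative.
    public static int CountSignChanges(short[] samples)
    {
        int changes = 0;
        for (int n = 1; n < samples.Length; n++)
            if ((samples[n - 1] < 0) != (samples[n] < 0)) changes++;
        return changes;
    }
}
```

At 4 kHz and 44.1 kHz there are only about 11 samples per cycle, so the sign flips every 5 or 6 samples and the sample points fall at different places on each cycle. One second of this tone has roughly 8000 sign changes, two per cycle, which is itself a crude way to read the frequency back out of the raw integers.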

Volume from byte array

I'm new to audio analysis, but need to perform a (seemingly) simple task. I have a byte array containing a 16 bit recording (single channel) and a sample rate of 44100. How do I perform a quick analysis to get the volume at any given moment? I need to calculate a threshold, so a function to return true if it's above a certain amplitude (volume) and false if not. I thought I could iterate through the byte array and check its value, with 255 being the loudest, but this doesn't seem to work as even when I don't record anything, background noise gets in and some of the array is filled with 255. Any suggestions would be great.
Thanks
As you have 16-bit data, you should expect the signal to vary between -32768 and +32767.
To calculate the volume you can take intervals of, say, 1000 samples and calculate their RMS value: sum the squared sample values, divide by 1000, and take the square root. Then check this number against your threshold.
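A minimal sketch of such a threshold check, assuming the byte array holds 16-bit little-endian mono samples with no header. VolumeGate, WindowRms and IsAboveThreshold are names made up for this example, and the threshold is expressed on the 16-bit scale (0 to 32768).

```csharp
using System;

public static class VolumeGate
{
    // RMS of `count` samples starting at sample index `start`.
    // Assumes 16-bit little-endian mono PCM (two bytes per sample, low byte first).
    public static double WindowRms(byte[] pcm, int start, int count)
    {
        double sumOfSquares = 0.0;
        for (int i = start; i < start + count; i++)
        {
            short sample = unchecked((short)(pcm[2 * i] | (pcm[2 * i + 1] << 8)));
            sumOfSquares += (double)sample * sample;
        }
        return Math.Sqrt(sumOfSquares / count);
    }

    // True if the window beginning at `startSample` is louder than `threshold`.
    // The window is shortened if it would run past the end of the data.
    public static bool IsAboveThreshold(byte[] pcm, int startSample, double threshold, int windowSize = 1000)
    {
        int totalSamples = pcm.Length / 2;
        int count = Math.Min(windowSize, totalSamples - startSample);
        if (count <= 0) return false;
        return WindowRms(pcm, startSample, count) > threshold;
    }
}
```

At 44100 samples per second, the window for "the moment t seconds in" would start at sample index 44100 * t. A suitable threshold depends on the recording setup, so it is best found by measuring the RMS of a stretch of background noise first.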
Typically one measures the energy of waves using root mean square.
If you want to be more perceptually accurate you can take the time-domain signal through a discrete Fourier transform to a frequency-domain signal, and integrate over the magnitudes with some weighting function (since waves of the same energy are not perceived as equally loud at all frequencies).
But I don't know audio stuff either so I'm just making stuff up. ☺
I might try applying a standard-deviation sliding-window. OTOH, I would not have assumed that 255 = loudest. It may be, but I'd want to know what encoding is being used. If any compression is present, then I doubt 255 is "loudest."
