NAudio frequency band intensity - C#

I have an audio player using NAudio and I would like to display a real-time intensity for each frequency band.
I have an event triggered for each block of 1024 samples:
public void Update(Complex[] fftResults)
{
    // ??
}
What I would like to have is an array of numbers indicating the intensity of each frequency band. Let's say I would like to divide the window into 16 bands.
For example, when there are more bass frequencies, it could look like this:
░░░░░░░░░░░░░░░░
▓▓▓░░░░░░░░░░░░░
▓▓▓░░░░░░░░░░░░░
▓▓▓▓░░░░░░░░░░░░
▓▓▓▓▓░░░░░░░░░░░
▓▓▓▓▓▓▓▓░░░▓░░▓░
What should I put into that event handler if this is possible with that data?
The incoming data (Complex[]) has already been transformed with the FFT.
It is a stereo stream.
First try:
double[] bandIntensity = new double[16];

public void Update(Complex[] fftResults)
{
    // using half of fftResults because the other half is just mirrored
    int band = 0;
    for (int n = 0; n < fftResults.Length / 2; n++)
    {
        band = (int)(.5 * n / fftResults.Length * bandIntensity.Length);
        bandIntensity[band] += Math.Sqrt(fftResults[n].X * fftResults[n].X + fftResults[n].Y * fftResults[n].Y);
        bandIntensity[band] /= 2;
    }
}
The above is doing something, but I think too much goes into the first two bands, and I'm playing Shakira, which does not have that much bass.
Thanks!

There are two separate issues that you probably want to address here:
(1) Window Function
You need to apply a window function to your data prior to the FFT, otherwise you will get spectral leakage, which results in a very smeared spectrum. One unpleasant side effect of spectral leakage is that any significant DC (0 Hz) component will produce the kind of 1/f shape that you are seeing on your bar graph.
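For illustration, here is a minimal C# sketch of a Hann window (one common choice), applied in place to a block of time-domain samples before the FFT. The helper name is made up for this example:

// Minimal sketch (hypothetical helper): apply a Hann window in place to a
// block of time-domain samples before handing it to the FFT.
static void ApplyHannWindow(float[] samples)
{
    int n = samples.Length;
    for (int i = 0; i < n; i++)
    {
        // w[i] = 0.5 * (1 - cos(2*pi*i / (N - 1)))
        samples[i] *= 0.5f * (1.0f - (float)Math.Cos(2.0 * Math.PI * i / (n - 1)));
    }
}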
(2) Log amplitude/frequency axes
Human hearing is essentially logarithmic in both the intensity and frequency axes. Not only that, but speech and music tend to have more energy in the lower frequency part of the spectrum. To get a more pleasing and meaningful display of intensity versus frequency we usually make both the magnitude and frequency axes logarithmic. In the case of the magnitude axis this is normally taken care of by plotting dB re full scale, i.e.
magnitude_dB = 20 * log10(magnitude); // or 10 * log10(power), where power = magnitude^2
In the case of the frequency axis you will probably want to group your bins into bands, each of which might be an octave (2:1 frequency range) or, more commonly for higher resolution, one-third of an octave. So if you just want 10 "bars" then you might use the following octave bands:
25 - 50 Hz
50 - 100 Hz
100 - 200 Hz
200 - 400 Hz
400 - 800 Hz
800 - 1600 Hz
1600 - 3200 Hz
3200 - 6400 Hz
6400 - 12800 Hz
12800 - 20000 Hz
(assuming you have a 44.1 kHz sample rate and an upper limit on your audio input hardware of 20 kHz).
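If you do go the octave-band route, here is a rough sketch of how an FFT bin index could be mapped to one of the ten bands above. The helper name is made up, and the hard-coded 25 Hz lower edge comes straight from the table:

// Maps an FFT bin to an octave band index (0..9), assuming the band table
// above; bins below 25 Hz return -1, and everything above 12.8 kHz is
// folded into the top band.
static int BinToOctaveBand(int bin, int fftLength, double sampleRate)
{
    double freq = bin * sampleRate / fftLength;  // centre frequency of this bin
    if (freq < 25.0) return -1;
    int band = (int)Math.Floor(Math.Log(freq / 25.0, 2.0)); // each band doubles
    return Math.Min(band, 9);
}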
Note that while having a magnitude (dB) intensity scale is pretty much mandatory for this kind of application, the log frequency axis is less critical, so you could try with your existing linear binning for now, and just see what effect you get from applying a window function in the time domain (assuming you don't already have one) and converting the magnitude scale to dB.
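Putting the pieces together, a sketch of the Update handler with your existing linear binning plus a dB magnitude scale might look like this. It assumes NAudio's Complex with float X/Y fields, as in your snippet, and maps only the first (non-mirrored) half of the FFT output so that all 16 bands get an equal share of bins:

double[] bandIntensity = new double[16];

public void Update(Complex[] fftResults)
{
    Array.Clear(bandIntensity, 0, bandIntensity.Length);
    int bins = fftResults.Length / 2;                // second half is mirrored
    for (int n = 0; n < bins; n++)
    {
        int band = n * bandIntensity.Length / bins;  // linear bin-to-band map
        bandIntensity[band] += Math.Sqrt(fftResults[n].X * fftResults[n].X
                                       + fftResults[n].Y * fftResults[n].Y);
    }
    for (int b = 0; b < bandIntensity.Length; b++)
    {
        // dB re full scale; the small floor avoids log10(0)
        bandIntensity[b] = 20.0 * Math.Log10(Math.Max(bandIntensity[b], 1e-12));
    }
}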

Related

Is there a way to compress Y rotation axis to one byte?

I am making a Unity Multiplayer game, and I wanted to compress the Y rotation axis from sending the whole Quaternion to just sending one byte.
My first compression attempt:
Instead of sending Quaternion, I have just sent a Y-axis float value
Result: 16 bytes -> 4 bytes (12 bytes saved overall)
Second compression attempt:
I have cached a lastSentAxis variable (float) which contains the last Y-axis value sent to the server
When a player changes their rotation (looks right/left), the new Y-axis value is compared to the cached one, and a delta value is prepared (the delta is guaranteed to be less than 255).
Then, I create a new sbyte which contains the rotation direction (-1 if turned left, 1 if turned right)
Result: 4 bytes -> 2 bytes (2 bytes saved, 14 overall)
Third compression attempt (failed)
Define a byte flag instead of the separate byte mentioned before (1 - left, 2 - right)
Get a delta rotation value (as mentioned previously), but add it to the byte flag
PROBLEM: I have looped through 0 to 255 to find which numbers will collide with the byte flag.
POTENTIAL SOLUTION: Check if flag + delta is in the colliding number list. If yes, don't send a rotation request.
Every X requests, send a correction float value
Potential result: 2 bytes -> 1 byte (1 byte saved, 15 overall)
My question is: is it possible to do the third compression attempt in a more... proper way, or is my potential solution the only thing I can achieve?
I would not claim that you saved 15 bytes overall ^^
If you only need one component of the rotation anyway, then the first step of syncing a single float (4 bytes) actually seems pretty obvious ;)
I would also say that going beyond that sounds a bit like an unnecessary micro optimization.
The delta sync is quite clever and at first glance is a 100% improvement from 4 bytes to 2 bytes.
But
it is also quite error prone and could desync if just one transmission fails.
it of course lowers the precision to 1-degree integer steps instead of a full float value.
Honestly I would stick to the 4 bytes just for stability and precision.
2 bytes - about 0.0055° precision
With 2 bytes you can actually go way better than your attempt!
Why waste an entire byte just for the sign of the value?
use a short
uses a single bit for the sign
still has 15 bits left for the value!
You just would have to map your floating point range of -180 to 180 to the range -32768 to 32767.
Sending
// your delta between -180 and 180
float actualAngleDelta;
var shortAngleDelta = (short)Mathf.RoundToInt(actualAngleDelta / 180f * short.MaxValue);
var sendBytes = BitConverter.GetBytes(shortAngleDelta);
Receiving
short shortAngleDelta = BitConverter.ToInt16(receivedBytes, 0);
float actualAngleDelta = (float)shortAngleDelta / short.MaxValue * 180f;
But honestly then you should rather not sync the delta but the actual value.
So, use a ushort!
It covers values from 0 to 65535, so just map the possible 360 degrees onto that. Sure, you lose a little precision, but not down to full degrees ;)
// A value between 0 and 360
float actualAngle;
ushort ushortAngle = (ushort) Mathf.RoundToInt((actualAngle % 360f) / 360f * ushort.MaxValue);
byte[] sendBytes = BitConverter.GetBytes(ushortAngle);
Receiving
ushort ushortAngle = BitConverter.ToUInt16(receivedBytes, 0);
float actualAngle = (float)ushortAngle / (float)ushort.MaxValue * 360f;
Both maintain a precision of about 0.0055° (= 360/65535)!
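As a quick sanity check of that claim, a round trip through the ushort mapping with an arbitrary example angle looks like this:

// Hypothetical example value; Unity's Mathf is assumed as in the snippets above
float original = 123.456f;
ushort encoded = (ushort)Mathf.RoundToInt((original % 360f) / 360f * ushort.MaxValue);
float decoded = (float)encoded / ushort.MaxValue * 360f;
// decoded differs from original by less than the 0.0055 degree step size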
Single byte - about 1.41° precision
If lower precision is an option for you anyway, you could go totally fancy and say you don't sync the exact rotation angle in degrees, but rather divide the circle not into 360 but into 256 steps.
Then you could map the angle to your coarser-grained "degree" steps and cover the entire circle in a single byte:
Sending
byte sendByte = (byte)Mathf.RoundToInt((actualAngle % 360f) / 360f * (float)byte.MaxValue);
Receiving
float actualAngle = receivedByte / (float)byte.MaxValue * 360f;
which would have a precision of about 1.4 degrees.
BUT honestly, are all these back-and-forth calculations really worth the 2 or 3 saved bytes?

Solving for Amplitude and Frequency in WAV files

I've also asked this on the Sound Design forum, but the question is heavy on computer science/math, so it might actually belong here:
So I'm able to successfully find all the information about a WAV file by reading the binaries in the file, except the amplitude and the frequency (hertz) of the big sine function (which are unfortunately exactly what I'm looking for). Just to verify what I'm talking about, the file generates one wave only, with the equation:
F(s) = A * sin(T * s)
Where s is the current sample, A is the amplitude and T is the period. Now the equation for T (the period) is:
T = (2π * Hz) /(α * ω)
Where Hz is the frequency in hertz, α is samples per second, and ω is the number of channels.
Now I know that to solve for amplitude, I could simply find the value of F(s) where
s = (π/2)/T
Because then the value of the sine function would be 1, and the final value would be equivalent to A. The problem is that to divide by T, I have to know the Hertz (or Hz).
Is there any way that I can read a WAV file to discover the hertz from the data, assuming the file only contains a single wave?
Just to get some terms clarified: the property you're looking for is frequency, and the unit of frequency is the hertz (cycles per second). By convention, the typical A note has a frequency of 440 Hz.
You got the function wrong, actually. That sine wave in reality has the form F(s) = A * sin(2π * s / T + c) - you don't know when it started, so you get a phase constant c in there. Also, you need to divide by T, not multiply.
Getting the amplitude is actually fairly easy. That sine wave has a series of peaks and valleys. Find each peak (higher than both neighbors) and each valley (lower than both), calculate the average peak and average valley, and the amplitude is HALF the difference between the two (the average peak sits at +A and the average valley at -A). Pretty easy. The period T can be estimated by measuring the average distance from peak to peak, and from valley to valley.
There's one bit where you need to be careful. If there is a slight bit of noise, you may get a slight dent near a peak. Instead of 14 17 18 17 14 you may get 14 17 16 17 14. That 16 isn't a valley. Once you've got a good estimate for the real peaks and valleys, throw out all the distorted peaks.
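As a rough C# sketch of that peak/valley approach (the method name is made up, and it assumes a reasonably clean signal with at least two peaks, i.e. the distorted peaks have already been thrown out):

// Estimates amplitude and period (in samples) from peaks and valleys.
static (double Amplitude, double PeriodSamples) EstimateSine(double[] samples)
{
    var peaks = new List<int>();
    var valleys = new List<int>();
    for (int i = 1; i < samples.Length - 1; i++)
    {
        if (samples[i] > samples[i - 1] && samples[i] > samples[i + 1]) peaks.Add(i);
        else if (samples[i] < samples[i - 1] && samples[i] < samples[i + 1]) valleys.Add(i);
    }
    // average peak ~ +A and average valley ~ -A, so amplitude is half the gap
    double amplitude = (peaks.Average(i => samples[i])
                      - valleys.Average(i => samples[i])) / 2.0;
    // period ~ average peak-to-peak spacing in samples
    double period = (double)(peaks[peaks.Count - 1] - peaks[0]) / (peaks.Count - 1);
    return (amplitude, period);
}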
The question isn't "what frequency?". If your function is anything other than a simple trig function, it'll be a combination of frequencies, each with its own amplitude.
The correct approach is digital signal processing using finite Fourier transform. You've got a lot of digging to do.
If you only want to assume a single trig function, you have just 2 degrees of freedom (amplitude and frequency), or 3 (amplitude, frequency, and phase angle), and N time points in the file. That means least-squares fitting, assuming a sine or cosine function.

Sound to 3 main frequencies(low, mid, high)

I have done some research, but I couldn't find exactly what I am looking for. At the moment, I have to send channel values over a COM port.
For example:
the content of file freqs.ini
low=0-xx khz;
mid=xx-yy khz;
high=yy-zz khz;
Then I will get values as percentages, like
the expected values
lowPercent = 10;
midPercent = 77;
highPercent = 53;
So I will be able to send these values over RS-232 and my room will turn into a club :) (I am using this code to illuminate LED strips). I have found some spectrum analyser projects, but they all have 9 channels, that is, 3*3 combinations from low-low to high-high.
I know how to communicate with the COM port, but how can I get integer values for the 3 frequency ranges I set before?
I don't know if you still need this, but...
Do you want to know how to get a real-time spectral analysis of sound?
1. Implement a queue to take a buffer of audio samples.
2. Take the product of the buffer and a proper window function (typically Hamming or Hann), calculated by your program as a float array.
3. Do an FFT of the resulting array: there are many algorithms out there for every language. Find the best one for you, use it, and take the squared magnitude of each output coefficient (Real_part^2 + Imaginary_part^2, if the FFT returns the algebraic (rectangular) representation of the coefficients).
4. Sum coefficients across your bands: to know which coefficient corresponds to which frequency, you just have to know that the k-th coefficient sits at (SampFrequency/BufferLength)*k Hz, so it's easy to find the band boundaries.
5. If you need to normalize to the [0, 1] interval, just divide each of the 3 band values by the maximum of the 3.
6. Pop your buffer queue by a Shift value, where Shift <= BufferLength, and start again.
The number of coefficients coming from the FFT algorithm is equal to BufferLength (this follows from the definition of the Discrete Fourier Transform), so the frequency resolution is better when you select a long buffer, but the program runs slower. The light intensity won't change every BufferLength audio frames, but every Shift audio frames, and a heavy overlap (Shift much smaller than BufferLength) gives you slow, smooth light changes... so you must select parameters that fit your needs, remembering that you just have to turn some lights on and off: make your algorithm fast and lo-fi!
The last thing to do is to discover the frequency bands from your mixer's EQ knobs... I don't remember whether that information is in the mixer's handbook.
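To make the band-summing and normalization steps concrete, here is a minimal C# sketch; the function name and band-edge parameters are placeholders, and psd[k] is assumed to already hold Real_part^2 + Imaginary_part^2 for coefficient k:

// Sums squared-magnitude FFT coefficients into three bands and scales them
// to percentages of the loudest band (steps 4 and 5 above).
static int[] BandPercents(double[] psd, double sampleRate, double lowTopHz, double midTopHz)
{
    double binHz = sampleRate / psd.Length;   // the k-th coefficient sits at k * binHz
    double[] bands = new double[3];
    for (int k = 1; k < psd.Length / 2; k++)  // skip DC, use the non-mirrored half
    {
        double f = k * binHz;
        int band = f < lowTopHz ? 0 : (f < midTopHz ? 1 : 2);
        bands[band] += psd[k];
    }
    double max = Math.Max(bands[0], Math.Max(bands[1], bands[2]));
    if (max <= 0) return new int[3];          // silence: all zeros
    return new int[]
    {
        (int)Math.Round(bands[0] / max * 100),
        (int)Math.Round(bands[1] / max * 100),
        (int)Math.Round(bands[2] / max * 100)
    };
}

You would read lowTopHz and midTopHz from the boundaries in your freqs.ini and push the three results out over RS-232.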

Extracting motion data from a list of coordinates

I have a series of CSV files of timestamped coordinates (X, Y, and Z in mm). What would be the simplest way to extract motion data from them?
Measurables
The information I'd like to extract includes the following:
Number of direction changes
Initial acceleration of the first and last movements
...and the bearing (angle) of these movements
Average speed whilst non-stationary
Ideally, I'd eventually like to be able to categorise patterns of motion, so bonus points for anyone who can suggest a way of doing this. It strikes me that one way I could do this would be to generate pictures/videos of the motion from the coordinates and ask humans to categorise them - suggestions as to how I'd do this are very welcome.
Noise
A complication is the fact that the readings are polluted with noise. In order to overcome this, each recording is prefaced with at least 20 seconds of stillness which can serve as a sort of "noise profile". I'm not sure how to implement this though.
Specifics
If it helps, the motion being recorded is that of a person's hand during a simple grabbing task. The data is generated using a magnetic motion tracker attached to the wrist. Also, I'm using C#, but I guess the maths is language agnostic.
Edits
Magnetic tracker spec: http://www.ascension-tech.com/realtime/RTminiBIRD500_800.php
Sample data file: http://tdwright.co.uk/sample.csv
Bounty
For the bounty, I'd really like to see some (pseudo-)code examples.
Let's see what can be done with your example data.
Disclaimer: I didn't read your hardware specs (tl;dr :))
I'll work this out in Mathematica for convenience. The relevant algorithms (not many) will be provided as links.
The first observation is that all your measurements are equally spaced in time, which is most convenient for simplifying the approach and algorithms. We will refer to "time" or "ticks" (measurements) interchangeably, as they are equivalent here.
Let's first plot your position by axis, to see what the problem is about:
(* This is Mathematica code, don't mind, I am posting this only for
future reference *)
ListPlot[Transpose@(Take[p1[[All, 2 ;; 4]]][[1 ;;]]),
 PlotRange -> All,
 AxesLabel -> {Style["Ticks", Medium, Bold],
   Style["Position (X,Y,Z)", Medium, Bold]}]
Now, two observations:
Your movement starts around tick 1000
Your movement does not start at {0,0,0}
So, we will slightly transform your data subtracting a zero position and starting at tick 950.
ListLinePlot[
 Drop[Transpose@(x - Array[Mean@(x[[1 ;; 1000]]) &, Length@x]), {}, 950],
 PlotRange -> All,
 AxesLabel -> {Style["Ticks", Medium, Bold],
   Style["Position (X,Y,Z)", Medium, Bold]}]
As the curves have enough noise to spoil the calculations, we will convolve them with a Gaussian kernel to denoise them:
kern = Table[Exp[-n^2/100]/Sqrt[2. Pi], {n, -10, 10}];
t = Take[p1[[All, 1]]];
x = Take[p1[[All, 2 ;; 4]]];
x1 = ListConvolve[kern, #] & /@
   Drop[Transpose@(x - Array[Mean@(x[[1 ;; 1000]]) &, Length@x]), {},
    950];
So you can see below the original and smoothed trajectories:
Now we are ready to take derivatives for the velocity and acceleration. We will use fourth-order approximants for the first and second derivatives. We will also smooth them using a Gaussian kernel, as before:
Vel = ListConvolve[kern, #] & /@
   Transpose@
    Table[Table[(-x1[[axis, i + 2]] + x1[[axis, i - 2]] -
         8 x1[[axis, i - 1]] +
         8 x1[[axis, i + 1]])/(12 (t[[i + 1]] - t[[i]])), {axis, 1, 3}],
     {i, 3, Length[x1[[1]]] - 2}];
Acc = ListConvolve[kern, #] & /@
   Transpose@
    Table[Table[(-x1[[axis, i + 2]] - x1[[axis, i - 2]] +
         16 x1[[axis, i - 1]] + 16 x1[[axis, i + 1]] -
         30 x1[[axis, i]])/(12 (t[[i + 1]] - t[[i]])^2), {axis, 1, 3}],
     {i, 3, Length[x1[[1]]] - 2}];
And then we plot them:
Show[ListLinePlot[Vel,PlotRange->All,
AxesLabel->{Style["Ticks",Medium,Bold],
Style["Velocity (X,Y,Z)",Medium,Bold]}],
ListPlot[Vel,PlotRange->All]]
Show[ListLinePlot[Acc,PlotRange->All,
AxesLabel->{Style["Ticks",Medium,Bold],
Style["Acceleation (X,Y,Z)",Medium,Bold]}],
ListPlot[Acc,PlotRange->All]]
Now, we also have the speed and acceleration modulus:
ListLinePlot[Norm /@ (Transpose@Vel),
AxesLabel -> {Style["Ticks", Medium, Bold],
Style["Speed Module", Medium, Bold]},
Filling -> Axis]
ListLinePlot[Norm /@ (Transpose@Acc),
AxesLabel -> {Style["Ticks", Medium, Bold],
Style["Acceleration Module", Medium, Bold]},
Filling -> Axis]
And the Heading, as the direction of the Velocity:
Show[Graphics3D[
{Line@(Normalize /@ (Transpose@Vel)),
Opacity[.7],Sphere[{0,0,0},.7]},
Epilog->Inset[Framed[Style["Heading",20],
Background->LightYellow],{Right,Bottom},{Right,Bottom}]]]
I think this is enough to get you started. Let me know if you need help calculating a particular parameter.
HTH!
Edit
Just as an example, suppose you want to calculate the mean speed when the hand is not at rest. So, we select all points whose speed is above a cutoff, for example 5, and calculate the mean:
Mean@Select[Norm /@ (Transpose@Vel), # > 5 &]
-> 148.085
The units for that magnitude depend on your time units, but I don't see them specified anywhere.
Please note that the cutoff speed is not "intuitive". You can search for an appropriate value by plotting the mean speed vs the cutoff speed:
ListLinePlot[
Table[Mean@Select[Norm /@ (Transpose@Vel), # > h &], {h, 1, 30}],
AxesLabel -> {Style["Cutoff Speed", Medium, Bold],
Style["Mean Speed", Medium, Bold]}]
So you see that 5 is an appropriate value.
One solution could be as simple as a state machine, where each state represents a direction. Sequences of movements are represented by sequences of directions. This approach would only work if the orientation of the sensor doesn't change with respect to the movements; otherwise you'll need a method of translating the movements into the correct orientation before calculating sequences of directions.
On the other hand, you could use various AI techniques, although exactly what you'd use is beyond me.
To get the speed between any two coordinates:
Avg Speed = sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2) / (t2 - t1)
To get the average speed for the whole motion, say you have 100 timestamped coordinates: use the above equation to calculate 99 speed values, then sum all the speeds and divide by the number of speeds (99).
To get the acceleration, the location at three moments is required, or the velocity at two moments.
Accel X = (x3 - 2*x2 + x1) / (t3 - t2)^2
Accel Y = (y3 - 2*y2 + y1) / (t3 - t2)^2
Accel Z = (z3 - 2*z2 + z1) / (t3 - t2)^2
(these second differences assume equally spaced timestamps, so t3 - t2 = t2 - t1)
Note: This all assumes per axis calculations: I have no experience with two-axis particle motion.
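A small C# sketch of that second difference for one axis, under the equal-spacing assumption (the helper name is made up):

// Acceleration along X at sample i via the second difference above,
// assuming uniformly spaced timestamps so dt = t[i+1] - t[i].
static double AccelX(double[] x, double[] t, int i)
{
    double dt = t[i + 1] - t[i];
    return (x[i + 1] - 2.0 * x[i] + x[i - 1]) / (dt * dt);
}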
You will have a much easier time with this if you first convert your position measurements to velocity measurements.
First step: Remove the noise. As you said, each recording is prefaced with 20 seconds of stillness. So, to find the actual measurements, search for 20 second intervals where the position doesn't change. Then, take the measurement directly after.
Second step: Calculate velocities using: (x2 - x1)/(t2 - t1); the slope formula. The interval should match the interval of the recordings.
Calculations:
Direction change:
A direction change occurs where the velocity crosses zero. Use numeric integration to find these times. Integrate from 0 until a time when the result of the integration is zero. Record this time. Then, integrate from the previous time until you get zero again. Repeat until you hit the end of the data.
Initial accelerations:
These are found using the slope formula again, substituting v for x.
Average speed:
The average speed formula is the slope formula. x1 and t1 should correspond to the first reading, and x2 and t2 should correspond to the final reading.
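As a concrete C# sketch of the velocity and average-speed steps (names and the cutoff are illustrative; pos[i] is assumed to be an {x, y, z} triple in mm with timestamps t in seconds):

// Per-interval speeds from timestamped XYZ rows, averaging only the
// intervals faster than a stillness cutoff.
static double AverageMovingSpeed(double[] t, double[][] pos, double cutoff)
{
    var speeds = new List<double>();
    for (int i = 1; i < t.Length; i++)
    {
        double dx = pos[i][0] - pos[i - 1][0];
        double dy = pos[i][1] - pos[i - 1][1];
        double dz = pos[i][2] - pos[i - 1][2];
        double v = Math.Sqrt(dx * dx + dy * dy + dz * dz) / (t[i] - t[i - 1]);
        if (v > cutoff) speeds.Add(v);        // ignore near-stationary intervals
    }
    return speeds.Count > 0 ? speeds.Average() : 0.0;
}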

How can I find a peak in an array?

I am making a pitch detection program using fft. To get the pitch I need to find the lowest frequency that is significantly above the noise floor.
All the results are in an array. Each position is for a frequency. I don't have any idea how to find the peak.
I am programming in C#.
Here is a screenshot of the frequency analysis in Audacity.
Instead of attempting to find the lowest peak, I would look for the fundamental frequency that maximizes the spectral energy captured by its first 5 integer multiples. Note that every peak is an integer multiple of the lowest peak. This is a hack of the cepstrum method. Don't judge :).
N.B. From your plots, I assume a 1024-sample window and a 44.1 kHz sampling rate. This yields a frequency granularity of only 44.1 kHz / 1024 = 43 Hz. Given 44.1 kHz audio, I recommend using a longer analysis window of ~50 ms, or 2048 samples. This would yield a finer frequency granularity of ~21 Hz.
Assuming a Matlab vector 'psd' of size 2048 with the PSD values.
% 50 Hz (Dude)  -> 50Hz/44100Hz * 2048 -> ~2  Lower Lim
% 300 Hz (Baby) -> 300Hz/44100Hz * 2048 -> ~14 Upper Lim
lower_lim = 2;
upper_lim = 14;
best_energy = 0;
best_cand = lower_lim;
for fund_cand = lower_lim:1:upper_lim
    i_first_five_multiples = (1:1:5) * fund_cand;
    sum_energy = sum(psd(i_first_five_multiples));
    if sum_energy > best_energy   % keep the candidate with the most energy
        best_energy = sum_energy;
        best_cand = fund_cand;
    end
end
The fundamental is then the candidate that maximizes the sum_energy value (best_cand above).
It would be easier if you had some notion of the absolute values to expect, but I would suggest:
find the lowest (weakest) value first. It is your noise level.
compute the average level, it is your signal strength
define some function to decide the noise threshold. This is the tricky part, it may require some experimentation.
In a bad situation, the signal may be only 2 or 3 times the noise level. If the signal is better, you can probably use a threshold of 2x the noise.
Edit, after looking at the picture:
You should probably just start at the left and find a local maximum. It looks like you could use a 30 dB threshold and a 10-bin window or something.
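That idea might look something like this C# sketch; the 30 dB threshold and 10-bin window are just the guesses above, and the spectrum is assumed to be in dB already:

// Scans from the low-frequency end for the first bin that is a local
// maximum over a +/- window and clears a threshold above the quietest bin.
static int FirstPeakBin(double[] spectrumDb, double thresholdDb = 30.0, int window = 10)
{
    double noiseFloor = spectrumDb.Min();     // weakest bin as the noise level
    for (int i = window; i < spectrumDb.Length - window; i++)
    {
        bool isLocalMax = true;
        for (int j = i - window; j <= i + window; j++)
        {
            if (spectrumDb[j] > spectrumDb[i]) { isLocalMax = false; break; }
        }
        if (isLocalMax && spectrumDb[i] - noiseFloor > thresholdDb)
            return i;                         // first qualifying peak
    }
    return -1;                                // nothing stood out
}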
Finding the lowest peak won't work reliably for estimating pitch, as this frequency is sometimes completely missing, or down in the noise floor. For better reliability, try another algorithm: autocorrelation (AMDF, ASDF lag), cepstrum (FFT log FFT), harmonic product spectrum, state space density, and variations thereof that use neural nets, genetic algorithms or decision matrices to decide between alternative pitch hypotheses (RAPT, YAAPT, et al.).
Added:
That said, you could guess a frequency, compute the average and standard deviation of spectral magnitudes for, say, a 2-to-1 frequency range around your guess, and see if there exists a peak significantly above the average (2 sigma?). Rinse and repeat for some number of frequency guesses, and see which one, or the lowest of several, has a peak that stands out the most from the average. Use that peak.
