How to produce a detailed spectrogram from Fourier output? - c#

I am developing a little application in Visual Studio 2010 in C# to draw a spectrogram (frequency "heat map").
I have already done the basic things:
Cut a rectangular windowed array out of the input signal array
Feed that array into FFT, which returns complex values
Store magnitude values in an array (spectrum for that window)
Step the window, and store the new values in other arrays, resulting in a jagged array that holds every window step and its spectrum
Draw these into a Graphics object, colored using the global min/max values of the heat map as relative cold and hot
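The steps above can be sketched end-to-end in a few lines (Python here for brevity rather than C#; the naive dft stands in for whatever FFT library is actually used, and all names and the test signal are illustrative):

```python
import cmath
import math

def dft(frame):
    """Naive DFT, standing in for a real FFT library."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def spectrogram(signal, window_size=32):
    """One spectrum per non-overlapping rectangular window step."""
    columns = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        frame = signal[start:start + window_size]      # rectangular window
        spectrum = dft(frame)
        # keep bins 1 .. window_size/2 (skip DC and the mirrored half)
        columns.append([abs(c) for c in spectrum[1:window_size // 2 + 1]])
    return columns                                     # columns[step][bin]

# Color each cell relative to the global min/max of the whole heat map.
signal = [100 + 50 * math.sin(2 * math.pi * 4 * t / 32) for t in range(512)]
cols = spectrogram(signal)
cold = min(min(c) for c in cols)
hot = max(max(c) for c in cols)
```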
The LEFT side of the screenshot shows my application, and on the RIGHT there is a spectrogram for the same input (512 samples long) and same rectangular window with size 32 from a program called "PAST - time series analysis" (https://folk.uio.no/ohammer/past/index.html). My 512 long sample array only consists of integer elements ranging from around 100 to 1400.
(Note: the light-blue bar on the far right of the PAST spectrogram is only there because I accidentally left an unnecessary '0' element at the end of its input array. Otherwise they are the same.)
Link to screenshot: https://drive.google.com/open?id=1UbJ4GyqmS6zaHoYZCLN9c0JhWONlrbe3
But I have encountered a few problems here:
The spectrogram seems very undetailed compared to the one I made in "PAST time series analysis" for reference, which looks extremely detailed. Why is that?
I know that for e.g. a 32-sample time window, the FFT returns 32 elements; the 0th element is not needed here, and the next 32/2 elements hold the magnitude values I need. But this means that the frequency "resolution" of the output for a 32-sample window is 16. That is exactly what my program uses. Yet the "PAST" program shows far more detail. If you look at the narrow lines against the blue background, you can see that they form a clear pattern along the frequency axis, but in my spectrogram that information remains unseen. Why?
In the first (windowSize/2)-wide band of window steps and the last (windowSize/2)-wide band, there are fewer values for the FFT input, so there is less output, or simply less precision. But in the "PAST" program those parts still look relatively detailed, not just stretched bars like in mine. How can I improve that?
The 0th element of the FFT output array (the so-called "DC" element) is a huge number, a lot bigger than the sample average, or even than its sum. Why is that?
Why are my values (e.g. the maximum that you see near the color bar) so huge? That is just a magnitude value from the FFT output. Why are there different values in the PAST program? What correction should I use on the FFT output to get those values?
Please share your ideas, if you know more about this topic. I am very new to this. I only read first about Fourier transform a little more than a week ago.
Thanks in advance!

To get more smoothness in the vertical axis, zero pad your FFT so that there are more (interpolated) frequency bins in the output. For instance, zero pad your 32 data points so that you can use a 256 point or larger FFT.
To get more smoothness in the horizontal axis, overlap your FFT input windows (75% overlap, or more).
For both, use a smooth window function (Hamming or von Hann, et al.), and try wider windows, larger than 32 (thus with even more overlap).
To get better coloring, try using a color mapping table, with the input being the log() of the (non-zero) magnitudes.
You can also use multiple different FFTs per graph XY point, and decide which to color with based on local properties.
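A sketch of the first three suggestions combined (Python for brevity; the naive dft stands in for a real FFT, and the window/hop sizes are just example values): Hann-window each frame, zero-pad it to a 256-point FFT, and advance by a hop of a quarter window for 75% overlap.

```python
import cmath
import math

def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def frame_spectrum(frame, fft_size=256):
    """Hann-window a frame, zero-pad to fft_size, return half-spectrum magnitudes."""
    n = len(frame)
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    windowed = [s * w for s, w in zip(frame, hann)]
    padded = windowed + [0.0] * (fft_size - n)   # zero-padding interpolates bins
    return [abs(c) for c in dft(padded)[:fft_size // 2]]

window_size, hop = 32, 8                         # hop = window/4 -> 75% overlap
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
columns = [frame_spectrum(signal[i:i + window_size])
           for i in range(0, len(signal) - window_size + 1, hop)]
```

Note that zero-padding only interpolates between the frequency values already determined by the 32 real samples; it smooths the picture rather than adding new information.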

Hello LimeAndConconut,
Even though I do not know about PAST, I can provide you with some general information about FFT. Here is an answer for each of your points
1- You are right: an FFT performed on 32 elements returns 32 frequencies (the null frequency, plus positive and negative components). It means that you already have all the information in your data, and PAST cannot get more information out of the same 32-sized window. That's why I suspect the data is interpolated for plotting, but this is just visual. Once again, PAST cannot create more information than what is already in your data.
2- Once again I agree with you. At the borders, you have access to fewer frequency components. You can choose between different strategies: do not show data at the borders, or extend the data with zero-padding or circular padding.
3- The zeroth element of the FFT should be the sum of your 32-element windowed array. You need to check the FFT normalization; have a look at the documentation of your FFT function.
4- Once again, check the FFT normalization. Since the PAST colorbar exhibits negative values, it seems to be plotted on a logarithmic scale. It is common practice to plot data with a high dynamic range on a logarithmic scale in order to enhance details.
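Points 3 and 4 are easy to check numerically. With the common unnormalized FFT convention, bin 0 is exactly the sum of the input, and a log (dB) mapping compresses the huge dynamic range (a Python sketch with a naive DFT; the 20*log10 dB convention is an assumption about what PAST plots):

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

samples = [100, 400, 1400, 900, 250, 300, 1250, 600]
spectrum = dft(samples)

dc = abs(spectrum[0])                # exactly sum(samples), not the average
average = dc / len(samples)          # divide by N to recover the sample mean

# Log mapping for plotting: compresses huge magnitudes, reveals small ones.
db = [20 * math.log10(abs(c)) for c in spectrum if abs(c) > 0]
```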

Related

Stretch noise value while keeping it in range (math issue!)

I implemented a simplex noise algorithm (by KdotJPG: OpenSimplex2S), which works fine, but I'd like to add a "function" that can increase/decrease the contrast of the noise. The noise method returns a value between -1 and 1, but the overall result is quite homogeneous. It is not bad at all, but I need a different outcome now.
So basically I should "pull" the noise values toward the range edges; this will produce more contrasting noise (more distance between the smaller and bigger values). Of course this change must be consistent and proportionally scaled between -1 and 1 (or 0 and 1) to get a natural result.
Actually this is a purely mathematical issue, but I'm not good at math at all! To make it more understandable, here is a picture of two graphs:
So, on these graphs the Y axis is the noise value (-1 at the bottom and +1 at the top) and the X axis is time. The left graph shows the original output of the noise generator, and the right one is the stretched version that I need. As you can see, on the right graph everything is the same, but the values are stretched/pulled toward the edges (toward the min/max limits) while still staying in range.
Is there any math formula or C# built-in function to stretch the noise's return value proportionally with respect to the min/max values (-1/1 or 0/1)? If you need the code of the noise, you can see it here: OpenSimplex2S, but it is irrelevant in my case, as I just wish to modify its return value. Thanks!
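One common formula for this (not part of OpenSimplex2S; just a sign-preserving power curve, sketched in Python) is v' = sign(v) * |v|^(1/k) with k > 1, which pushes mid-range values toward the edges while keeping -1, 0 and 1 fixed:

```python
import math

def stretch(v, k=2.0):
    """Contrast-stretch a noise value v in [-1, 1] toward the range edges.
    k > 1 increases contrast, k < 1 decreases it, k = 1 leaves v unchanged."""
    return math.copysign(abs(v) ** (1.0 / k), v)
```

In C# the same one-liner is Math.Sign(v) * Math.Pow(Math.Abs(v), 1.0 / k).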

How do you calculate a rectangular wave from input values?

Here's a picture to make it a little easier:
The blue line represents some input values that resemble waves with variable amplitudes and lengths. The y axis represents the values, the x axis represents time. Please note that there is quite some jitter in the wave. However, every wave has a certain minimum and maximum length.
The green line shows how the input values should be transformed.
Please note: The above picture is just a hand drawn example to explain the task. In an ideal case, the position of the rising and falling edges of the rectangular (green) wave are close to the blue waves average value. The height/amplitude of the green wave segments should match the values of the blue wave.
How do you calculate the green line?
Do you know of any C# libraries or algorithms to do that? I guess this could be a rather common task for electrical engineers, so there are most likely some common approaches available. If so, what are they called?
How would you approach these requirements?
Any advice that helps in getting started is welcome.
Take a base frequency (f) at an amplitude (a).
Then add ODD harmonics with inverse amplitudes, i.e. f * a + f3 * a/3 + f5 * a/5 + f7 * a/7 ...
This will tend toward a square wave as you add harmonics.
BTW, try doing the same with even harmonics, and with all the harmonics - great fun!!!
Good luck
Tony
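The sum Tony describes can be written out directly (a Python sketch; note that with amplitudes a/n the partial sum converges to a square wave of height a * pi / 4, not a, so rescale if you need an exact amplitude):

```python
import math

def square_approx(t, f, a, n_harmonics):
    """Sum of odd harmonics f, 3f, 5f, ... with amplitudes a, a/3, a/5, ...
    Tends toward a square wave of height a * pi / 4 as harmonics are added."""
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1                 # odd harmonics: 1, 3, 5, 7, ...
        total += (a / n) * math.sin(2 * math.pi * n * f * t)
    return total
```

Multiply the result by 4 / pi if you want the plateaus to sit at exactly ±a.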

Genetic algorithms. How to find the optimal size of the population

How do I find the optimal size of the population? In my task, each gene is an int value lying in a given range.
For example:
The chromosome consists of 2 genes.
The first gene contains an int value in the range from 5 to 15.
The second gene contains an int value in the range from 15 to 25.
The question: how do I find the size of the initial population?
The optimal size is usually found iteratively, through trial and error. You can write a simple algorithm to optimize population size: start, for example, with a population size of 100 and iteratively increase it by e.g. 50. For each step you need to run the GA and calculate some measure that assesses the population size; you can use one of these: maximum fitness, average fitness, or time until the convergence criterion is met. To increase accuracy you should repeat each step at least a few times, then calculate the average for each step and draw a chart from which you can choose the optimal population size; if that's not precise enough, you can optimize around the peak by doing the same thing near that population size.
Depending on your problem, the chart will look different. If it's just a positively sloped curve, then you will have to choose a reasonable population size on your own. With too small a population your GA will most likely lose diversity and perhaps fall into a local optimum. When it's too big, your GA becomes a simple random search algorithm.
Btw I hope this example is far from your real problem, because genetic algorithms are not the best choice for such small chromosomes.
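The sweep described above can be sketched like this (a toy GA in Python on the asker's 2-gene example; the fitness function and every parameter here are illustrative, not from any library):

```python
import random

random.seed(1)  # deterministic for the example

GENE_RANGES = [(5, 15), (15, 25)]   # the asker's 2-gene chromosome

def fitness(chromosome):
    # Toy objective: maximize the sum of the genes (max possible is 40).
    return sum(chromosome)

def run_ga(pop_size, generations=30, mutation_rate=0.2):
    """Minimal GA: tournament selection of 2, copy parent, random-reset mutation."""
    pop = [[random.randint(lo, hi) for lo, hi in GENE_RANGES]
           for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        for _ in range(pop_size):
            a, b = random.sample(pop, 2)
            child = max(a, b, key=fitness)[:]        # tournament of 2
            if random.random() < mutation_rate:
                i = random.randrange(len(GENE_RANGES))
                child[i] = random.randint(*GENE_RANGES[i])
            nxt.append(child)
        pop = nxt
    return max(fitness(c) for c in pop)

# Sweep population sizes, repeat each a few times, and average the measure.
results = {}
for pop_size in range(10, 60, 10):
    scores = [run_ga(pop_size) for _ in range(3)]
    results[pop_size] = sum(scores) / len(scores)
```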

Volume from byte array

I'm new to audio analysis, but need to perform a (seemingly) simple task. I have a byte array containing a 16-bit recording (single channel) at a sample rate of 44100. How do I perform a quick analysis to get the volume at any given moment? I need to calculate a threshold, so a function that returns true if the signal is above a certain amplitude (volume) and false if not. I thought I could iterate through the byte array and check each value, with 255 being the loudest, but this doesn't seem to work: even when I don't record anything, background noise gets in and some of the array is filled with 255. Any suggestions would be great.
Thanks
As you have 16-bit data, you should expect the signal to vary between -32768 and +32767.
To calculate the volume you can take intervals of, say, 1000 samples and calculate their RMS value: sum the squared sample values, divide by 1000, and take the square root. Check this number against your threshold.
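A sketch of that calculation (Python here; it assumes little-endian signed 16-bit mono PCM, and the threshold value is arbitrary):

```python
import math
import struct

def rms_blocks(raw, block_samples=1000):
    """Decode little-endian signed 16-bit mono PCM, yield RMS per block."""
    n = len(raw) // 2                               # two bytes per sample
    samples = struct.unpack("<%dh" % n, raw[:n * 2])
    for i in range(0, n, block_samples):
        block = samples[i:i + block_samples]
        yield math.sqrt(sum(s * s for s in block) / len(block))

def is_loud(raw, threshold=3000.0):
    """True if any block's RMS exceeds the (arbitrary) threshold."""
    return any(rms > threshold for rms in rms_blocks(raw))
```

This also shows why scanning single bytes for 255 misleads: each 16-bit sample spans two bytes, and 255 is just the low byte of many ordinary sample values.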
Typically one measures the energy of waves using root mean square.
If you want to be more perceptually accurate you can take the time-domain signal through a discrete fourier transform to a frequency-domain signal, and integrate over the magnitudes with some weighting function (since low-frequency waves are perceptually louder than high-frequency waves at the same energy).
But I don't know audio stuff either so I'm just making stuff up. ☺
I might try applying a standard-deviation sliding-window. OTOH, I would not have assumed that 255 = loudest. It may be, but I'd want to know what encoding is being used. If any compression is present, then I doubt 255 is "loudest."

C# Date/Numerical Axis Scale

I'm developing a histogram container class and I'm trying to determine where the cut-off points should be for the bins. I'd like the cut-off points to be nice looking numbers, in much the same way that graph axes are scaled.
To distill my request into a basic question: is there a basic method by which axis labels can be determined from a list of numbers?
For example:
Array{1,6,8,5,12,15,22}
It would make sense to have 5 bins.
Bin Start Count
0 1
5 3
10 2
15 0
20 1
The bin start stuff is identical to selecting axis labels on a graph in this instance.
For the purpose of this question I don't really care about bins and the histogram, I'm more interested in the graph scale axis label portion of the question.
I will be using C# 4.0 for my app, so nifty solutions using LINQ are welcome.
I've attempted things like this in the distant past using some log-base-10 scaling, but I never got it to work in enough detail for this application. I don't want log scaling; I just used base 10 to round to the nearest whole numbers. I'd like it to work for large numbers, very small numbers, and possibly dates too, although dates can be converted to doubles and handled that way.
Any resources on the subject would be greatly appreciated.
You could start with something simple:
NUM_BINS is a passed argument or constant (e.g. NUM_BINS = 10)
x is your array of x-values (e.g. int[] x = new int[50])
int numBins = x.Length < NUM_BINS ? x.Length : NUM_BINS;
At this point you could calculate a histogram of xPoints, and if the xPoints are heavily weighted toward one side of the distribution (maybe just count left of the midpoint vs. right of the midpoint), then use log/exp divisions over the range of x[]. If the histogram is flat, use linear divisions.
double[] xAxis = new double[numBins];
double range = x[x.Length-1] - x[0];
CalcAxisValues(xAxis, range, TYPE); //Type is enum of LOG, EXP, or LINEAR
This function would then equally space points based on the TYPE.
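For the "nice looking numbers" part, a standard technique is the "nice numbers" axis-labeling idea (Paul Heckbert, Graphics Gems): round the raw tick spacing to 1, 2, or 5 times a power of ten. A simplified Python sketch (function names are illustrative):

```python
import math

def nice_num(x, round_nearest=False):
    """Round x up (or to the nearest, if asked) to 1, 2 or 5 times a power of 10."""
    exp = math.floor(math.log10(x))
    frac = x / 10 ** exp
    if round_nearest:
        nice = 1 if frac < 1.5 else 2 if frac < 3 else 5 if frac < 7 else 10
    else:
        nice = 1 if frac <= 1 else 2 if frac <= 2 else 5 if frac <= 5 else 10
    return nice * 10 ** exp

def axis_ticks(lo, hi, max_ticks=5):
    """Nicely rounded, evenly spaced tick positions covering [lo, hi]."""
    step = nice_num((hi - lo) / (max_ticks - 1), round_nearest=True)
    start = math.floor(lo / step) * step
    stop = math.ceil(hi / step) * step
    count = int(round((stop - start) / step))
    return [start + i * step for i in range(count + 1)]
```

For the array {1,6,8,5,12,15,22} this yields ticks at multiples of 5, matching the bin starts in the example; dates could be converted to doubles first, as the question suggests, and the resulting ticks converted back.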
