Porting libmfcc to C# using Bass Library

I am currently using the BASS library for audio analysis. It can calculate an FFT and return it as an array, and libmfcc uses this data to calculate the MFCC coefficients that I need. (Info: MFCC is like an audio spectrum, but it better matches how human hearing and frequency scaling work.)
The BASS library returns FFT values between 0 and 1.
I have run into several problems and questions:
Their example FFT data seems to be in a different format: the values are very high, and the 8192 FFT values sum to 10739.24. How can that be?
In their example application they call the function like the following. Why do they use 128 as the FFT array size if they just loaded 8192 values?
Using their MFCC class, which I copied and edited a bit to match C# syntax and functions, I get negative values for some coefficients. I don't think that should be the case.
Can anyone explain why it returns negative values, or what I did wrong?
I made a simple, ready-to-try example program that does the above and is useful for debugging.
Link: http://www.xup.in/dl,17603935/MFCC_Test.rar/
Output from my C# Application (Most likely not correct)
Coeff 16 = 0,017919318626506
Coeff 17 = -0,155580763009355
Coeff 18 = -0,76072865841987
Coeff 19 = 0,108961510335727
Coeff 20 = 0,819025783804398
Coeff 21 = -0,660508603974514
Coeff 22 = -0,951623924906163
Coeff 23 = 0,424922129906254
Coeff 24 = 0,0129727009313168
Coeff 25 = -0,388796833267654
Coeff 26 = 0,270839393161931
Coeff 27 = -0,138515788828431
Coeff 28 = -0,454837674981149
Coeff 29 = -0,448629344922371
Coeff 30 = -0,11908663618393
Coeff 31 = 0,237500036702818
Coeff 32 = 0,114874386870208
Coeff 33 = -0,100822381384326
Coeff 34 = 0,144242143551012
Coeff 35 = 0,209338502838453
Coeff 36 = 0,247588420953066
Coeff 37 = -0,451654204112441
Coeff 38 = 0,0346927542067229
Coeff 39 = 0,180816031061584
Their example FFT Data (Different Format?)
14.524506
38.176063
10.673860
3.705076
2.102398
1.461585
1.145616
0.974108
0.878079
0.825304
0.798959
0.789067
0.789914
0.797102
0.808576
0.822048
0.836592
0.851101
0.864869
0.877625
0.888780
0.897852
0.905033
0.910054
0.912214
0.912414
0.909593
0.904497

I can answer the first part:
The sample code clearly states that the input data was computed using FFTW, which produces an unnormalized result. You need to divide by sqrt(n) to get the normalized values, which is what I suspect BASS returns.
Perhaps multiplying your inputs by sqrt(n) will give you better results.

The MFCC routine returns cepstral coefficients (DCT of the log of mel magnitudes), not mel magnitude values. Cepstral coefficients can be negative. I believe the value 128 in the example code is indeed a mistake by the author. In order to preserve the signal energy an FFT requires normalization at some point (either after FFT, iFFT or split between the two). In the example you're looking at the raw (unnormalized) magnitudes, which is why they are so large.
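To illustrate why negative coefficients are expected, here is a minimal sketch of the "DCT of the log of mel magnitudes" step. It is not libmfcc's code: the class and method names are made up for illustration, the input is assumed to be a small array of already-computed mel filter-bank energies, and no normalization factor is applied.

```csharp
using System;

// Hypothetical helper, not part of libmfcc or BASS.
public static class CepstrumSketch
{
    // Unnormalized DCT-II of the natural log of the given mel energies.
    // Every input must be > 0, otherwise Math.Log returns -Infinity or NaN.
    public static double[] Dct2OfLog(double[] melEnergies)
    {
        int n = melEnergies.Length;
        var result = new double[n];
        for (int m = 0; m < n; m++)
        {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
            {
                // The cosine basis takes both signs, and so does the log
                // of an energy below 1, so the sum can be negative.
                sum += Math.Log(melEnergies[k]) * Math.Cos(Math.PI * m * (k + 0.5) / n);
            }
            result[m] = sum;
        }
        return result;
    }
}
```

Calling CepstrumSketch.Dct2OfLog(new[] { 0.5, 1.0, 2.0, 4.0 }) gives a positive first coefficient and a negative second one, even though every input energy is positive.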

Related

What kind of Algorithm am I looking for to combine quantities?

I have been stuck on this problem for 8 weeks now. I think I almost have a solution, but the last bit of math is racking my brain. I will try to explain a simple problem that requires a complex solution. I am programming in a C#.NET MVC web project. Here is the situation.
I have an unknown group of incoming quantities in which I look for like items. Those like items share a max fill level that makes a full box. Here is an example:
Revision******
This is the real-world case.
I have many orders, let's say for candy, coming in to a company.
Qty  Item      MaxFill  Sold-To  DeliverNumber
60   candy#14  26       Joe      1
1    candy#12  48       Jim      2
30   candy#11  48       Jo       3
60   candy#15  48       Tom      4
6    candy#8   48       Kat      5
30   candy#61  48       Kim      6
44   candy#12  48       Jan      7
10   candy#12  48       Yai      8
10   candy#91  48       Jun      9
55   candy#14  26       Qin      10
30   candy#14  26       Yo       11
40   candy#14  26       Moe      12
In this list I am looking for like candy items to combine, to make as many full boxes of candy as I can based on the MaxFill number. Here the like items are:
Qty  Item      MaxFill  Sold-To  DeliverNumber
60   candy#14  26       Joe      1
55   candy#14  26       Qin      10
30   candy#14  26       Yo       11
40   candy#14  26       Moe      12
1    candy#12  48       Jim      2
44   candy#12  48       Jan      7
10   candy#12  48       Yai      8
Now let's take the first set of numbers, for candy#14.
I know that the total for candy#14 is 185, so I can get 7 full boxes of 26, with a last box holding only 3. So how do I do this with the values that I have without losing the information from the original orders? This is how I am working it out right now.
See below
End of Revision******
The max fill level for candy#14 is 26.
The candy#14 quantities are:
60
55
30
40
I already have a recursive function that breaks these down to the 26 level, and it works fine. I feel that I need another recursive function to deal with the remainders that come out of this. Most of the time any given list will leave remainders, and those remainders could add up to another full box of 26.
60 = 26+26+8
55 = 26+26+3
30 = 26+4
40 = 26+14
The remainders 8, 3, 4 and 14 total 29, so I can get another 26 out of them. But in the real, unknown world, combining the remainders could produce a new set of remainders that repeats the same situation. To make this even more complicated, I have to keep the data that originally came with the 60, 55, 30 and 40, such as who it was sold to and the delivery number. This will also help with knowing how the original amount was broken down and combined.
From the 8, 3, 4, 14, the best way I could think of is to take the 8, 4 and 14. This gives me the 26 that I am looking for, and I do not have to split any value, because 3 is the remainder and I can keep all the other data without issue. However, this works only in this particular situation. If I go through the list linearly, 8+3+4=15, so I would have to take 11 from the next value, 14, leaving a remainder of 3.
In reading about different algorithms I was thinking that this might fall into the NP, NP-complete or NP-hard category. But the material is very technical, and real-world scenarios are hard to find.
Any suggestions would help here: should I go through the list of numbers to find the best combinations that reach 26, or is the linear progression with splitting of the next value the best solution? I know how to work out how many full boxes I can get from the remainders and what the leftover amount is (8+3+4+14=29, which gives one box of 26 and one of 3), but I have no idea how to do the math recursively. I have this much done, and I "feel" that it is on the right track, but I can't see how to adjust it to work with either the linear approach or "test every possible combination".
using System;
using System.Collections.Generic;

public static class Program
{
    public static void Main(string[] args)
    {
        var numbers = new List<int>() { 8, 3, 4, 14 };
        var target = 26;
        sum_up(numbers, target);
    }

    private static void sum_up(List<int> numbers, int target)
    {
        sum_up_recursive(numbers, target, new List<int>());
    }

    // Tries every subset of numbers (keeping their order) and prints each one that sums to target.
    private static void sum_up_recursive(List<int> numbers, int target, List<int> partial)
    {
        int s = 0;
        foreach (int x in partial) s += x;
        if (s == target)
        {
            // The string was built but never used before; print it so a match is visible.
            Console.WriteLine("sum(" + string.Join(",", partial) + ")=" + target);
        }
        if (s >= target)
            return; // quantities are positive, so adding more cannot reach the target again
        for (int i = 0; i < numbers.Count; i++)
        {
            List<int> remaining = new List<int>();
            int n = numbers[i];
            for (int j = i + 1; j < numbers.Count; j++) remaining.Add(numbers[j]);
            List<int> partial_rec = new List<int>(partial);
            partial_rec.Add(n);
            sum_up_recursive(remaining, target, partial_rec);
        }
    }
}
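For comparison, the "linear progression with splitting" option mentioned in the question could be sketched as below. This is only an illustration under assumptions: the BoxFiller class and its Fill method are hypothetical names, each order is reduced to a (DeliverNumber, Qty) pair, and no attempt is made to minimize the number of split orders. Every piece placed in a box keeps the delivery number it came from, so the original order can still be traced.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class BoxFiller
{
    // Walks the orders in sequence and fills boxes of size maxFill.
    // Each box is a list of pieces; a piece records which delivery it came from.
    public static List<List<(int DeliverNumber, int Qty)>> Fill(
        IEnumerable<(int DeliverNumber, int Qty)> orders, int maxFill)
    {
        if (maxFill <= 0) throw new ArgumentOutOfRangeException(nameof(maxFill));

        var boxes = new List<List<(int DeliverNumber, int Qty)>>();
        var current = new List<(int DeliverNumber, int Qty)>();
        int room = maxFill;

        foreach (var order in orders)
        {
            int left = order.Qty;
            while (left > 0)
            {
                int take = Math.Min(left, room); // split the order if it does not fit
                current.Add((order.DeliverNumber, take));
                left -= take;
                room -= take;
                if (room == 0)
                {
                    boxes.Add(current); // box is full, start a new one
                    current = new List<(int DeliverNumber, int Qty)>();
                    room = maxFill;
                }
            }
        }

        if (current.Count > 0) boxes.Add(current); // last, partially filled box
        return boxes;
    }
}
```

With the candy#14 orders (60, 55, 30, 40 for deliveries 1, 10, 11, 12) and a max fill of 26, this yields 7 full boxes plus one box holding the remaining 3 from delivery 12. It splits more orders than the "full boxes first, then combine remainders" idea does, so it is the simpler option, not necessarily the better one.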

I wrote a sample project in JavaScript.
Please check my repo.
https://github.com/panghea/packaging_sample

Calculate the bit depth from the colour count of a gif

I've been reading through the GIF specification, trying to understand how the size of a colour table palette is calculated.
From the example on Wikipedia here:
byte#  hexadecimal  text or
(hex)               value    Meaning
0:     47 49 46
       38 39 61     GIF89a   Header
                             Logical Screen Descriptor
6:     03 00        3        - logical screen width in pixels
8:     05 00        5        - logical screen height in pixels
A:     F7                    - GCT follows for 256 colors with
                               resolution 3 x 8 bits/primary
If you look at the byte at offset A (decimal 10) you can see the hex value F7, which represents the decimal number 247.
Now I know from reading various code samples that this is a packed value made up of the following:
0x80 | // bit 1    : global color table flag = 1 (gct used)
0x70 | // bits 2-4 : color resolution
0x00 | // bit 5    : gct sort flag = 0
7;     // bits 6-8 : gct size
The two bytes that follow it in those samples are separate values, not part of the packed byte:
0 // background color index
0 // pixel aspect ratio - assume 1:1
I've also determined that the size 7 represents the bit depth minus 1, which can be used to determine the number of colours:
2 ^ (0 + 1) = 2
2 ^ (1 + 1) = 4
2 ^ (2 + 1) = 8
2 ^ (3 + 1) = 16
2 ^ (4 + 1) = 32
2 ^ (5 + 1) = 64
2 ^ (6 + 1) = 128
2 ^ (7 + 1) = 256
http://www.matthewflickinger.com/lab/whatsinagif/bits_and_bytes.asp
http://www.devx.com/projectcool/Article/19997/0/page/7
What I am looking to find out is how would I calculate the bit depth from the number of colours using C#.
Since this is something you would want to do quickly I would imagine using some sort of bit-shifting mechanism would be the best approach. I'm not a computer scientist though so I struggle with such things.
I've a horrible feeling it's really simple...

I think you're looking for a logarithm. Round the result up to get the bit depth needed.
/// <summary>
/// Returns how many bits are required to store the specified
/// number of colors. Performs a Log2() on the value.
/// </summary>
/// <param name="colors"></param>
/// <returns></returns>
public static int GetBitsNeededForColorDepth(byte colors) {
    return (int)Math.Ceiling(Math.Log(colors, 2));
}
https://github.com/imazen/resizer/blob/c4c586b58b2211ad0f48f7d8285e951ff6f262f9/Plugins/PrettyGifs/PrettyGifs.cs#L239-L241
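Since the question asks about bit shifting, here is a small integer-only sketch of the same idea. The GifBits class and its method names are hypothetical. It takes an int because a byte cannot hold the value 256, and it assumes the GIF range of 1 to 8 bits per colour table.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class GifBits
{
    // Smallest bit depth n (1..8) such that 2^n >= colorCount.
    public static int BitDepth(int colorCount)
    {
        if (colorCount < 1 || colorCount > 256)
            throw new ArgumentOutOfRangeException(nameof(colorCount));

        int bits = 1;
        while ((1 << bits) < colorCount) bits++; // 1 << bits is 2^bits
        return bits;
    }

    // Value stored in the low three bits of the packed byte: bit depth minus 1.
    public static int PackedSizeField(int colorCount)
    {
        return BitDepth(colorCount) - 1;
    }
}
```

For example, GifBits.BitDepth(256) is 8 and GifBits.PackedSizeField(256) is 7, which matches the low three bits of the F7 byte in the question. A count that is not a power of two, such as 17, rounds up to 5 bits.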

Why does the Mod operator return the first number when the second number is larger than the first?

I hope I'm not asking a stupid question, but I can't find a good explanation for this result:
35 % 36 is equal to 35
https://www.google.com.ph/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=35%20%25%2036
But if I divide the two numbers, 35 / 36, the result is 0.97222222222, so I assumed the remainder would be 97.
Can anyone explain this?

When we compute 13 % 3 it gives 1 (the remainder).
Similarly, 35 % 36 gives the first number as the remainder, because the dividend is less than the divisor.
When you divide 35 / 36, integer division gives a quotient of 0.
Floating-point division gives the fractional value, and the fractional part multiplied by the divisor is the remainder.
13 / 3 = 4.33333, so 13 = 4 * 3 + (0.33333) * 3
= (integer quotient) * divisor + remainder.
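The relationship above can be checked directly in C#. This is a small sketch with a hypothetical helper name; it relies only on the built-in / and % operators for int, where / truncates toward zero.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class RemainderDemo
{
    // Returns the integer quotient and remainder of a / b.
    // For every b != 0 the identity a == Quotient * b + Remainder holds.
    public static (int Quotient, int Remainder) Divide(int a, int b)
    {
        int quotient = a / b;  // integer division drops the fractional part
        int remainder = a % b; // what is left after quotient * b
        return (quotient, remainder);
    }
}
```

RemainderDemo.Divide(35, 36) returns (0, 35), because 36 fits into 35 zero times, and RemainderDemo.Divide(13, 3) returns (4, 1).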

The result of 35 % 36 is equivalent to dividing 35 by 36, and returning the remainder. Since 36 goes into 35 exactly 0 times, you're left with a remainder of 35.
Similarly, let's assume you do 7 % 3. In this example, 3 goes into 7 twice and you're left with a remainder of 1. So 7 % 3 == 1.
I don't have the source code for the operation, but you could mimic it (I'm sure this isn't as efficient as whatever's built in!) with a small function like this:
public static class MyMath
{
    public static int Mod(this int operand1, int operand2)
    {
        while (operand1 >= operand2)
            operand1 -= operand2;
        return operand1;
    }
}
And then call it like this:
var remainder1 = 7.Mod(3); // 1
var remainder2 = 35.Mod(36); // 35

Mod gives you the remainder of an integer division. 36 fits 0 times into 35, so
35 / 36 is 0 with a remainder of 35, which is the result of mod.

maximum value for type float in c#

When I do this:
float x = float.MaxValue;
I get the result: 3.40282347E+38
What is E+38? How can I represent the maximum number without this symbol?
MSDN says RANGE: ±1.5 × 10^−45 to ±3.4 × 10^38, but that did not help me.

The "E+38" format is the default. If you'd like to see the whole number, specify a different format like this:
float.MaxValue.ToString("#")
This will result in:
340282300000000000000000000000000000000
Here are some additional formats: http://msdn.microsoft.com/en-us/library/system.globalization.numberformatinfo.aspx

This is called E-Notation (exponential notation), and is used with scientific notation.
From the E Notation section of the Wikipedia article on Scientific Notation:
Because super-scripted exponents like 10^7 cannot always be
conveniently displayed, the letter E or e is often used to represent
times ten raised to the power of (which would be written as "x 10^b")
and is followed by the value of the exponent.
So, 3.40282347E+38 equals 3.40282347 * 10^38 and would be read "3.40282347 times 10 to the power of 38".

Try the following code:
float f = float.MaxValue;
Console.WriteLine("Original Value: " + f);
Console.WriteLine("With Zeros:" + f.ToString("0"));
Value
Original Value: 3.402823E+38
With Zeros:340282300000000000000000000000000000000

That is Scientific Notation.
5E+2 =
5 x 10 ^ 2 =
5 x 10 * 10 =
5 * 100 =
500
In other words, that's how many decimal places you move the decimal point to calculate the result. Take 5, move it over 2 places, end up with 500. In your example, you need to take your number, 3.40282347 and move the decimal place over 38 times!

It's approx. 340 000 000 000 000 000 000 000 000 000 000 000 000
If you use Dan's code, you'll get this as a result:
340282300000000000000000000000000000000

3.4e38 is 3.4 * 10^38 or 340000000000 ... (37 zeros)
additional information:
http://msdn.microsoft.com/en-us/library/b1e65aza(v=vs.71).aspx

The largest literal you can write without an exponent that still compiles as a float is: 340282356779733661637539395458142568447.9f

Old question, but here's the min and max in string format.
Using float.Parse
-340282356779733642999999999999999999999 to 340282356779733642999999999999999999999
Using float.MinValue.ToString("#") and float.MaxValue.ToString("#")
-340282300000000000000000000000000000000 to 340282300000000000000000000000000000000
Using float.MinValue.ToString() and float.MaxValue.ToString()
-3.402823E+38 to 3.402823E+38

Sorry to necro an old thread, but Google led me here and I didn't find a satisfactory answer. I'm sure Google will lead someone else here.
In C, the float.h header defines the maximum values for float and other types. FLT_MAX is equal to 340282346638528859811704183484516925440, which is the maximum value that a float can store.
I'm not an expert in C, but I would imagine this value is universal and wouldn't depend on a 32-bit or 64-bit operating system.
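The same digits can be obtained in C# without going through float formatting at all. The sketch below uses System.Numerics.BigInteger, which has a constructor that takes a float. The FloatMaxDemo class is a hypothetical name, and the formula 2^128 - 2^104 is the standard value of the largest finite IEEE 754 single-precision number.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only.
public static class FloatMaxDemo
{
    // float.MaxValue is a whole number, so this conversion is exact.
    public static BigInteger ExactMax()
    {
        return new BigInteger(float.MaxValue);
    }

    // Largest finite single-precision value: (2 - 2^-23) * 2^127 = 2^128 - 2^104.
    public static BigInteger FromFormula()
    {
        return BigInteger.Pow(2, 128) - BigInteger.Pow(2, 104);
    }
}
```

Both methods return 340282346638528859811704183484516925440, so Console.WriteLine(FloatMaxDemo.ExactMax()) prints every digit with no exponent.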

The answers that show float.MaxValue using ToString are not correct.
Those answers state that float.MaxValue is:
var f1 = 340282300000000000000000000000000000000f;
That is only what C# produces when converting float.MaxValue to a string.
However, float.MaxValue is defined in System.Single as 3.40282347E+38F.
So the actual answer is:
var f2 = 340282347000000000000000000000000000000f;
Also note that a compiler defect allows constant values higher than this to be used as a float. However, upon compilation, C# will truncate those values to float.MaxValue. For example:
var f3 = 340282356699999999999999999999999999999f;
var f4 = 340282356760000000000000000000000000000f;
var f5 = 340282356779733661637539395458142568447.9f;
Here f5 (340282356779733661637539395458142568447.9f) is the actual maximum constant you can define. Again, this is truncated to float.MaxValue.
This can all be verified:
var equals_1 = f1 == float.MaxValue; // false
var equals_2 = f2 == float.MaxValue; // true
var equals_3 = f3 == float.MaxValue; // true
var equals_4 = f4 == float.MaxValue; // true
var equals_5 = f5 == float.MaxValue; // true

triggering an event with a certain probability with C#

I'm trying to simulate a realistic key press event. For that I'm using the SendInput() method, but for a more realistic result I need to specify the delay between the key DOWN and key UP events. The numbers below show the elapsed time in milliseconds between DOWN and UP events (these are real, valid measurements):
96
95
112
111
119
104
143
96
95
104
120
112
111
88
104
119
111
103
95
104
95
127
112
143
144
142
143
128
144
112
111
112
120
128
111
135
118
147
96
135
103
64
64
87
79
112
88
111
111
112
111
104
87
95
We can simplify the output:
delay 64 - 88 ms -> 20% of the time
delay 89 - 135 ms -> 60% of the time
delay 136 - 150 ms -> 20% of the time
How do I trigger an event according to the probabilities above? Here is the code I'm using right now:
// One shared instance: several Random objects created in quick succession can produce the same values.
private readonly Random r = new Random();

private void button2_Click(object sender, EventArgs e)
{
    textBox2.Focus();
    int rez = r.Next(0, 5); // 0,1,2,3,4 - five numbers total
    int delay;
    if (rez == 0) // 20% (1/5)
        delay = r.Next(64, 89);   // 64-88 ms; the upper bound of Next is exclusive
    else if (rez == 4) // 20% (1/5)
        delay = r.Next(136, 151); // 136-150 ms
    else // 1, 2 or 3 (3/5) -> 60%
        delay = r.Next(89, 136);  // 89-135 ms
    textBox2.AppendText(" " + rez + " " + delay + Environment.NewLine);
    // do stuff
}
There is a huge problem with this code. In theory, after millions of iterations - the resulting graph will look similar to this:
How to deal with this problem?
EDIT: The solution was to use a distribution, as people suggested.
Here is a Java implementation of such code:
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Random.html#nextGaussian%28%29
and here is a C# implementation:
How to generate normally distributed random from an integer range?
although I'd suggest decreasing the value of "deviations" a little.
Here is an interesting MSDN article:
http://blogs.msdn.com/b/ericlippert/archive/2012/02/21/generating-random-non-uniform-data-in-c.aspx
Thanks, everyone, for the help!

Sounds like you need to generate a normal distribution. The built-in .NET class generates a Uniform Distribution.
Gaussian or Normal distribution random numbers are possible using the built-in Random class by using the Box-Muller transform.
You should end up with a nice probability curve like this
(taken from http://en.wikipedia.org/wiki/Normal_distribution)
To transform a Normally Distributed random number into an integer range, the Box-Muller transform can help with this again. See this previous question and answer which describes the process and links to the mathematical proof.
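A minimal sketch of the Box-Muller approach follows. The class and method names are hypothetical, and the mean, standard deviation and clamping range are parameters you would have to estimate from your own measured delays. Clamping to a range slightly distorts the tails, which is an assumption made here for simplicity.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class GaussianDelay
{
    // Box-Muller transform: turns two uniform samples into one normal sample.
    public static double NextGaussian(Random rng, double mean, double stdDev)
    {
        double u1 = 1.0 - rng.NextDouble(); // in (0, 1], avoids Math.Log(0)
        double u2 = rng.NextDouble();
        double z = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
        return mean + stdDev * z;
    }

    // Rounds to whole milliseconds and clamps to [min, max].
    public static int NextDelayMs(Random rng, double mean, double stdDev, int min, int max)
    {
        int ms = (int)Math.Round(NextGaussian(rng, mean, stdDev));
        return Math.Max(min, Math.Min(max, ms));
    }
}
```

A call such as GaussianDelay.NextDelayMs(rng, 108, 18, 64, 150) would then give delays clustered around 108 ms; the values 108 and 18 are only placeholders, not fitted to the data in the question.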

This is the right idea. I just think you need to use doubles instead of ints, so that you can partition the probability space between 0 and 1. This gives you a finer grain, as follows:
1. Normalise the real values by dividing all the values by the largest value.
2. Divide the values into buckets. The more buckets, the closer the graph will be to the continuous case.
3. The larger the bucket, the greater the chance of the event being raised. So partition the interval [0,1] according to how many elements are in each bucket. If you have 20 real values and a bucket has 5 values in it, it takes up a quarter of the interval.
4. On each test, generate a random number between 0 and 1 using Random.NextDouble(), and raise an event with the parameter of whichever bucket the random number falls into. So for the numbers you provided, here are the values for 5 buckets:
This is a bit much to put in a code example, but hopefully it gives the right idea.
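A rough sketch of the bucket idea could still look like this. It is only an illustration: the BucketSampler class is a hypothetical name, and it uses the three buckets and weights from the question (20% / 60% / 20%) instead of five buckets computed from the raw data.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class BucketSampler
{
    // Inclusive delay range in ms and the share of [0, 1) each bucket occupies.
    private static readonly (int Min, int Max, double Weight)[] Buckets =
    {
        (64, 88, 0.2),
        (89, 135, 0.6),
        (136, 150, 0.2),
    };

    public static int NextDelay(Random rng)
    {
        double roll = rng.NextDouble(); // uniform in [0, 1)
        double cumulative = 0.0;
        foreach (var bucket in Buckets)
        {
            cumulative += bucket.Weight;
            if (roll < cumulative)
                return rng.Next(bucket.Min, bucket.Max + 1); // upper bound is exclusive
        }
        // Only reached if the weights sum to slightly less than 1 due to rounding.
        var last = Buckets[Buckets.Length - 1];
        return rng.Next(last.Min, last.Max + 1);
    }
}
```

Adding more, narrower buckets with weights counted from the measured delays moves the result closer to the continuous case described above.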

One possible approach would be to model the delays as an Exponential Distribution. The exponential distribution models the time between events that occur continuously and independently at a constant average rate - which sounds like a fair assumption given your problem.
You can estimate the parameter lambda by taking the inverse of the average of your real observed delays, and simulate the distribution using this approach, i.e.
delay = -Math.Log(1.0 - random.NextDouble()) / lambda // 1.0 - NextDouble() is in (0, 1], which avoids Log(0)
However, looking at your sample, the data looks too "concentrated" around the mean to be a pure Exponential, so simulating that way would result in delays with the proper mean, but too spread out to match your sample.
One way to address that is to model the process as a shifted Exponential; essentially, the process is shifted by a value which represents the minimum the value can take, instead of 0 for an exponential. In code, taking the shift as the minimum observed value from your sample, this could look like this:
// Requires: using System; using System.Collections.Generic; using System.Linq;
var sample = new List<double>()
{
    96, 95, 112, 111, 119, 104,
    143, 96, 95, 104, 120, 112
};
var min = sample.Min();
sample = sample.Select(it => it - min).ToList(); // shift so the smallest delay becomes 0
var lambda = 1d / sample.Average();
var random = new Random();
var result = new List<double>();
for (var i = 0; i < 100; i++)
{
    // 1.0 - NextDouble() is in (0, 1], which avoids Log(0)
    var simulated = min - Math.Log(1.0 - random.NextDouble()) / lambda;
    result.Add(simulated);
    Console.WriteLine(simulated);
}
A trivial alternative, which is in essence similar to Aidan's approach, is to re-sample: pick random elements from your original sample, and the result will have exactly the desired distribution:
var sample = new List<double>()
{
    96, 95, 112, 111, 119, 104,
    143, 96, 95, 104, 120, 112
};
var random = new Random();
var size = sample.Count();
for (var i = 0; i < 100; i++)
{
    Console.WriteLine(sample[random.Next(0, size)]);
}
