AdaBoost repeatedly chooses the same weak learners - C#

I have implemented a version of the AdaBoost boosting algorithm, using decision stumps as weak learners. However, I often find that after training, the sequence of chosen weak learners contains a recurring pattern. For example, the trained set of weak learners looks like A, B, C, D, E, D, E, D, E, D, E, F, E, D, E, D, E and so on.
I believe I am updating the data weights correctly after each new weak learner is chosen. Here I classify each data point and then update its weight.
// After we have chosen the weak learner which reduces the weighted sum error by the most, we need to update the weights of each data point.
double sumWeights = 0.0; // This is our normalisation value so we can normalise the weights after we have finished updating them
foreach (DataPoint dataP in trainData) {
int y = dataP.getY(); // Where Y is the desired output
Object[] x = dataP.getX();
// Classify the data input using the weak learner. Then check to see if this classification is correct/incorrect and adjust the weights accordingly.
int classified = newLearner.classify(x);
dataP.updateWeight(y, finalLearners[algorithmIt].getAlpha(), classified);
sumWeights += dataP.getWeight();
}
Here is my classify method in the WeakLearner class
// Method in the WeakLearner class
public int classify(Object[] xs) {
if (xs[splitFeature].Equals(splitValue))
return 1;
else return -1;
}
Then I have a method which updates the weight of a DataPoint
public void updateWeight(int y, double alpha, int classified) {
weight = weight * Math.Exp(-y * alpha * classified);
}
I'm not sure why this is happening. Are there any common reasons why the same weak learners would keep being chosen?

You could increase the value of alpha and check. Maybe not enough weight is being given to the misclassified samples, which is why the same learners keep showing up.
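For reference, a rough sketch of one full boosting round using the question's DataPoint and weak-learner names where possible (the setWeight setter and the weightedError variable are assumptions, not shown in the question). If the weights are never renormalised, or alpha is computed from an unnormalised weighted error, the misclassified points never gain enough relative weight and the same stump can keep winning:

// Sketch only: setWeight(double) is assumed to exist; the question shows
// only updateWeight(y, alpha, classified), which multiplies by exp(-y * alpha * h).
double weightedError = 0.0;
foreach (DataPoint dataP in trainData)
{
    if (newLearner.classify(dataP.getX()) != dataP.getY())
        weightedError += dataP.getWeight(); // weights are assumed to sum to 1 here
}
double alpha = 0.5 * Math.Log((1.0 - weightedError) / weightedError);

double sumWeights = 0.0;
foreach (DataPoint dataP in trainData)
{
    // Misclassified points get scaled by exp(+alpha), correct ones by exp(-alpha).
    dataP.updateWeight(dataP.getY(), alpha, newLearner.classify(dataP.getX()));
    sumWeights += dataP.getWeight();
}
// Renormalise so the weights form a distribution for the next round.
foreach (DataPoint dataP in trainData)
    dataP.setWeight(dataP.getWeight() / sumWeights);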


2D array with four infinite directions

I want to create a 2D map of tiles. Example:
Cell[,] cells = new Cell[columns, rows];
for(int x = 0; x < columns; x++)
{
for(int y = 0; y < rows; y++)
{
cells[x, y] = new Cell();
}
}
The first cell would be at (0|0). What if I want to have this cell as my center and create new cells on the left and top side? These cells would have negative indices.
One way to fix this would be a value that determines the maximum length of one direction. Having a map of 100 tiles per side would place the center of the map at (50|50).
Let's say there were no hardware limitations and no maximum length per side: what is the best way to create a 2D map with a (0|0) center? I can't imagine a better way than accessing a cell by its x and y coordinates in a 2D array.
Well, arrays are logical constructs, not physical ones.
This means that viewing 0,0 as the top-left corner, while it might help you visualize the contents of a 2-D array (and in fact, a 2-D array is itself somewhat of a visualization aid), is not accurate at all - the 0,0 "cell" is not a corner, and indexes are not coordinates, even though it helps to think of them as if they were.
That being said, there is nothing stopping you from creating your own class that implements an indexer taking both positive and negative values - in fact, according to Indexers (C# Programming Guide):
Indexers do not have to be indexed by an integer value; it is up to you how to define the specific look-up mechanism.
Since you are not even obligated to use integers, you most certainly can use both positive and negative values as your indexer.
I was testing an idea of using a list of lists for storage and dynamically calculating the storage index from the class indexer, but it's getting too late here and I guess I'm too tired to do it right. It's kind of like the solution in the other answer, but I was attempting to do it without making you set the final size in the constructor.
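For what it's worth, a minimal sketch of that idea using a dictionary keyed by coordinates instead of a list of lists; any positive or negative integer coordinates work and no final size has to be set in the constructor (SparseGrid and its lazy-create behaviour are my own naming and assumptions, not from the question):

using System.Collections.Generic;

public class SparseGrid<T> where T : new()
{
    // Cells are stored sparsely, keyed by their (x, y) coordinates.
    private readonly Dictionary<(int x, int y), T> cells = new Dictionary<(int x, int y), T>();

    public T this[int x, int y]
    {
        get
        {
            // Create the cell lazily the first time it is read.
            if (!cells.TryGetValue((x, y), out T cell))
            {
                cell = new T();
                cells[(x, y)] = cell;
            }
            return cell;
        }
        set { cells[(x, y)] = value; }
    }
}

// Usage with the Cell type from the question:
// var map = new SparseGrid<Cell>();
// map[-3, 7] = new Cell();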
Well, you can't use negative indices in an array or list; they're just not the right structure for a problem like this... You could, however, write your own class that handles something like this.
Simply pass the size of the grid into the constructor, and then use the index operator to return a value based off of an adjusted index... Something like this... I wrote it up really fast, so it probably isn't ideal in terms of optimization.
public class Grid<T> {
T[,] grid { get; }
int adjustment { get; }
int FindIndex(int provided) {
return provided + adjustment;
}
public Grid(int dimension) {
if (dimension <= 0)
throw new ArgumentException("Grid dimension cannot be <= 0");
if (dimension % 2 != 0)
throw new ArgumentException("Grid dimension must be even");
adjustment = dimension / 2;
grid = new T[dimension, dimension];
}
public T this[int key, int key2] {
get {
return grid[FindIndex(key), FindIndex(key2)];
}
set {
grid[FindIndex(key), FindIndex(key2)] = value;
}
}
}
I used these to test it:
var grid = new Grid<int>(100);
grid[-50, -50] = 5;
grid[0, 1] = 10;
You can just switch it to:
var grid = new Grid<Cell>(100);
This only works for a grid with equal dimensions... If you need separate dimensions, you'll need to adjust the constructor and the FindIndex method.
I think that an infinitely sized grid would be dangerous. If you increase the size to the right, you'd have to reposition the center, which means that what you think is at 0,0 will be shifted, because the grid is no longer properly centered.
Additionally, the performance of such a structure would be a nightmare, as you cannot rely on an array being infinite (it inherently isn't). You'd either have to continuously copy the array (the way a list grows) or use a linked list, and with a linked list you would have to do enormous amounts of iteration to get to whatever value you want.

Analyzing the object position using C# generic list

I am computing the distance covered by an object.
The X and Y position values are first stored in two different lists, X and W.
Then I use another list to store the distance covered by the object. I also clear the lists once their count reaches 10, in order to limit memory usage.
Based on the distance value I have to detect whether the object is static: if it is, the distance should not increase, and the computed distance shown in the text box should stay constant.
I am actually using sensors to compute the distance, and because of sensor error the distance value varies even when the object is static. The sensor error threshold is about 15 cm.
I have developed the logic; however, I receive the error:
System.ArgumentOutOfRangeException: 'Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index'
My code is as follows:
void distance()
{
List<double> d = new List<double>();
double sum = 0, sum1 = 0;
for (int i = 1; i < X.Count; i++)
{
//distance computation
if ((d[i] - d[i -1]) > 0.15)
{
sum1 = d.Sum();
sum = sum1 + dis1;
Dis = Math.Round(sum, 3);
}
}
// refresh the Lists when X, W and d List reach the count of 10
}
You're going about it the wrong way. Come up with a method that computes the distance between two given points. That's going to be a function of signature double -> double -> double or, if you prefer C#, double ComputeDistance(double startPoint, double endPoint).
Then the only thing left to do is apply that function to each pair of points you have. The easiest and most compact way to accomplish that is by means of LINQ, though it could be done in a regular foreach as well.
Note that it would be much clearer if you eventually merged your separate lists into a single list. Tuple<double, double> seems to be the best choice, including for performance.
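A sketch of what that might look like with the (x, y) pairs merged into one list of tuples (the names are illustrative; the 0.15 threshold is the 15 cm sensor error from the question):

using System;
using System.Collections.Generic;
using System.Linq;

static double ComputeDistance((double X, double Y) a, (double X, double Y) b)
{
    double dx = b.X - a.X, dy = b.Y - a.Y;
    return Math.Sqrt(dx * dx + dy * dy);
}

// Sum the distance between each consecutive pair of points, ignoring
// steps below the sensor-noise threshold so a static object adds nothing.
static double TotalDistance(IReadOnlyList<(double X, double Y)> points)
{
    return points.Zip(points.Skip(1), ComputeDistance)
                 .Where(step => step > 0.15)
                 .Sum();
}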

How to get the fundamental frequency using Harmonic Product Spectrum?

I'm trying to get the pitch from the microphone input. First I decompose the signal from the time domain to the frequency domain with an FFT, applying a Hamming window to the signal beforehand. That gives me the complex FFT results. I then pass the results to a Harmonic Product Spectrum step, where the spectrum is downsampled several times and the downsampled spectra are multiplied together, which gives a value as a complex number. What should I do next to get the fundamental frequency?
public float[] HarmonicProductSpectrum(Complex[] data)
{
Complex[] hps2 = Downsample(data, 2);
Complex[] hps3 = Downsample(data, 3);
Complex[] hps4 = Downsample(data, 4);
Complex[] hps5 = Downsample(data, 5);
float[] array = new float[hps5.Length];
for (int i = 0; i < array.Length; i++)
{
checked
{
array[i] = data[i].X * hps2[i].X * hps3[i].X * hps4[i].X * hps5[i].X;
}
}
return array;
}
public Complex[] Downsample(Complex[] data, int n)
{
Complex[] array = new Complex[Convert.ToInt32(Math.Ceiling(data.Length * 1.0 / n))];
for (int i = 0; i < array.Length; i++)
{
array[i].X = data[i * n].X;
}
return array;
}
I have tried to get the magnitude using,
magnitude[i] = (float)Math.Sqrt(array[i] * array[i] + (data[i].Y * data[i].Y));
inside the for loop in HarmonicProductSpectrum method. Then tried to get the maximum bin using,
float max_mag = float.MinValue;
float max_index = -1;
for (int i = 0; i < array.Length / 2; i++)
if (magnitude[i] > max_mag)
{
max_mag = magnitude[i];
max_index = i;
}
and then I tried to get the frequency using,
var frequency = max_index * 44100 / 1024;
But I was getting garbage values like 1248.926, 1205.859, 2454.785 for the A4 note (440 Hz), and those values don't look like harmonics of A4 either.
Any help would be greatly appreciated.
I implemented harmonic product spectrum in Python to make sure your data and algorithm were working nicely.
Here’s what I see when applying harmonic product spectrum to the full dataset, Hamming-windowed, with 5 downsample–multiply stages:
This is just the bottom kilohertz, but the spectrum is pretty much dead above 1 kHz.
If I chunk up the long audio clip into 8192-sample chunks (with 4096-sample 50% overlap) and Hamming-window each chunk and run HPS on it, this is the matrix of HPS. This is kind of a movie of the HPS spectrum over the entire dataset. The fundamental frequency seems to be quite stable.
The full source code is here—there’s a lot of code that helps chunk the data and visualize the output of HPS running on the chunks, but the core HPS function, starting at def hps(…, is short. But it has a couple of tricks in it.
Given the strange frequencies that you’re finding the peak at, it could be that you’re operating on the full spectrum, from 0 to 44.1 kHz? You want to only keep the “positive” frequencies, i.e., from 0 to 22.05 kHz, and apply the HPS algorithm (downsample–multiply) on that.
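In C# that could be as simple as slicing the FFT output before calling the HPS method; a sketch, assuming fftResult holds the full 1024-bin FFT and System.Linq is in scope:

// Keep only bins 0 .. N/2, i.e. 0 to 22.05 kHz at a 44.1 kHz sample rate.
Complex[] positive = fftResult.Take(fftResult.Length / 2 + 1).ToArray();
float[] hps = HarmonicProductSpectrum(positive);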
But assuming you start out with a positive-frequency-only spectrum, take its magnitude properly, it looks like you should get reasonable results. Try to save out the output of your HarmonicProductSpectrum to see if it’s anything like the above.
Again, the full source code is at https://gist.github.com/fasiha/957035272009eb1c9eb370936a6af2eb. (There I try out a couple of other spectral estimators, Welch’s method from SciPy and my port of the Blackman-Tukey spectral estimator. I’m not sure if you are set on implementing HPS or if you would consider other pitch estimators, so I’m leaving the Welch/Blackman-Tukey results there.)
Original: I wrote this as a comment but had to keep revising it because it was confusing, so here it is as a mini-answer.
Based on my brief reading of this intro to HPS, I don’t think you’re taking the magnitudes correctly after you find the four decimated responses.
You want:
array[i] = sqrt(data[i] * Complex.conjugate(data[i]) *
hps2[i] * Complex.conjugate(hps2[i]) *
hps3[i] * Complex.conjugate(hps3[i]) *
hps4[i] * Complex.conjugate(hps4[i]) *
hps5[i] * Complex.conjugate(hps5[i])).X;
This uses the sqrt(x * Complex.conjugate(x)) trick to find x’s magnitude, and then multiplies all 5 magnitudes.
(Actually, it moves the sqrt outside the product, so you only do one sqrt, saves some time, but gives the same result. So maybe that’s another trick.)
Final trick: it takes that result’s real part because sometimes due to float accuracy issues, a tiny imaginary component, like 1e-15, survives.
After you do this, array should contain just real floats, and you can apply the max-bin-finding.
If there’s no Conjugate method, then the old-fashioned way should work:
public float mag2(Complex c) { return c.X * c.X + c.Y * c.Y; }
// in HarmonicProductSpectrum
array[i] = sqrt(mag2(data[i]) * mag2(hps2[i]) * mag2(hps3[i]) * mag2(hps4[i]) * mag2(hps5[i]));
There’s algebraic flaws with the two approaches you suggested in the comments below, but the above should be correct. I’m not sure what C# does when you assign a Complex to a float—maybe it uses the real component? I’d have thought that’d be a compiler error, but with the above code, you’re doing the right thing with the complex data, and only assigning a float to array[i].
To get a pitch estimate, you have to divide your summed bin frequency estimate by the downsampling ratio used for that sum.
Added: You should also sum the magnitudes (abs()), not take the magnitude of the complex sum.
But the harmonic product spectrum algorithm (HPS), especially when using only integer downsampling ratios, doesn't usually provide better pitch estimation resolution. Instead, it provides a more robust rough pitch estimate (less likely to be fooled by a harmonic) than using a single bare FFT magnitude peak for sequential overtone-rich timbres that have weak or missing fundamental spectral content.
If you know how to downsample a spectrum by fractional ratios (using interpolation, etc.), you can try finer grained downsampling to get a better pitch estimate out of HPS. Or you can use an HPS result to inform you of a narrower frequency range in which to search using another pitch or frequency estimation method.
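As an example of the second suggestion, one common way to refine the estimate without fractional downsampling is parabolic interpolation around the peak bin of whatever spectrum you searched (often applied to log magnitudes); a sketch, where binHz would be sampleRate / fftLength:

// Fit a parabola through the peak bin and its two neighbours and return
// the interpolated peak position converted to Hz.
static double RefinePeakHz(float[] spectrum, int peakIndex, double binHz)
{
    if (peakIndex <= 0 || peakIndex >= spectrum.Length - 1)
        return peakIndex * binHz; // no neighbours to interpolate with

    double a = spectrum[peakIndex - 1];
    double b = spectrum[peakIndex];
    double c = spectrum[peakIndex + 1];

    // Offset of the vertex from the peak bin, in the range [-0.5, 0.5].
    double delta = 0.5 * (a - c) / (a - 2.0 * b + c);
    return (peakIndex + delta) * binHz;
}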

"Matrix decomposition" of a matrix with holonic sub structure

Before I start, I must say that for those with a background of linear algebra, this is NOT matrix decomposition as you know it. Please read the following paragraphs to get a clearer understanding of the problem I am trying to solve.
Here are the salient properties/definitions of the matrix and its submatrices:
I have an SxP matrix which forms a grid-like structure of S*P "boxes". This is the main matrix.
This is what the (empty) main matrix looks like. Each square in the matrix is simply referred to as a box. The matrix can be viewed as a kind of "gameboard", e.g. a chess board. The vertical axis is measured using an interval scale (i.e. real numbers), and the horizontal axis is measured using monotonically increasing non-negative integers.
There is an additional concept of submatrices (as explained earlier). A submatrix is simply a collection of boxes in a particular configuration, and with specific numbers and piece types (see black and white pieces below), assigned to the boxes. I have a finite set of these sub matrices - which I refer to as my lexicon or vocabulary for carrying out valid matrix composition/decompositions.
The "formal" definition of a sub matrix is that it is a configuration of M boxes contained within the main matrix, that satisfy the criteria:
1 <= M <= 4
the "gap" G (i.e. distance) between any two adjacent boxes satisfies: 1 <= G <= 2 * (vertical units).
A vertical unit is the gap between the vertical axis lines in the main matrix. In the image below, the vertical unit is 100.
The image immediately above illustrates a simple sub matrix addition. The units with orange borders/boxes are sub matrices - the recognized units that form part of my lexicon. You will notice that I have introduced further annotation in my sub matrices. This is because (using the chess analogy) I have two types of pieces I can use on the board: B means a black piece, and W (not shown in the image) represents a white piece. There is a simple equivalence relation that allows conversion between a white piece and a black piece. This relationship can be used to further decompose a recognized unit (or lexeme/sub matrix) so that it uses exclusively black pieces, exclusively white pieces, or a combination of both.
For the sake of simplicity, I have omitted specifying the equivalence relationship. However, if someone feels that the problem as posed is not "too difficult" without additional details, I shall gladly broaden the scope. For now, I am trying to keep things as simple as possible, to avoid confusing people with "information overload".
Each box in a sub matrix contains a signed integer, indicating a number of units of an item. Each "configuration" of boxes (along with its signed integers and piece type i.e. black or white pieces) is said to be a "recognized unit".
Submatrices can be placed in the main matrix in a way such that they overlap. Wherever the "boxes" overlap, the number of units in the resulting submatrix box is the sum of the number of units in the constituent boxes (as illustrated in the second image above).
The problem becomes slightly difficult because the "recognized units" defined above are themselves sometimes combined with other "recognized units" to form another "recognized unit" - i.e. the sub matrices (i.e. recognized units) are "holons". For example, in the second image above, the recognized unit being added to the matrix can itself be further decomposed into "smaller" submatrices.
This sort of holarchy is similar to how (in Physical chemistry), elements form compounds, which then go on to form ever more complicated compounds (amino acids, proteins etc).
Back to our problem, given a main matrix M, I want to be able to do the following:
i. identify the submatrices (or recognized units) that are contained within the main matrix. This is the first "matrix decomposition". (Note: a submatrix has to satisfy the criteria given above)
ii. For each identified submatrix, I want to be able to recognize whether it can be decomposed further into 2 or more recognized submatrices. The idea is to iteratively decompose submatrices found in step i above, until either a specified hierarchy level is reached, or until we have a finite set of submatrices that can not be decomposed further.
I am trying to come up with an algorithm to help me do (i) and (ii) above. I will implement the logic in either C++, Python or C# (in increasing level of preference), depending on which ever is the easiest to do and/or in which I happen to get snippets to get me started in implementing the algorithm.
I am not sure if I have understood the problem correctly.
So first you want to find all submatrices that conform to your two criteria.
That is like a graph decomposition problem or a set cover problem, I think, where you can have a recursive function and iterate over the matrix to find all available submatrices.
enum PieceTypes
{
White,
Black
}
class Box
{
public PieceTypes PieceType { get; set; }
public uint Units { get; set; }
public int s, p;
public Box(PieceTypes piecetype, uint units)
{
PieceType = piecetype;
Units = units;
}
}
class Matrix
{
public Box[,] Boxes;
public int Scale, S, P, MaxNum, MaxDist;
public List<List<Box>> Configurations;
public Matrix(int s, int p, int scale, int maxnum, int maxdist)
{
S = s;
P = p;
Scale = scale;
Boxes = new Box[S, P];
MaxNum = maxnum;
MaxDist = maxdist;
Configurations = new List<List<Box>>();
}
public void Find(List<Box> Config, int s, int p)
{
// Check the max number thats valid for your configuration
// Check that the current p and s are inside matrix
if (Config.Count() < MaxNum && s >= 0 && s < S && p >= 0 && p < P)
{
foreach (Box b in Config)
{
if (Valid(b, Boxes[s, p]))
{
Boxes[s, p].s = s;
Boxes[s, p].p = p;
Config.Add(Boxes[s, p]);
break;
}
}
Find(Config, s + 1, p);
Find(Config, s - 1, p);
Find(Config, s, p + 1);
Find(Config, s, p - 1);
}
if (Config.Count() > 0) Configurations.Add(new List<Box>(Config)); // store a copy, because Config is cleared next
Config.Clear();
}
public bool Valid(Box b1, Box b2)
{
// Create your dist function here
// or add your extra validation rules like the PieceType
// (note: ^ is XOR in C#, so the squares have to be written out or done with Math.Pow)
if (Math.Sqrt((b1.s - b2.s) * (b1.s - b2.s) + (b1.p - b2.p) * (b1.p - b2.p)) <= MaxDist && b1.PieceType == b2.PieceType) return true;
else return false;
}
}
I haven't used the best data structures and I have simplified the solution. I hope it's helpful in some way.

How to simulate a harmonic oscillator driven by a given signal (not driven by sine wave)

I've got a table of values telling me how the signal level changes over time and I want to simulate a harmonic oscillator driven by this signal. It does not matter if the simulation is not 100% accurate.
I know the frequency of the oscillator.
I found lots of formulas but they all use a sine wave as driver.
I guess you want to perform a time-discrete simulation. The well-known formulae require analytic input (see Green's function). If all you have is a table of forces at points in time, the typical analytical formulae won't help you much.
The idea is this: at each point in time t0, the oscillator has some given acceleration, velocity, etc. Now a force acts on it - according to the table you were given - which will change its acceleration (F = m * a). For the next time step t1, we assume the acceleration stays constant, so we can apply the simple Newtonian equations (v = a * dt) with dt = (t1 - t0) for this time frame. Iterate until the desired range in time is simulated.
The most important parameter of this simulation is dt, that is, how fine-grained the calculation is. For example, you might want 10 steps per second, but that completely depends on your input. What we're doing here, in essence, is an Euler integration of the equations of motion.
This, of course, isn't all there is - such simulations can become quite complicated, especially in not-so-well-behaved cases with extreme accelerations and the like. In those cases you need to perform numerical sanity checks within a frame, because something 'extreme' happens inside a single frame. More sophisticated numerical integration, e.g. the Runge-Kutta algorithm, might also become necessary, but that leads too far at this point.
EDIT: Just after I posted this, somebody posted a comment to the original question pointing to the "Verlet Algorithm", which is basically an implementation of what I described above.
http://en.wikipedia.org/wiki/Simple_harmonic_motion
http://en.wikipedia.org/wiki/Hooke's_Law
http://en.wikipedia.org/wiki/Euler_method
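A minimal sketch of that time-stepping in C# (the mass, spring constant k and damping c are illustrative values, and the driving signal is assumed to already be resampled to one force sample per step):

using System;

class DrivenOscillatorDemo
{
    // Euler integration of m * x'' = -k * x - c * x' + F(t).
    static double[] Simulate(double[] force, double dt, double mass, double k, double c)
    {
        double x = 0.0; // position
        double v = 0.0; // velocity
        var positions = new double[force.Length];

        for (int i = 0; i < force.Length; i++)
        {
            double a = (force[i] - k * x - c * v) / mass; // F = m * a
            v += a * dt;
            x += v * dt;
            positions[i] = x;
        }
        return positions;
    }

    static void Main()
    {
        // Toy driving signal: a single impulse, then nothing.
        double[] force = new double[1000];
        force[0] = 1.0;
        double[] xs = Simulate(force, dt: 0.001, mass: 1.0, k: 40.0, c: 0.1);
        Console.WriteLine(xs[999]);
    }
}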
OK, I finally figured it out and wrote a GUI app to test it until it worked. But my PC is not very happy doing this 1000 * 44100 times per second, even without the GUI ^^
Anyway, here is my test code (which worked quite well):
double lastTime;
const double deltaT = 1 / 44100.0;//length of a frame in seconds
double rFreq;
private void InitPendulum()
{
double freq = 2; // frequency in hertz
rFreq = FToRSpeed(freq);
damp = Math.Pow(0.8, freq * deltaT);
}
private static double FToRSpeed(double p)
{
// converts a frequency in Hz to (2 * PI * p)^2, the squared angular frequency
p *= 2;
p = Math.PI * p;
return p * p;
}
double damp;
double bHeight;
double bSpeed;
double lastchange;
private void timer1_Tick(object sender, EventArgs e)
{
// sw is a Stopwatch started elsewhere; mouseY (below) is the driving value - the mouse position in this test app
double now = sw.ElapsedTicks / (double)Stopwatch.Frequency;
while (lastTime+deltaT <= now)
{
bHeight += bSpeed * deltaT;
double prevSpeed=bSpeed;
bSpeed += (mouseY - bHeight) * (rFreq*deltaT);
bSpeed *= damp;
if ((bSpeed > 0) != (prevSpeed > 0))
{
Console.WriteLine(lastTime - lastchange);
lastchange = lastTime;
}
lastTime += deltaT;
}
Invalidate(); // No, I am not using GDI ^^
}
