The program uses vertices in R^2 as nodes, with edges between nodes, and ultimately more is built from there. There are many circuitous ways that a point (x, y) in R^2 might be reached, often relying on layer after layer of trig functions. So it makes sense to define one vertex as the canonical vertex for all points in a square with side length 2*epsilon centered at that point. Various calculations happen, out comes a point (x, y) where I wish to put a vertex, and a vertex factory checks whether there is already a vertex deemed canonical that should be used for this point; if so, it returns that vertex, and if not, a new vertex is created with the coordinates of the point and that vertex is now deemed canonical. I know this can lead to ambiguities given the possibility of overlapping squares, but that is immaterial for the moment; epsilon can be set to make the probability of such a case vanishingly small.
Clearly a list of canonical vertices must be kept.
I have working implementations using List<Vertex> and HashSet<Vertex>; however, the vertex creation process seems to scale poorly when the number of vertices grows past 100,000 and incredibly poorly if the number gets anywhere near 1,000,000.
I have no idea how to efficiently implement the VertexFactory. Right now Vertex has method IsEquivalentTo(Vertex v) and returns true if v is contained in the square around the instance vertex calling it, false otherwise.
So the vertex creation process looks like this:
Some point (x, y) gets calculated and requests a new vertex from the vertex manager.
The VertexManager creates a temporary vertex temp, then uses a foreach to iterate over every vertex in the container, using IsEquivalentTo(temp) to find a match and return it; if no match is found, temp is added to the container and returned. I should state that if a match is found, it obviously breaks out of the foreach loop.
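In code, the current process is roughly this (a simplified sketch; the method and class names here are made up, and a Vertex(x, y) constructor is assumed):

using System.Collections.Generic;

// Simplified sketch of the current approach: a linear scan over every canonical vertex.
class VertexManager
{
    private readonly List<Vertex> vertices = new List<Vertex>();

    public Vertex GetOrCreate(double x, double y)
    {
        var temp = new Vertex(x, y);       // assumes a Vertex(x, y) constructor
        foreach (var v in vertices)
        {
            if (v.IsEquivalentTo(temp))
                return v;                  // existing canonical vertex found
        }
        vertices.Add(temp);                // no match: temp becomes canonical
        return temp;
    }
}

Every call scans the entire list, so building n vertices is O(n^2) overall, which matches the slowdown I see past 100,000 vertices.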
I may be way off, but my first guess would be to put an order on the vertices such as
v1 < v2 iff ( (v1.X < v2.X) || ( (v1.X == v2.X) && (v1.Y < v2.Y) ) )
and then store the vertices in a sorted container. But to be honest, I do not know enough about the various containers to know which is most appropriate for the purpose, standard C# best practices, etc.
Edit:
I cannot mark the comment as an answer so thanks to GSerg whose comment guided me to kd-trees. This is exactly what I was looking for.
Thanks
I am getting quite a few spikes when offsetting polygons with the Clipper library. This is unfortunately not acceptable in my use case and I have no idea how to get rid of them. I have tried all of the join type settings but could not achieve anything. Any help would be greatly appreciated.
My application layers a model and calculates the outline polygons. It then also has to offset the outlines. Layers with a lot of curves in them tend to get one or more spikes each, such as this:
Now this does not seem too bad, but once it happens to a lot of layers the model becomes like this:
It is important to note that without offsetting the outlines I get none of these spikes.
Here is a file containing the input polygons:
http://sdrv.ms/H7ysUC
Here is a file containing the output polygons:
http://sdrv.ms/1fLoZjT
The parameters for the operation were an offset operation with the jtRound JoinType and the default limit. The delta was -25000. I have also tried all the other JoinTypes with limits ranging from 0 to 1000, but they all created the exact same spike. The other JoinTypes did, however, have some other strange added effects.
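For reference, the offset call looks roughly like this (a sketch against the ClipperOffset class from Clipper 6.x; the exact API may differ depending on the Clipper version in use):

using System.Collections.Generic;
using ClipperLib;
using Paths = System.Collections.Generic.List<System.Collections.Generic.List<ClipperLib.IntPoint>>;

static class OutlineOffsetter
{
    // Offsets closed outline polygons by delta (negative = inset), e.g. delta = -25000.
    public static Paths Offset(Paths outlines, double delta)
    {
        var co = new ClipperOffset();    // default MiterLimit / ArcTolerance
        co.AddPaths(outlines, JoinType.jtRound, EndType.etClosedPolygon);
        var solution = new Paths();
        co.Execute(ref solution, delta);
        return solution;
    }
}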
OK, I can confirm there's a bug. It happens when adjacent polygon edges are almost collinear.
Here's the fix (that hasn't been heavily tested yet) at about line 4220 in clipper.cs
void OffsetPoint(JoinType jointype)
{
    m_sinA = (normals[m_k].X * normals[m_j].Y - normals[m_j].X * normals[m_k].Y);
    if (Math.Abs(m_sinA) < 0.00005) return; //ADD THIS LINE (todo - check this!)
    else if (m_sinA > 1.0) m_sinA = 1.0;
    else if (m_sinA < -1.0) m_sinA = -1.0;
Note: 0.00005 is just a value that's close enough to zero to remove the spike in your supplied sample, but it may need to be readjusted with further testing.
I'm writing an application that has a need to know the speed you're traveling. My application talks to several pieces of equipment, all with different built-in GPS receivers. Where the hardware I'm working with reports speed, I use that parameter. But in some cases, I have hardware which does NOT report speed, simply latitude and longitude.
What I have been doing in that case is marking the time that I receive the first coordinate, then waiting for another coordinate to come in. I then calculate the distance traveled and divide by the elapsed time.
The problem I'm running into is that some of the hardware reports position quickly (5-10 times per second) while some reports position slowly (0.5 times per second). When I'm receiving the GPS position quickly, my algorithm fails to accurately calculate the speed due to the inherent inaccuracies of GPS receivers. In other words, the position will naturally move due to GPS inaccuracy, and since the elapsed time span from the last received position is so small, my algorithm thinks we've moved far over a short time - meaning we are going fast (when in reality we may be standing still).
How can I go about averaging the speed to avoid this problem? It seems like the process will have to be adaptive based on how fast the points come in. For example if I simply average the last 5 points collected to do my speed calculation, it will probably work great for "fast" reporting units but it will hurt my accuracy for "slow" reporting units.
Any ideas?
Use a simple filter:
Take a position only if it is more than 10 meters away from the last taken position.
Then calculate the distance between lastGood and thisGood, and divide by the time difference.
You further want to ignore all speeds under 5 km/h, where GPS is most noisy.
You can further optimize by calculating the direction between the last position and this one; if it stays stable, you take it. This helps filtering.
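A rough C# sketch of this filter (GpsFix and the distance helper are illustrative assumptions, not an existing API):

using System;

class GpsFix
{
    public double Lat;        // degrees
    public double Lon;        // degrees
    public DateTime Timestamp;
}

static class SpeedFilter
{
    static GpsFix lastGood;

    // Equirectangular approximation; fine for the short distances involved here.
    static double DistanceMeters(GpsFix a, GpsFix b)
    {
        const double R = 6371000.0;    // Earth radius in meters
        double dLat = (b.Lat - a.Lat) * Math.PI / 180.0;
        double dLon = (b.Lon - a.Lon) * Math.PI / 180.0;
        double x = dLon * Math.Cos((a.Lat + b.Lat) / 2.0 * Math.PI / 180.0);
        return Math.Sqrt(x * x + dLat * dLat) * R;
    }

    // Returns a speed in km/h, or null while the new fix is being ignored.
    public static double? OnFix(GpsFix fix)
    {
        if (lastGood == null) { lastGood = fix; return null; }

        double meters = DistanceMeters(lastGood, fix);
        if (meters < 10.0) return null;    // skip fixes within 10 m of the last good one

        double seconds = (fix.Timestamp - lastGood.Timestamp).TotalSeconds;
        double kmh = meters / seconds * 3.6;
        lastGood = fix;

        return kmh < 5.0 ? 0.0 : kmh;      // below ~5 km/h GPS is mostly noise
    }
}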
I would average the speed over the last X seconds. Let's pick X=3. For your fast reporters that means averaging your speed with about 20 data points. For your slow reporters, that may only get you 6 data points. This should keep the accuracy fairly even across the board.
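A minimal sketch of that time-windowed average (SpeedSample is a hypothetical type holding a timestamp and a speed reading):

using System;
using System.Collections.Generic;
using System.Linq;

class SpeedSample
{
    public DateTime Timestamp;
    public double SpeedKmh;
}

static class SpeedWindow
{
    // Average every speed reading seen in the last 3 seconds; fast reporters
    // contribute ~20 samples to the window, slow reporters only a few.
    public static double AverageSpeed(IEnumerable<SpeedSample> samples, DateTime now)
    {
        var recent = samples.Where(s => (now - s.Timestamp).TotalSeconds <= 3.0)
                            .Select(s => s.SpeedKmh)
                            .ToList();
        return recent.Count > 0 ? recent.Average() : 0.0;
    }
}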
I'd try using the average POSITION over the last X seconds.
This should "average out" the random noise associated with the high frequency location input....which should yield a better speed computation.
(Obviously you'd use "averaged" positions to compute your speed)
You probably have an existing data point structure to pull a linq query from?
In light of the note that we need to account for negative vectors, and the suggestion to account for known margins of error, here is a more complex example:
class GPS
{
    List<GPSData> recentData;
    TimeSpan speedCalcZone = new TimeSpan(100000);
    decimal acceptableError = .5m;

    double CalcAverageSpeed(GPSData newestPoint)
    {
        var vectors = (from point in recentData
                       where point.timestamp > DateTime.Now - speedCalcZone
                       where newestPoint.VectorErrorMargin(point) < acceptableError
                       select new
                       {
                           xVector = newestPoint.XVector(point),
                           yVector = newestPoint.YVector(point)
                       });

        var averageXVector = (from vector in vectors
                              select vector.xVector).Average();

        var averageYVector = (from vector in vectors
                              select vector.yVector).Average();

        var averagedSpeed = Math.Sqrt(Math.Pow(averageXVector, 2) + Math.Pow(averageYVector, 2));

        return averagedSpeed;
    }
}
But as pointed out in comments, there is no one magic algorithm, you have to tweak it for your circumstances and needs.
You're looking for one ideal algorithm that may not exist, for one very simple reason: you can't invent data where there isn't any, and sometimes you can't even tell where the data ends and the error begins.
That being said, there are ways to reduce the "noise", as you've discovered with averaging 5 consecutive measurements. I'd add that you can throw away the "outliers" and choose the 3 of the 5 that are closest to each other.
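A small sketch of that idea, picking the three of the last five readings that lie closest together and averaging them:

using System.Linq;

static class Outliers
{
    // From the last five measurements, keep the three that lie closest together
    // (smallest spread) and average them; the remaining two are treated as outliers.
    public static double RobustAverage(double[] lastFive)
    {
        var sorted = lastFive.OrderBy(v => v).ToArray();
        int best = 0;
        for (int i = 1; i + 2 < sorted.Length; i++)
            if (sorted[i + 2] - sorted[i] < sorted[best + 2] - sorted[best])
                best = i;
        return (sorted[best] + sorted[best + 1] + sorted[best + 2]) / 3.0;
    }
}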
The question here is what would work best (or acceptably well) for your situation. If you're tracking trucks moving around the continent a few mph won't matter as the errors will cancel themselves out, but if you're tracking a flying drone that moves between buildings the difference can be quite significant.
Here are some more ideas; you can pick and choose how far you want to go. I'm assuming the truck scenario, and the idea is to get the most probable speed when you don't have an accurate reading:
- discard "improbable" speeds - tall buildings can reflect GPS signal causing speeds of over 100mph when you're just walking, having a "highway map" (see below) can help managing the cut-off value
- transmit, store and calculate with error ranges rather than point values (some GPS reports error range).
- keep average error per location
- keep average error per reporting device
- keep average speed per location, you'll end up having a map of highways vs other roads
- you can correlate location speed and direction
I would like to know the following:
How can I effectively generate an initial population of chromosomes with high diversity using value encoding?
One way is grid initialization, but it is too slow.
Till now I have been using the Random class from .NET to choose random values in value encoding, but although the values are uniformly distributed, the fitness values calculated from such chromosomes are not. Here is the code for chromosome initialization:
public Chromosome(Random rand)
{
    Alele = new List<double>();
    for (int i = 0; i < ChromosomeLength; i++)
    {
        Alele.Add(rand.NextDouble() * 2000 - 1000);
    }
}
So, I developed a function that calculates the fitness of a new, randomly made chromosome (the code above); if that fitness is similar to the fitness of any chromosome already in the list, another chromosome is made randomly, its fitness is calculated, and the process is repeated until its fitness is different enough from those already in the list.
Here is the code for this part:
private bool CheckSimilarFitnes(List<Chromosome> chromosome, Chromosome newCandidate)
{
    // the candidate's fitness only needs to be calculated once
    double fitFromCandidate = newCandidate.CalculateChromosomeFitness(newCandidate.Alele);

    foreach (var listElement in chromosome)
    {
        double fitFromList = listElement.CalculateChromosomeFitness(listElement.Alele);
        double fitBigger = fitFromList >= fitFromCandidate ? fitFromList : fitFromCandidate;
        double fitSmaller = fitFromList < fitFromCandidate ? fitFromList : fitFromCandidate;

        // too similar to a chromosome already in the list
        if ((fitBigger / fitSmaller) < 1.5)
            return false;
    }

    // different enough from every chromosome already in the list
    return true;
}
But the more chromosomes I have in the list, the longer it takes to add a new one with a fitness that is different enough from the others already in there.
So, is there a way to make this kind of initialization faster? It takes days to make 80 chromosomes like this.
Here's some code that might help (which I just wrote): a GA for ordering 10 values spaced by 1.0. It starts with a population of 100 completely random alleles, which is exactly how your code starts.
The goal I gave the GA to solve was to order the values in increasing order with a separation of 1.0. It does this in the fitness function Eval_OrderedDistance by computing the standard deviation of each pair of samples from 1.0. As the fitness tends toward 0.0, the alleles should start to appear in sequential order.
Generation 0's fittest Chromosome was completely random, as were the rest of the Chromosomes. You can see the fitness value is very high (i.e., bad):
GEN: fitness (allele, ...)
0: 375.47460 (583.640, -4.215, -78.418, 164.228, -243.982, -250.237, 354.559, 374.306, 709.859, 115.323)
As the generations continue, the fitness (standard deviation from 1.0) decreases until it's nearly perfect in generation 100,000:
100: 68.11683 (-154.818, -173.378, -170.846, -193.750, -198.722, -396.502, -464.710, -450.014, -422.194, -407.162)
...
10000: 6.01724 (-269.681, -267.947, -273.282, -281.582, -287.407, -293.622, -302.050, -307.582, -308.198, -308.648)
...
99999: 0.67262 (-294.746, -293.906, -293.114, -292.632, -292.596, -292.911, -292.808, -292.039, -291.112, -290.928)
The interesting parts of the code are the fitness function:
// try to pack the aleles together spaced apart by 1.0
// returns the standard deviation of the samples from 1.0
static float Eval_OrderedDistance(Chromosome c) {
    float sum = 0;
    int n = c.alele.Length;
    for(int i=1; i<n; i++) {
        float diff = (c.alele[i] - c.alele[i-1]) - 1.0f;
        sum += diff*diff; // variance from 1.0
    }
    return (float)Math.Sqrt(sum/n);
}
And the mutations. I used a simple crossover and a "completely mutate one allele":
Chromosome ChangeOne(Chromosome c) {
    Chromosome d = c.Clone();
    int i = rand.Next() % d.alele.Length;
    d.alele[i] = (float)(rand.NextDouble()*2000-1000);
    return d;
}
I used elitism to always keep one exact copy of the best Chromosome. Then generated 100 new Chromosomes using mutation and crossover.
It really sounds like you're calculating the variance of the fitness, which does of course tell you that the fitnesses in your population are all about the same. I've found that it's very important how you define your fitness function. The more granular the fitness function, the more you can discriminate between your Chromosomes. Obviously, your fitness function is returning similar values for completely different chromosomes, since your gen 0 returns a fitness variance of 68e-19.
Can you share your fitness calculation? Or what problem you're asking the GA to solve? I think that might help us help you.
[Edit: Adding Explicit Fitness Sharing / Niching]
I rethought this a bit and updated my code. If you're trying to maintain unique chromosomes, you have to compare their content (as others have mentioned). One way to do this would be to compute the standard deviation between them. If it's less than some threshold, you can consider them the same. From class Chromosome:
// compute the population standard deviation
public float StdDev(Chromosome other) {
    float sum = 0.0f;
    for(int i=0; i<alele.Length; i++) {
        float diff = other.alele[i] - alele[i];
        sum += diff*diff;
    }
    return (float)Math.Sqrt(sum);
}
I think Niching will give you what you'd like. It compares all the Chromosomes in the population to determine their similarity and assigns a "niche" value to each. The chromosomes are then "penalized" for belonging to a niche using a technique called Explicit Fitness Sharing. The fitness values are divided by the number of Chromosomes in each niche. So if you have three in niche group A (A,A,A) instead of that niche being 3 times as likely to be chosen, it's treated as a single entity.
I compared my sample with Explicit Fitness Sharing on and off. With a max STDDEV of 500 and Niching turned OFF, there were about 18-20 niches (so basically 5 duplicates of each item in a 100 population). With Niching turned ON, there were about 85 niches. That's 85% unique Chromosomes in the population. In the output of my test, you can see the diversity after 17000 generations.
Here's the niching code:
// returns: total number of niches in this population
// max_stddev -- any two chromosomes with population stddev less than this max
//               will be grouped together
int ComputeNiches(float max_stddev) {
    List<int> niches = new List<int>();

    // clear niches
    foreach(var c in population) {
        c.niche = -1;
    }

    // calculate niches
    for(int i=0; i<population.Count; i++) {
        var c = population[i];
        if( c.niche != -1) continue; // niche already set

        // compute the niche by finding the stddev between the two chromosomes
        c.niche = niches.Count;
        int count_in_niche = 1; // includes the current Chromosome

        for(int j=i+1; j<population.Count; j++) {
            var d = population[j];
            float stddev = c.StdDev(d);
            if(stddev < max_stddev) {
                d.niche = c.niche; // same niche
                ++count_in_niche;
            }
        }
        niches.Add(count_in_niche);
    }

    // penalize Chromosomes by their niche size
    foreach(var c in population) {
        c.niche_scaled_fitness = c.scaled_fitness / niches[c.niche];
    }

    return niches.Count;
}
[Edit: post-analysis and update of Anton's code]
I know this probably isn't the right forum to address homework problems, but since I put in the effort before knowing this, and I had a lot of fun doing it, I figure it can only be helpful to Anton.
Genotip.cs, Kromosom.cs, KromoMain.cs
This code maintains good diversity, and in one run I was able to get the "raw fitness" down to 47, which in your case is the average squared error. That was pretty close!
As noted in my comment, I'd like to try to help you with your programming, not just with your homework. Please read this analysis of your work.
As we expected, there was no need to make a "more diverse" population from the start. Just generate some completely random Kromosomes.
Your mutations and crossovers were highly destructive, and you only had a few of them. I added several new operators that seem to work better for this problem.
You were throwing away the best solution. When I got your code running with only Tournament Selection, there would be one Kromo that was 99% better than all the rest. With tournament selection, that best value was very likely to be forgotten. I added a bit of "elitism" which keeps a copy of that value for the next generation.
Consider object oriented techniques. Compare the re-write I sent you with my original code.
Don't duplicate code. You had the sampling parameters in two different classes.
Keep your code clean. There were several unused parts of code. Especially when submitting questions to SO, try to narrow it down, remove unused code, and do some cleaning up.
Comment your code! I've commented the re-work significantly. I know it's Serbian, but even a few comments will help someone else understand what you are doing and what you intended to do.
Overall, nice job implementing some of the more sophisticated things like Tournament Selection.
Prefer double[] arrays instead of List<double>. There's less overhead. Also, several of your List<double> temp variables weren't even needed. Your structure
List<double> temp = new List<double>();
for(...) {
    temp.Add(value);
}
for(each value in temp) {
    sum += value;
}
average = sum / temp.Count;
can easily be written as:
sum = 0;
for(...) {
    sum += value;
}
average = sum / count;
In several places you forgot to initialize a loop variable, which could easily have added to your problem. Something like this will cause serious problems, and it was in your fitness code along with one or two other places:
double fit = 0;
for(each chromosome) {
    // YOU SHOULD INITIALIZE fit HERE inside the LOOP
    for(each allele) {
        fit += ...;
    }
    fit /= count;
}
Good luck programming!
The basic problem here is that most randomly generated chromosomes have similar fitness, right? That's fine; the idea isn't for your initial chromosomes to have wildly different fitnesses; it's for the chromosomes themselves to be different, and presumably they are. In fact, you should expect the initial fitness of most of your first generation to be close to zero, since you haven't run the algorithm yet.
Here's why your code is so slow. Let's say the first candidate is terrible, basically zero fitness. If the second one has to be 1.5x different, that really just means it has to be 1.5x better, since it can't really get worse. Then the next one has to be 1.5x better than that, and so on up to 80. So what you're really doing is searching for increasingly better chromosomes by generating completely random ones and comparing them to what you have. I bet if you logged the progress, you'd find it takes more and more time to find the subsequent candidates, because really good chromosomes are hard to find. But finding better chromosomes is what the GA is for! Basically what you've done is optimize some of the chromosomes by hand before, um, actually optimizing them.
If you want to ensure that your chromosomes are diverse, compare their content, don't compare their fitness. Comparing the fitness is the algo's job.
I'm going to take a quick swing at this, but Isaac's pretty much right. You need to let the GA do its job. You have a generation of individuals (chromosomes, whatever), and they're all over the scale on fitness (or maybe they're all identical).
You pick some good ones to mutate (by themselves) and crossover (with each other). You maybe use the top 10% to generate another full population and throw out the bottom 90%. Maybe you always keep the top guy around (Elitism).
You iterate at this for a while until your GA stops improving because the individuals are all very much alike. You've ended up with very little diversity in your population.
What might help you is to 1) make your mutations more effective, 2) find a better way to select individuals to mutate. In my comment I recommended AI Techniques for Game Programmers. It's a great book. Very easy to read.
To list a few headings from the book, the things you're looking for are:
Selection techniques like Roulette Selection (on Stack Overflow) (on Wikipedia) and Stochastic Universal Sampling, which control how you select your individuals. I've always liked Roulette Selection. You set the probabilities that an individual will be selected. It's not just simple white-noise random sampling.
I used this outside of GA for selecting 4 letters from the Roman alphabet randomly. I assigned a value from 0.0 to 1.0 to each letter. Every time the user (a child) picked the letter correctly, I would lower that value by, say, 0.1. This would increase the likelihood that the other letters would be selected. If the user picked the correct letter 10 times, the value would be 0.0, and there would be (almost) no chance that letter would be presented again.
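A minimal sketch of the mechanics (assuming all fitness values are non-negative and higher means more likely to be picked):

using System;
using System.Linq;

static class Roulette
{
    // Returns the index of the selected individual; the probability of picking
    // index i is fitness[i] / sum(fitness).
    public static int Select(double[] fitness, Random rand)
    {
        double spin = rand.NextDouble() * fitness.Sum();
        double running = 0.0;
        for (int i = 0; i < fitness.Length; i++)
        {
            running += fitness[i];
            if (spin <= running) return i;
        }
        return fitness.Length - 1;    // guard against floating-point round-off
    }
}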
Fitness Scaling techniques like Rank Scaling, Sigma Scaling, and Boltzmann Scaling (pdf on ftp!!!) that let you modify your raw fitness values to come up with adjusted fitness values. Some of these are dynamic, like Boltzmann Scaling, which allows you to set a "pressure" or "temperature" that changes over time. Increased "pressure" means that fitter individuals are selected. Decreased pressure means that any individual in the population can be selected.
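As a taste of the simplest of these, here is a sketch of Rank Scaling (assuming higher raw fitness is better; invert the ordering if, as in my sample above, lower is better):

using System.Linq;

static class Scaling
{
    // Rank Scaling: throw away the raw fitness magnitudes and use each
    // Chromosome's rank instead (1 = worst, N = best).
    public static double[] RankScale(double[] rawFitness)
    {
        int n = rawFitness.Length;
        int[] worstToBest = Enumerable.Range(0, n).OrderBy(i => rawFitness[i]).ToArray();
        var scaled = new double[n];
        for (int rank = 0; rank < n; rank++)
            scaled[worstToBest[rank]] = rank + 1;
        return scaled;
    }
}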
I think of it this way: you're searching through multi-dimensional space for a solution. You hit a "peak" and work your way up into it. The pressure to be fit is very high. You snug right into that local maximum. Now your fitness can't change. Your mutations aren't getting you out of the peak. So you start to reduce the pressure and just, oh, select items randomly. Your fitness levels start to drop, which is okay for a while. Then you start to increase the pressure again, and surprise! You've skipped out of the local maximum and found a lovely new local maximum to climb into. Increase the pressure again!
Niching (which I've never used, but appears to be a way to group similar individuals together). Say you have two pretty good individuals, but they're wildly different. They keep getting selected. They keep mutating slightly, and not getting much better. Now you have half your population as minor variants of A, and half as minor variants of B. This seems like a way to say, hey, what's the average fitness of that entire group A? And what for B? And what for every other niche you have? Then do your selection based on the average fitness for each niche. Pick your niche, then select a random individual from that niche. Maybe I'll start using this after all. I like it!
Hope you find some of that helpful!
If you need true random numbers for your application, I recommend you check out Random.org. They have a free HTTP API, and clients for just about every language.
The randomness comes from atmospheric noise, which for many purposes is better than the pseudo-random number algorithms typically used in computer programs.
(I am unaffiliated with Random.org, although I did contribute the PHP client).
I think your problem is in your fitness function and how you select candidates, not in how random the values are. Your filtering feels too strict and may not even allow enough elements to be accepted.
Sample
values: random float 0-10000.
fitness function: square root(n)
desired distribution of fitness - linear with distance at least 1.
With this fitness function you will quickly get most of the 1-wide "spots" taken (as you have at most 100 places), so every next one will take longer. At some point there will be only several tiny ranges left and most of the results will simply be rejected; even worse, after you get about 50 numbers placed there is a good chance that the next one simply will not be able to fit.
I am using .NET 2.0 so I don't have access to the nifty Linq; however, I have worked up some code that sorts a list of points clockwise/counterclockwise.
My problem is that the sort works perfectly fine if the list is not already sorted, but if for some reason the list is already sorted the sort function fails miserably.
I was wondering if someone could help point me in the right direction as to why this may be the case.
Here is my call to the sort:
positions.Sort(new Comparison<Position>(MapUtils.SortCornersClockwise));
And here is the SortCornersClockwise function:
public static int SortCornersClockwise(Position A, Position B)
{
    // Variables to store the atans
    double aTanA, aTanB;

    // Reference point
    Pixels reference = Program.main.reference;

    // Fetch the atans
    aTanA = Math.Atan2(A.Pixel.Y - reference.Y, A.Pixel.X - reference.X);
    aTanB = Math.Atan2(B.Pixel.Y - reference.Y, B.Pixel.X - reference.X);

    // Determine next point in clockwise rotation
    if (aTanA < aTanB) return -1;
    else if (aTanB > aTanA) return 1;
    return 0;
}
Where my reference is the point from which I determine the respective angle to each point in my list of points.
Now say I have a list of points:
15778066, 27738237
15778169, 27738296
15778185, 27738269
15778082, 27738210
These are already sorted in the correct order but calling the sort function yields:
15778082, 27738210
15778066, 27738237
15778185, 27738269
15778169, 27738296
Now take another set of sample points:
15778180, 27738255
15778081, 27738192
15778064, 27738219
15778163, 27738282
This list is not already in the correct order and calling the sort function yields:
15778064, 27738219
15778081, 27738192
15778180, 27738255
15778163, 27738282
Which is sorted correctly. This pattern repeats itself for each set of coordinates that is already sorted and for those that aren't. Any ideas?
if (aTanA < aTanB) return -1;
else if (aTanB > aTanA) return 1;
These two conditions are the same! Either swap the variables or turn around the inequality sign, but not both.
In addition to the excellent observation that Henning made, you may also be surprised by the output once you fix that problem.
The circular symmetry makes this a somewhat unusual sorting problem compared with normal sorts on totally ordered domains. Since there is circular symmetry, there are n valid clockwise orderings given n points, and it may matter to you which of those orderings you prefer.
As it stands, use of atan2 unmodified will result in a cut-line with coordinates (x,0) where x<0.
Consider coordinates A = (-10, 1) and B = (-10, -1), measured relative to the reference. You probably imagine that A should appear before B in the result, but in fact it will be the other way around since atan2 returns just less than π for A, but just more than -π for B.
If you want a cut-line with coordinates (x,0) where x>0 simply add 2π to any negative values returned from atan2.
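Putting both fixes together, a corrected comparer might look like this (same Position/Pixels types as in the question; whether the result reads as clockwise or counterclockwise still depends on which way your Y axis points):

public static int SortCornersClockwise(Position A, Position B)
{
    Pixels reference = Program.main.reference;

    double aTanA = Math.Atan2(A.Pixel.Y - reference.Y, A.Pixel.X - reference.X);
    double aTanB = Math.Atan2(B.Pixel.Y - reference.Y, B.Pixel.X - reference.X);

    // Move the cut-line to (x, 0), x > 0 by mapping angles into [0, 2*pi)
    if (aTanA < 0) aTanA += 2 * Math.PI;
    if (aTanB < 0) aTanB += 2 * Math.PI;

    if (aTanA < aTanB) return -1;
    if (aTanA > aTanB) return 1;
    return 0;
}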
Shouldn't that be:
else if(aTanA > aTanB) return 1;
Not sure if that would cause your issue though.