C# linear regression given 2 sets of data

I have 2 sets of data: one is an average position and the other is a score, so for every position I have the predicted score of an item:
double[] positions = {0.1,0.2,0.3,0.45,0.46,...};
double[] scores = {1,1.2,1.5,2.2,3.4,...};
I need to create a function that predicts the score for a given average position, for example for a new item with position 1.7.
I understand the function should be something like y=a*x + b, but how do I get to it?
Any help will be appreciated!

Yes, you have to build a linear function
y = a * x + b
To do this, compute the following sums (x are the predictor values, y are the corresponding results, and N is the number of data points):
sx - sum of x's
sxx - sum of x * x
sy - sum of y's
sxy - sum of x * y
So
a = (N * sxy - sx * sy) / (N * sxx - sx * sx);
b = (sy - a * sx) / N;
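As a rough illustration of the formulas above, here is a minimal Python sketch. The names fit_line and predict and the sample data are assumptions made for this example only; the arrays in the question are elided, so made-up values are used instead.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b using the sums sx, sy, sxx, sxy."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


def predict(a, b, x):
    """Evaluate the fitted line at x."""
    return a * x + b


if __name__ == "__main__":
    # Hypothetical data lying exactly on y = 2x + 1
    a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    print(a, b, predict(a, b, 1.7))
```

The denominator check guards against the case where every position is the same, in which no slope can be estimated.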

Related

How to enumerate x^2 + y^2 = z^2 - 1 (with additional constraints)

Let N be a number (10<=N<=10^5).
I have to break it into 3 numbers (x,y,z) that satisfy the following conditions.
1. x<=y<=z
2. x^2+y^2=z^2-1;
3. x+y+z<=N
I have to count how many such combinations exist for a given N.
I have tried the following, but it takes too long for larger numbers and results in a timeout.
int N= Int32.Parse(Console.ReadLine());
List<String> res = new List<string>();
//x<=y<=z
int mxSqrt = N - 2;
int a = 0, b = 0;
for (int z = 1; z <= mxSqrt; z++)
{
a = z * z;
for (int y = 1; y <= z; y++)
{
b = y * y;
for (int x = 1; x <= y; x++)
{
int x1 = b + x * x;
int y1 = a - 1;
if (x1 == y1 && ((x + y + z) <= N))
{
res.Add(x + "," + y + "," + z);
}
}
}
}
Console.WriteLine(res.Count());
My question:
My solution takes too long for bigger numbers (I think it's the for loops). How can I improve it?
Is there a better approach?
Here's a method that enumerates the triples, rather than exhaustively testing for them, using number theory as described here: https://mathoverflow.net/questions/29644/enumerating-ways-to-decompose-an-integer-into-the-sum-of-two-squares
Since the math took me a while to comprehend and a while to implement (gathering some code that's credited above it), and since I don't feel much of an authority on the subject, I'll leave it for the reader to research. This is based on expressing numbers as products of Gaussian integer conjugates: (a + bi)*(a - bi) = a^2 + b^2. We first factor the number, z^2 - 1, into primes, decompose the primes into Gaussian conjugates and find the different products, which we expand and simplify to get a + bi; its components then give a^2 + b^2.
A perk of reading about the Sum of Squares Function is discovering that we can rule out any candidate z^2 - 1 that contains a prime of form 4k + 3 with an odd power. Using that check alone, I was able to reduce Prune's loop on 10^5 from 214 seconds to 19 seconds (on repl.it) using the Rosetta prime factoring code below.
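The filter described above can be sketched in a few lines of Python 3. This is only an illustration: the function names are hypothetical, and plain trial division is assumed for brevity instead of the Rosetta Code factoring routine used in the answer.

```python
def prime_factors(n):
    """Trial-division factorisation; returns {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1 if d == 2 else 2  # after 2, only test odd divisors
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_sum_of_two_squares(n):
    """True iff no prime of the form 4k + 3 divides n to an odd power."""
    return all(not (p % 4 == 3 and e % 2 == 1)
               for p, e in prime_factors(n).items())


def candidate_z(limit):
    """z values in [2, limit) for which z^2 - 1 passes the filter."""
    return [z for z in range(2, limit) if is_sum_of_two_squares(z * z - 1)]


if __name__ == "__main__":
    print(candidate_z(36))
```

Only the z values that survive this check need any further work, which is where the reported speed-up comes from.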
The implementation here is just a demonstration. It does not have handling or optimisation for limiting x and y. Rather, it just enumerates as it goes. Play with it here.
Python code:
# https://math.stackexchange.com/questions/5877/efficiently-finding-two-squares-which-sum-to-a-prime
def mods(a, n):
    if n <= 0:
        return "negative modulus"
    a = a % n
    if (2 * a > n):
        a -= n
    return a
def powmods(a, r, n):
    out = 1
    while r > 0:
        if (r % 2) == 1:
            r -= 1
            out = mods(out * a, n)
        r /= 2
        a = mods(a * a, n)
    return out
def quos(a, n):
    if n <= 0:
        return "negative modulus"
    return (a - mods(a, n))/n
def grem(w, z):
    # remainder in Gaussian integers when dividing w by z
    (w0, w1) = w
    (z0, z1) = z
    n = z0 * z0 + z1 * z1
    if n == 0:
        return "division by zero"
    u0 = quos(w0 * z0 + w1 * z1, n)
    u1 = quos(w1 * z0 - w0 * z1, n)
    return(w0 - z0 * u0 + z1 * u1,
           w1 - z0 * u1 - z1 * u0)
def ggcd(w, z):
    while z != (0,0):
        w, z = z, grem(w, z)
    return w
def root4(p):
    # 4th root of 1 modulo p
    if p <= 1:
        return "too small"
    if (p % 4) != 1:
        return "not congruent to 1"
    k = p/4
    j = 2
    while True:
        a = powmods(j, k, p)
        b = mods(a * a, p)
        if b == -1:
            return a
        if b != 1:
            return "not prime"
        j += 1
def sq2(p):
    if p % 4 != 1:
        return "not congruent to 1 modulo 4"
    a = root4(p)
    return ggcd((p,0),(a,1))
# https://rosettacode.org/wiki/Prime_decomposition#Python:_Using_floating_point
from math import floor, sqrt
def fac(n):
    step = lambda x: 1 + (x<<2) - ((x>>1)<<1)
    maxq = long(floor(sqrt(n)))
    d = 1
    q = n % 2 == 0 and 2 or 3
    while q <= maxq and n % q != 0:
        q = step(d)
        d += 1
    return q <= maxq and [q] + fac(n//q) or [n]
# My code...
# An answer for https://stackoverflow.com/questions/54110614/
from collections import Counter
from itertools import product
from sympy import I, expand, Add
def valid(ps):
    for (p, e) in ps.items():
        if (p % 4 == 3) and (e & 1):
            return False
    return True
def get_sq2(p, e):
    if p == 2:
        if e & 1:
            return [2**(e / 2), 2**(e / 2)]
        else:
            return [2**(e / 2), 0]
    elif p % 4 == 3:
        return [p, 0]
    else:
        a,b = sq2(p)
        return [abs(a), abs(b)]
def get_terms(cs, e):
    if e == 1:
        return [Add(cs[0], cs[1] * I)]
    res = [Add(cs[0], cs[1] * I)**e]
    for t in xrange(1, e / 2 + 1):
        res.append(
            Add(cs[0] + cs[1]*I)**(e-t) * Add(cs[0] - cs[1]*I)**t)
    return res
def get_lists(ps):
    items = ps.items()
    lists = []
    for (p, e) in items:
        if p == 2:
            a,b = get_sq2(2, e)
            lists.append([Add(a, b*I)])
        elif p % 4 == 3:
            a,b = get_sq2(p, e)
            lists.append([Add(a, b*I)**(e / 2)])
        else:
            lists.append(get_terms(get_sq2(p, e), e))
    return lists
def f(n):
    for z in xrange(2, n / 2):
        zz = (z + 1) * (z - 1)
        ps = Counter(fac(zz))
        is_valid = valid(ps)
        if is_valid:
            print "valid (does not contain a prime of form\n4k + 3 with an odd power)"
            print "z: %s, primes: %s" % (z, dict(ps))
            lists = get_lists(ps)
            cartesian = product(*lists)
            for element in cartesian:
                print "prime square decomposition: %s" % list(element)
                p = 1
                for item in element:
                    p *= item
                print "complex conjugates: %s" % p
                vals = p.expand(complex=True, evaluate=True).as_coefficients_dict().values()
                x, y = vals[0], vals[1] if len(vals) > 1 else 0
                print "x, y, z: %s, %s, %s" % (x, y, z)
                print "x^2 + y^2, z^2-1: %s, %s" % (x**2 + y**2, z**2 - 1)
                print ''
if __name__ == "__main__":
    print f(100)
Output:
valid (does not contain a prime of form
4k + 3 with an odd power)
z: 3, primes: {2: 3}
prime square decomposition: [2 + 2*I]
complex conjugates: 2 + 2*I
x, y, z: 2, 2, 3
x^2 + y^2, z^2-1: 8, 8
valid (does not contain a prime of form
4k + 3 with an odd power)
z: 9, primes: {2: 4, 5: 1}
prime square decomposition: [4, 2 + I]
complex conjugates: 8 + 4*I
x, y, z: 8, 4, 9
x^2 + y^2, z^2-1: 80, 80
valid (does not contain a prime of form
4k + 3 with an odd power)
z: 17, primes: {2: 5, 3: 2}
prime square decomposition: [4 + 4*I, 3]
complex conjugates: 12 + 12*I
x, y, z: 12, 12, 17
x^2 + y^2, z^2-1: 288, 288
valid (does not contain a prime of form
4k + 3 with an odd power)
z: 19, primes: {2: 3, 3: 2, 5: 1}
prime square decomposition: [2 + 2*I, 3, 2 + I]
complex conjugates: (2 + I)*(6 + 6*I)
x, y, z: 6, 18, 19
x^2 + y^2, z^2-1: 360, 360
valid (does not contain a prime of form
4k + 3 with an odd power)
z: 33, primes: {17: 1, 2: 6}
prime square decomposition: [4 + I, 8]
complex conjugates: 32 + 8*I
x, y, z: 32, 8, 33
x^2 + y^2, z^2-1: 1088, 1088
valid (does not contain a prime of form
4k + 3 with an odd power)
z: 35, primes: {17: 1, 2: 3, 3: 2}
prime square decomposition: [4 + I, 2 + 2*I, 3]
complex conjugates: 3*(2 + 2*I)*(4 + I)
x, y, z: 18, 30, 35
x^2 + y^2, z^2-1: 1224, 1224
Here is a simple improvement in Python (converting to the faster equivalent in C-based code is left as an exercise for the reader). To get accurate timing for the computation, I removed printing the solutions themselves (after validating them in a previous run).
Use an outer loop for one free variable (I chose z), constrained only by its relation to N.
Use an inner loop (I chose y) constrained by the outer loop index.
The third variable is directly computed per requirement 2.
Timing results:
-------------------- 10
1 solutions found in 2.3365020751953125e-05 sec.
-------------------- 100
6 solutions found in 0.00040078163146972656 sec.
-------------------- 1000
55 solutions found in 0.030081748962402344 sec.
-------------------- 10000
543 solutions found in 2.2078349590301514 sec.
-------------------- 100000
5512 solutions found in 214.93411707878113 sec.
That's 3:35 for the large case, plus your time to collate and/or print the results.
If you need faster code (this is still pretty brute-force), look into Diophantine equations and parameterizations to generate (y, x) pairs, given the target value of z^2 - 1.
import math
import time
def break3(N):
    """
    10 <= N <= 10^5
    return x, y, z triples such that:
    x <= y <= z
    x^2 + y^2 = z^2 - 1
    x + y + z <= N
    """
    """
    Observations:
    z <= x + y
    z < N/2
    """
    count = 0
    z_limit = N // 2
    for z in range(3, z_limit):
        # Since y >= x, there's a lower bound on y
        target = z*z - 1
        ymin = int(math.sqrt(target/2))
        for y in range(ymin, z):
            # Given y and z, compute x.
            # That's a solution iff x is integer.
            x_target = target - y*y
            x = int(math.sqrt(x_target))
            if x*x == x_target and x+y+z <= N:
                # print("solution", x, y, z)
                count += 1
    return count
test = [10, 100, 1000, 10**4, 10**5]
border = "-"*20
for case in test:
    print(border, case)
    start = time.time()
    print(break3(case), "solutions found in", time.time() - start, "sec.")
The bounds of x and y are an important part of the problem. I personally went with this Wolfram Alpha query and checked the exact forms of the variables.
Thanks to @Bleep-Bloop and comments, a very elegant bound optimization was found, which is x < n and x <= y < n - x. The results are the same and the times are nearly identical.
Also, since the only possible values for x and y are positive even integers, we can reduce the amount of loop iterations by half.
To optimize even further, since we compute the upper bound of x, we build a list of all possible values for x and make the computation parallel. That saves a massive amount of time on higher values of N but it's a bit slower for smaller values because of the overhead of the parallelization.
Here's the final code:
Non-parallel version, with int values:
List<string> res = new List<string>();
int n2 = n * n;
double maxX = 0.5 * (2.0 * n - Math.Sqrt(2) * Math.Sqrt(n2 + 1));
for (int x = 2; x < maxX; x += 2)
{
int maxY = (int)Math.Floor((n2 - 2.0 * n * x - 1.0) / (2.0 * n - 2.0 * x));
for (int y = x; y <= maxY; y += 2)
{
int z2 = x * x + y * y + 1;
int z = (int)Math.Sqrt(z2);
if (z * z == z2 && x + y + z <= n)
res.Add(x + "," + y + "," + z);
}
}
Parallel version, with long values:
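using System.Collections.Concurrent; // ConcurrentBag
using System.Linq;
using System.Threading.Tasks; // Parallel.ForEach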
...
// Use ConcurrentBag for thread safety
ConcurrentBag<string> res = new ConcurrentBag<string>();
long n2 = n * n;
double maxX = 0.5 * (2.0 * n - Math.Sqrt(2) * Math.Sqrt(n2 + 1L));
// Build list to parallelize
int nbX = Convert.ToInt32(maxX);
List<int> xList = new List<int>();
for (int x = 2; x < maxX; x += 2)
xList.Add(x);
Parallel.ForEach(xList, x =>
{
int maxY = (int)Math.Floor((n2 - 2.0 * n * x - 1.0) / (2.0 * n - 2.0 * x));
for (long y = x; y <= maxY; y += 2)
{
long z2 = x * x + y * y + 1L;
long z = (long)Math.Sqrt(z2);
if (z * z == z2 && x + y + z <= n)
res.Add(x + "," + y + "," + z);
}
});
When run individually on an i5-8400 CPU, I get these results:
N: 10; Solutions: 1;
Time elapsed: 0.03 ms (Not parallel, int)
N: 100; Solutions: 6;
Time elapsed: 0.05 ms (Not parallel, int)
N: 1000; Solutions: 55;
Time elapsed: 0.3 ms (Not parallel, int)
N: 10000; Solutions: 543;
Time elapsed: 13.1 ms (Not parallel, int)
N: 100000; Solutions: 5512;
Time elapsed: 849.4 ms (Parallel, long)
You must use long when N is greater than 46340, because N squared then exceeds the maximum value of an int (2,147,483,647). Finally, the parallel version starts to outperform the simple one when N is around 23000, with ints.
No time to properly test it, but it seemed to yield the same results as your code (at 100 -> 6 results and at 1000 -> 55 results).
With N=1000 it takes 2ms vs your 144ms, also without the List,
and with N=10000 it takes 28ms.
var N = 1000;
var c = 0;
for (int x = 2; x < N; x += 2)
{
    for (int y = x; y < (N - x); y += 2)
    {
        // Cast to long before multiplying so that large N cannot overflow int
        long z2 = (long)x * x + (long)y * y + 1;
        long z = (long)Math.Sqrt(z2);
        if (x + y + z > N)
            break;
        if (z * z == z2)
            c++;
    }
}
Console.WriteLine(c);
#include<iostream>
#include<math.h>
int main()
{
int N = 10000;
int c = 0;
for (int x = 2; x < N; x+=2)
{
for (int y = x; y < (N - x); y+=2)
{
auto z = sqrt(x * x + y * y + 1);
if(x+y+z>N){
break;
}
if (z - (int) z == 0)
{
c++;
}
}
}
std::cout<<c;
}
This is my solution. While testing the previous solutions for this problem I found that x and y are always even and z is always odd. I don't know the mathematical reason behind this; I am currently trying to figure that out.
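A possible explanation of the parity pattern, offered as a sketch rather than a full proof: a square is congruent to 0 or 1 modulo 4. In x^2 + y^2 + 1 = z^2 the left side is congruent to 1, 2 or 3 modulo 4 depending on how many of x and y are odd, and only 1 is a possible residue of a square. So x and y must both be even, and z^2 congruent to 1 makes z odd. The Python 3 check below (the function name triples is an assumption for this example) deliberately also tries odd x and y and confirms that none of them produce a solution.

```python
from math import isqrt


def triples(n):
    """All (x, y, z) with 1 <= x <= y <= z, x^2 + y^2 = z^2 - 1, x + y + z <= n."""
    found = []
    for x in range(1, n):            # step 1: odd values are tried as well
        for y in range(x, n - x):
            z2 = x * x + y * y + 1
            z = isqrt(z2)
            if x + y + z > n:
                break                # the sum only grows as y grows
            if z * z == z2:
                found.append((x, y, z))
    return found


if __name__ == "__main__":
    for x, y, z in triples(100):
        print(x, y, z, x % 2, y % 2, z % 2)
```

Because of this, stepping x and y by 2 as the loops above do skips no solutions.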
I want to get it done in C# and it should be covering all the test
cases based on condition provided in the question.
The basic code, converted to long to process the N <= 100000 upper limit, with every optimization I could throw in. I used alternate forms from @Mat's (+1) Wolfram Alpha query to precompute as much as possible. I also did a minimal perfect square test to avoid millions of sqrt() calls at the upper limit:
public static void Main()
{
int c = 0;
long N = long.Parse(Console.ReadLine());
long N_squared = N * N;
double half_N_squared = N_squared / 2.0 - 0.5;
double x_limit = N - Math.Sqrt(2) / 2.0 * Math.Sqrt(N_squared + 1);
for (long x = 2; x < x_limit; x += 2)
{
long x_squared = x * x + 1;
double y_limit = (half_N_squared - N * x) / (N - x);
for (long y = x; y < y_limit; y += 2)
{
long z_squared = x_squared + y * y;
int digit = (int)(z_squared % 10); // take the last digit before narrowing to int
if (digit == 3 || digit == 7)
{
continue; // minimalist non-perfect square elimination
}
long z = (long) Math.Sqrt(z_squared);
if (z * z == z_squared)
{
c++;
}
}
}
Console.WriteLine(c);
}
I followed the trend and left out "the degenerate solution" as implied by the OP's code but not explicitly stated.

For Loop in Foreach Loop Performance Improvement

I have a db table with 2M entries
My XPositions table structure is
Id - int
FID - int
CoordinateQue - int
Latitude - float
Longitude - float
Each row represents a marker position and I need to calculate distance between each coordinates and save to another table.
My xWeights table structure is;
Id - int
x_Id - int
Tox - int
Distance - decimal(18,8)
So far my working code is
var query = _xRepository.TableNoTracking;
var xNodes = query.ToList();
var n = new xWeights();
foreach (var x in xNodes)
{
for (var i = 0; i < xNodes.Count; i++)
{
if(x.Id == xNodes[i].Id)
{
//Do nothing - Same Node
}
else
{
var R = 6378137;
var φ1 = (Math.PI / 180) * x.Latitude;
var φ2 = (Math.PI / 180) * xNodes[i].Latitude;
var Δφ = (xNodes[i].Latitude - x.Latitude) * (Math.PI / 180);
var Δλ = (xNodes[i].Longitude - x.Longitude) * (Math.PI / 180);
var Δψ = Math.Log(Math.Tan(Math.PI / 4 + φ2 / 2) / Math.Tan(Math.PI / 4 + φ1 / 2));
var q = Math.Abs(Δψ) > 10e-12 ? Δφ / Δψ : Math.Cos(φ1); // E-W course creates problem with 0/0
// if Longitude over 180° take shorter rhumb line across the anti-meridian:
if (Math.Abs(Δλ) > Math.PI) Δλ = Δλ > 0 ? -(2 * Math.PI - Δλ) : (2 * Math.PI + Δλ);
var dist = (Math.Sqrt(Δφ * Δφ + q * q * Δλ * Δλ)) * R;
n.x_Id = x.Id;
n.Tox = xNodes[i].Id;
n.Distance = dist;
_xWeightsRepository.Insert(n);
}
}
}
My problem is that I am inserting approximately 35k records per minute, which is 2.1M records per hour. At this rate it will take forever to finish. Any ideas how to improve the performance?
The problem is not with this function, but with what you are trying to achieve.
You are trying to insert every from-to combination into _xWeightsRepository. If there are 2 million nodes, then that means 4 thousand billion weights.
At your current rate of roughly 35,000 inserts per minute, that works out to more than 200 years. Even if you made the inserts a hundred times faster, you would still be waiting about two years.
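A quick back-of-the-envelope check of the scale, using only the figures reported in the question (2M rows, about 35k inserts per minute). The function names are hypothetical and the snippet is plain arithmetic, not a benchmark.

```python
def ordered_pairs(nodes):
    """Number of from-to combinations, excluding a node paired with itself."""
    return nodes * (nodes - 1)


def years_needed(rows, rows_per_minute):
    """Wall-clock years to insert `rows` at a constant rate."""
    minutes = rows / rows_per_minute
    return minutes / (60 * 24 * 365)


if __name__ == "__main__":
    rows = ordered_pairs(2_000_000)
    print(rows)                          # roughly 4 * 10^12 rows
    print(years_needed(rows, 35_000))    # at the rate from the question
```

Whatever the exact insert rate, the row count grows with the square of the number of nodes, which is why restricting the computation to nearby pairs matters more than tuning the loop.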
Check out SQL spatial indexes. I'm going to take a guess that your answer lies in that direction:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-spatial-index-transact-sql

Linear regression gradient descent using C#

I'm taking the Coursera machine learning course right now and I can't get my gradient descent linear regression function to minimize. I use one dependent variable, an intercept, and four values of x and y, so the equations are fairly simple. The final value of the gradient descent equation varies wildly depending on the initial values of alpha and beta and I can't figure out why.
I've only been coding for about two weeks, so my knowledge is limited to say the least, please keep this in mind if you take the time to help.
using System;
namespace LinearRegression
{
class Program
{
static void Main(string[] args)
{
Random rnd = new Random();
const int N = 4;
//We randomize the inital values of alpha and beta
double theta1 = rnd.Next(0, 100);
double theta2 = rnd.Next(0, 100);
//Values of x, i.e the independent variable
double[] x = new double[N] { 1, 2, 3, 4 };
//VAlues of y, i.e the dependent variable
double[] y = new double[N] { 5, 7, 9, 12 };
double sumOfSquares1;
double sumOfSquares2;
double temp1;
double temp2;
double sum;
double learningRate = 0.001;
int count = 0;
do
{
//We reset the Generalized cost function, called sum of squares
//since I originally used SS to
//determine if the function was minimized
sumOfSquares1 = 0;
sumOfSquares2 = 0;
//Adding 1 to counter for each iteration to keep track of how
//many iterations are completed thus far
count += 1;
//First we calculate the Generalized cost function, which is
//to be minimized
sum = 0;
for (int i = 0; i < (N - 1); i++)
{
sum += Math.Pow((theta1 + theta2 * x[i] - y[i]), 2);
}
//Since we have 4 values of x and y we have 1/(2*N) = 1 /8 = 0.125
sumOfSquares1 = 0.125 * sum;
//Then we calcualte the new alpha value, using the derivative of
//the cost function.
sum = 0;
for (int i = 0; i < (N - 1); i++)
{
sum += theta1 + theta2 * x[i] - y[i];
}
//Since we have 4 values of x and y we have 1/(N) = 1 /4 = 0.25
temp1 = theta1 - learningRate * 0.25 * sum;
//Same for the beta value, it has a different derivative
sum = 0;
for (int i = 0; i < (N - 1); i++)
{
sum += (theta1 + theta2 * x[i]) * x[i] - y[i];
}
temp2 = theta2 - learningRate * 0.25 * sum;
//WE change the values of alpha an beta at the same time, otherwise the
//function wont work
theta1 = temp1;
theta2 = temp2;
//We then calculate the cost function again, with new alpha and beta values
sum = 0;
for (int i = 0; i < (N - 1); i++)
{
sum += Math.Pow((theta1 + theta2 * x[i] - y[i]), 2);
}
sumOfSquares2 = 0.125 * sum;
Console.WriteLine("Alpha: {0:N}", theta1);
Console.WriteLine("Beta: {0:N}", theta2);
Console.WriteLine("GCF Before: {0:N}", sumOfSquares1);
Console.WriteLine("GCF After: {0:N}", sumOfSquares2);
Console.WriteLine("Iterations: {0}", count);
Console.WriteLine(" ");
} while (sumOfSquares2 <= sumOfSquares1 && count < 5000);
//we end the iteration cycle once the generalized cost function
//cannot be reduced any further or after 5000 iterations
Console.ReadLine();
}
}
}
There are two bugs in the code.
First, I assume that you would like to iterate through all the elements in the array. So rework the for loops like this: for (int i = 0; i < N; i++)
Second, when updating the theta2 value the summation is not calculated correctly. According to the update rule it should look like this: sum += (theta1 + theta2 * x[i] - y[i]) * x[i];
Why do the final values depend on the initial values?
Because the gradient descent update step is calculated from these values. If the initial values (starting point) are too big or too small, they are far away from the final values, and the algorithm needs more steps to get there. You could solve this problem by:
Increasing the number of iterations (e.g. 5000 to 50000): the gradient descent algorithm has more time to converge.
Increasing the learning rate (e.g. 0.001 to 0.01): the gradient descent update steps are bigger, therefore it converges faster. Note: if the learning rate is too large, it is possible to step over the minimum and diverge.
The slope (theta2) is around 2.3 and the intercept (theta1) is around 2.5 for the given data. I have created a GitHub project to fix your code and I have also added a shorter solution using LINQ. It is 5 lines of code. If you are curious, check it out here.
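To illustrate the two fixes, here is a minimal Python sketch of the corrected update loop. The function name and the default learning rate and iteration count are assumptions chosen for this example. For the data in the question, the closed-form least-squares solution is an intercept of 2.5 and a slope of 2.3, and the loop should approach those values from any starting point.

```python
def gradient_descent(xs, ys, theta1=0.0, theta2=0.0,
                     learning_rate=0.01, iterations=20000):
    """Batch gradient descent for y = theta1 + theta2 * x."""
    n = len(xs)
    for _ in range(iterations):
        # Residuals over ALL n points (the original loops stopped at N - 1)
        errors = [theta1 + theta2 * x - y for x, y in zip(xs, ys)]
        grad1 = sum(errors) / n
        # The whole residual is multiplied by x, not just the prediction
        grad2 = sum(e * x for e, x in zip(errors, xs)) / n
        # Simultaneous update of both parameters
        theta1, theta2 = (theta1 - learning_rate * grad1,
                          theta2 - learning_rate * grad2)
    return theta1, theta2


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], [5, 7, 9, 12]
    print(gradient_descent(xs, ys))
    print(gradient_descent(xs, ys, theta1=90.0, theta2=90.0))
```

Since the squared-error cost is convex, different starting points only change how many iterations are needed, not where the loop ends up.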

How to calculate the points between two given points and given distance?

I have point A (35.163 , 128.001) and point B (36.573 , 128.707)
I need to calculate the points that lie between point A and point B.
Using the standard distance formula between 2 points, I found D = 266.3
The points on the line AB (the black points p1, p2, ... p8) are separated by an equal distance of d = D / 8 = 33.3
How could I calculate the X and Y for p1, p2, ... p8?
Examples in Java or C# are welcome,
or just pointing me to a formula or method will do.
Thank you.
The above calculation is actually used to compute dummy points for the shaded levels in my map, for shaded-area interpolation purposes.
That's easy, but you need some math knowledge.
PointF pointA, pointB;
var diff_X = pointB.X - pointA.X;
var diff_Y = pointB.Y - pointA.Y;
int pointNum = 8;
var interval_X = diff_X / (pointNum + 1);
var interval_Y = diff_Y / (pointNum + 1);
List<PointF> pointList = new List<PointF>();
for (int i = 1; i <= pointNum; i++)
{
pointList.Add(new PointF(pointA.X + interval_X * i, pointA.Y + interval_Y*i));
}
A straightforward trigonometric solution could be something like this:
// I've used Tuple<Double, Double> to represent a point;
// you probably have your own type for it
public static IList<Tuple<Double, Double>> SplitLine(
Tuple<Double, Double> a,
Tuple<Double, Double> b,
int count) {
count = count + 1;
Double d = Math.Sqrt((a.Item1 - b.Item1) * (a.Item1 - b.Item1) + (a.Item2 - b.Item2) * (a.Item2 - b.Item2)) / count;
Double fi = Math.Atan2(b.Item2 - a.Item2, b.Item1 - a.Item1);
List<Tuple<Double, Double>> points = new List<Tuple<Double, Double>>(count + 1);
for (int i = 0; i <= count; ++i)
points.Add(new Tuple<Double, Double>(a.Item1 + i * d * Math.Cos(fi), a.Item2 + i * d * Math.Sin(fi)));
return points;
}
...
IList<Tuple<Double, Double>> points = SplitLine(
new Tuple<Double, Double>(35.163, 128.001),
new Tuple<Double, Double>(36.573, 128.707),
8);
Outcome (points):
(35,163, 128,001) // <- Initial point A
(35,3196666666667, 128,079444444444)
(35,4763333333333, 128,157888888889)
(35,633, 128,236333333333)
(35,7896666666667, 128,314777777778)
(35,9463333333333, 128,393222222222)
(36,103, 128,471666666667)
(36,2596666666667, 128,550111111111)
(36,4163333333333, 128,628555555556)
(36,573, 128,707) // <- Final point B
Subtract A from B, component-wise, to get the vector from A to B. Multiply that vector by the desired step value and add it to A. (Note that with eight intermediate steps as you've illustrated, the step distance is 1.0 / 9.0.) Something like this, assuming you really want seven points:
vec2 A = vec2 (35.163, 128.001);
vec2 B = vec2 (36.573, 128.707);
vec2 V = B - A;
for (i = 1; i < 8; i++) {
vec2 p[i] = A + V * (float)i / 8.0;
}
(Sorry, don't know any Java or C#.)
let A be point (xa, ya), and B be point (xb, yb)
alpha = tan^-1((yb - ya)/(xb - xa))
p1 = (xa + d * cos(alpha), ya + d * sin(alpha))
pk = (xa + kd * cos(alpha), ya + kd * sin(alpha)), k = 1 to 7
(An equivalent way would be to use vector arithmetic)
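The vector form can be sketched in Python as follows; the function name split_segment is hypothetical. It assumes `count` interior points, so the segment is divided into count + 1 equal parts, and it avoids the arctangent entirely, which also sidesteps the division by zero for vertical lines.

```python
def split_segment(a, b, count):
    """Return `count` equally spaced points strictly between a and b."""
    ax, ay = a
    bx, by = b
    points = []
    for k in range(1, count + 1):
        t = k / (count + 1)  # fraction of the way from a to b
        points.append((ax + (bx - ax) * t, ay + (by - ay) * t))
    return points


if __name__ == "__main__":
    for p in split_segment((35.163, 128.001), (36.573, 128.707), 8):
        print(p)
```

For the points in the question this yields the same eight interior points as the trigonometric version shown earlier.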
First find the slope of the line AB. You can get help and the formula here: http://www.purplemath.com/modules/slope.htm
Then consider the triangle A-p1-E (imagine a point E that is to the right of A and below p1).
You already know the angle AEp1 is 90 degrees, and you have calculated the angle p1AE (from the slope of AB).
Now find AE and Ep1.
Xp1=Xa+AE and Yp1=Ya+Ep1
This will not be very difficult in C# or Java.
Once you understand the logic, you will enjoy implementing it your own way.

Curve fitting points in 3D space

Trying to find functions that will assist us to draw a 3D line through a series of points.
For each point we know: Date&Time, Latitude, Longitude, Altitude, Speed and Heading.
Data might be recorded every 10 seconds and we would like to be able to guestimate the points in between and increase granularity to 1 second. Thus creating a virtual flight path in 3D space.
I have found a number of curve fitting algorithms that will approximate a line through a series of points but they do not guarantee that the points are intersected. They also do not take into account speed and heading to determine the most likely path taken by the object to reach the next point.
From a physics viewpoint:
You have to assume something about the acceleration in your intermediate points to get the interpolation.
If your physical system is relatively well-behaved (as a car or a plane), as opposed to for example a bouncing ball, you may go ahead supposing an acceleration varying linearly with time between your points.
The vector equation for motion with linearly varying acceleration is:
x''[t] = a t + b
where all magnitudes except t are vectors.
For each segment you already know x(t0), v(t0), tfinal, x(tfinal) and v(tfinal).
By solving the differential equation you get:
Eq 1:
x[t_] := (3 b t^2 Tf + a t^3 Tf - 3 b t Tf^2 - a t Tf^3 - 6 t X0 + 6 Tf X0 + 6 t Xf)/(6 Tf)
And imposing the initial and final constraints for position and velocity you get:
Eqs 2:
a -> (6 (Tf^2 V0 - 2 T0 Tf Vf + Tf^2 Vf - 2 T0 X0 + 2 Tf X0 +
2 T0 Xf - 2 Tf Xf))/(Tf^2 (3 T0^2 - 4 T0 Tf + Tf^2))
b -> (2 (-2 Tf^3 V0 + 3 T0^2 Tf Vf - Tf^3 Vf + 3 T0^2 X0 -
3 Tf^2 X0 - 3 T0^2 Xf + 3 Tf^2 Xf))/(Tf^2 (3 T0^2 - 4 T0 Tf + Tf^2))
So inserting the values for eqs 2 into eq 1 you get the temporal interpolation for your points, based on the initial and final position and velocities.
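A minimal Python sketch of this interpolation, under the simplifying assumption that each segment starts at T0 = 0 (shift the time axis per segment if it does not). With T0 = 0 the expressions in Eqs 2 reduce to the two lines in accel_coeffs, and integrating x'' = a t + b twice gives the cubic in position, which is equivalent to Eq 1 once V0 is eliminated. All function names are hypothetical, and each coordinate is handled independently.

```python
def accel_coeffs(x0, xf, v0, vf, tf):
    """Coefficients of x''(t) = a*t + b for one coordinate, assuming T0 = 0."""
    a = 6.0 * (tf * (v0 + vf) + 2.0 * (x0 - xf)) / tf ** 3
    b = 2.0 * (3.0 * (xf - x0) - tf * (2.0 * v0 + vf)) / tf ** 2
    return a, b


def position(t, x0, v0, a, b):
    """Integrate x'' = a*t + b twice with x(0) = x0 and x'(0) = v0."""
    return x0 + v0 * t + b * t ** 2 / 2.0 + a * t ** 3 / 6.0


def interpolate(p0, pf, v0, vf, tf, t):
    """Interpolated point at time t (0 <= t <= tf), coordinate by coordinate."""
    out = []
    for x0, xf, u0, uf in zip(p0, pf, v0, vf):
        a, b = accel_coeffs(x0, xf, u0, uf, tf)
        out.append(position(t, x0, u0, a, b))
    return tuple(out)


if __name__ == "__main__":
    # Hypothetical segment: 10 s from the origin to (10, 10, 10)
    for t in range(11):
        print(t, interpolate((0, 0, 0), (10, 10, 10), (1, 0, 0), (0, 0, 1), 10.0, t))
```

The curve passes through both end points and matches both end velocities, which is the property the question asks for.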
HTH!
Edit
A few examples with abrupt velocity change in two dimensions (in 3D is exactly the same). If the initial and final speeds are similar, you'll get "straighter" paths.
Suppose:
X0 = {0, 0}; Xf = {1, 1};
T0 = 0; Tf = 1;
If
V0 = {0, 1}; Vf = {-1, 3};
V0 = {0, 1}; Vf = {-1, 5};
V0 = {0, 1}; Vf = {1, 3};
Here is an animation where you may see the speed changing from V0 = {0, 1} to Vf = {1, 5}:
Here you may see an accelerating body in 3D with positions taken at equal intervals:
Edit
A full problem:
For convenience, I'll work in Cartesian coordinates. If you want to convert from lat/long/alt to Cartesian just do:
x = rho sin(theta) cos(phi)
y = rho sin(theta) sin(phi)
z = rho cos(theta)
Where phi is the longitude, theta is the colatitude (90 degrees minus the latitude), and rho is your altitude plus the radius of the Earth.
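As a sketch of this conversion in Python: with the formulas as written, theta has to be the polar angle measured from the north pole, that is 90 degrees minus the latitude. The function name and the mean Earth radius of 6,371,000 m are assumptions for this example, and a spherical Earth is assumed throughout.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean radius; spherical-Earth approximation


def to_cartesian(lat_deg, lon_deg, alt_m, radius=EARTH_RADIUS_M):
    """Convert latitude/longitude (degrees) and altitude (metres) to x, y, z."""
    theta = math.radians(90.0 - lat_deg)  # polar angle (colatitude)
    phi = math.radians(lon_deg)           # longitude
    rho = radius + alt_m
    return (rho * math.sin(theta) * math.cos(phi),
            rho * math.sin(theta) * math.sin(phi),
            rho * math.cos(theta))


if __name__ == "__main__":
    print(to_cartesian(0.0, 0.0, 0.0))
    print(to_cartesian(90.0, 0.0, 1000.0))
```

For flight data over short distances it may be more convenient to subtract the first point afterwards, so that each segment starts near the origin as in the worked example.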
So suppose we start our segment at:
t=0 with coordinates (0,0,0) and velocity (1,0,0)
and end at
t=10 with coordinates (10,10,10) and velocity (0,0,1)
I clearly made a change in the origin of coordinates to set the origin at my start point. That is just for getting nice round numbers ...
So we replace those numbers in the formulas for a and b and get:
a = {-(3/50), -(3/25), -(3/50)} b = {1/5, 3/5, 2/5}
With those we go to eq 1, and the position of the object is given by:
p[t] = {1/60 (60 t + 6 t^2 - (3 t^3)/5),
1/60 (18 t^2 - (6 t^3)/5),
1/60 (12 t^2 - (3 t^3)/5)}
And that is it. You get the position from 1 to 10 secs by replacing t with its values in the equation above.
The animation runs:
Edit 2
If you don't want to mess with the vertical acceleration (perhaps because your "speedometer" doesn't read it), you could just assign a constant speed to the z axis (there is a very minor error for considering it parallel to the Rho axis), equal to (Zfinal - Zinit)/(Tf-T0), and then solve the problem in the plane forgetting the altitude.
What you're asking is a general interpolation problem. My guess is your actual problem isn't due to the curve-fitting algorithm being used, but rather your application of it to all discrete values recorded by the system instead of the relevant set of values.
Let's decompose your problem. You're currently drawing a point in spherically-mapped 3D space, adjusting for linear and curved paths. If we discretize the operations performed by an object with six degrees of freedom (roll, pitch, and yaw), the only operations you're particularly interested in are linear paths and curved paths accounting for pitch and yaw in any direction. Accounting for acceleration and deceleration is also possible given an understanding of basic physics.
Dealing with the spherical mapping is easy. Simply unwrap your points relative to their position on a plane, adjusting for latitude, longitude, and altitude. This should allow you to flatten data that would otherwise exist along a curved path, though this may not strictly be necessary for the solutions to your problem (see below).
Linear interpolation is easy. Given an arbitrary number of points backwards in time that fit a line within n error as determined by your system,* construct the line and compute the distance in time between each point. From here, attempt to fit the time points to one of two cases: constant velocity or constant acceleration.
Curve interpolation is a little more difficult, but still plausible. For cases of pitch, yaw, or combined pitch+yaw, construct a plane containing an arbitrary number of points backwards in time, within m error for curved readouts from your system.* From these data, construct a planar curve and once again account for constant velocity or acceleration along the curve.
You can do better than this by attempting to predict the expected operations of a plane in flight as part of a decision tree or neural network relative to the flight path. I'll leave that as an exercise for the reader.
Best of luck designing your system.
--
* Both error readouts are expected to be from GPS data, given the description of the problem. Accounting and adjusting for errors in these data is a separate interesting problem.
What you need (instead of modeling the physics) is to fit a spline through the data. I used a Numerical Recipes book (http://www.nrbook.com/a has free C and FORTRAN algorithms. Look into F77 section 3.3 to get the math needed). If you want to keep it simple then just fit lines through the points, but that will not result in a smooth flight path at all. Time will be your x value, and each parameter logged will have its own cubic spline parameters.
Since we like long postings for this question here is the full code:
//driver program
static void Main(string[] args)
{
double[][] flight_data = new double[][] {
new double[] { 0, 10, 20, 30 }, // time in seconds
new double[] { 14500, 14750, 15000, 15125 }, //altitude in ft
new double[] { 440, 425, 415, 410 }, // speed in knots
};
CubicSpline altitude = new CubicSpline(flight_data[0], flight_data[1]);
CubicSpline speed = new CubicSpline(flight_data[0], flight_data[2]);
double t = 22; //Find values at t
double h = altitude.GetY(t);
double v = speed.GetY(t);
double ascent = altitude.GetYp(t); // ascent rate in ft/s
}
// Cubic spline definition
using System.Linq;
/// <summary>
/// Cubic spline interpolation for tabular data
/// </summary>
/// <remarks>
/// Adapted from Numerical Recipes for FORTRAN 77
/// (ISBN 0-521-43064-X), page 110, section 3.3.
/// Function spline(x,y,yp1,ypn,y2) converted to
/// C# by jalexiou, 27 November 2007.
/// Spline integration added also Nov 2007.
/// </remarks>
public class CubicSpline
{
double[] xi;
double[] yi;
double[] yp;
double[] ypp;
double[] yppp;
double[] iy;
#region Constructors
public CubicSpline(double x_min, double x_max, double[] y)
: this(Sequence(x_min, x_max, y.Length), y)
{ }
public CubicSpline(double x_min, double x_max, double[] y, double yp1, double ypn)
: this(Sequence(x_min, x_max, y.Length), y, yp1, ypn)
{ }
public CubicSpline(double[] x, double[] y)
: this(x, y, double.NaN, double.NaN)
{ }
public CubicSpline(double[] x, double[] y, double yp1, double ypn)
{
if( x.Length == y.Length )
{
int N = x.Length;
xi = new double[N];
yi = new double[N];
x.CopyTo(xi, 0);
y.CopyTo(yi, 0);
if( N > 0 )
{
double p, qn, sig, un;
ypp = new double[N];
double[] u = new double[N];
if( double.IsNaN(yp1) )
{
ypp[0] = 0;
u[0] = 0;
}
else
{
ypp[0] = -0.5;
u[0] = (3 / (xi[1] - xi[0])) *
((yi[1] - yi[0]) / (x[1] - x[0]) - yp1);
}
for (int i = 1; i < N-1; i++)
{
double hp = x[i] - x[i - 1];
double hn = x[i + 1] - x[i];
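// Tridiagonal decomposition as in the Numerical Recipes spline routine:
// sig is the left interval divided by the sum of both intervals
sig = hp / (hp + hn);
p = sig * ypp[i - 1] + 2.0;
ypp[i] = (sig - 1.0) / p;
u[i] = (6 * ((y[i + 1] - y[i]) / hn - (y[i] - y[i - 1]) / hp)
/ (hp + hn) - sig * u[i - 1]) / p;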
}
if( double.IsNaN(ypn) )
{
qn = 0;
un = 0;
}
else
{
qn = 0.5;
un = (3 / (x[N - 1] - x[N - 2])) *
(ypn - (y[N - 1] - y[N - 2]) / (x[N - 1] - x[N - 2]));
}
ypp[N - 1] = (un - qn * u[N - 2]) / (qn * ypp[N - 2] + 1.0);
// Back-substitution must include k = 0, otherwise a specified yp1 never reaches ypp[0]
for (int k = N-2; k >= 0; k--)
{
ypp[k] = ypp[k] * ypp[k + 1] + u[k];
}
// Calculate 1st derivatives
yp = new double[N];
double h;
for( int i = 0; i < N - 1; i++ )
{
h = xi[i + 1] - xi[i];
yp[i] = (yi[i + 1] - yi[i]) / h
- h / 6 * (ypp[i + 1] + 2 * ypp[i]);
}
h = xi[N - 1] - xi[N - 2];
yp[N - 1] = (yi[N - 1] - yi[N - 2]) / h
+ h / 6 * (2 * ypp[N - 1] + ypp[N - 2]);
// Calculate 3rd derivatives as average of dYpp/dx
yppp = new double[N];
double[] jerk_ij = new double[N - 1];
for( int i = 0; i < N - 1; i++ )
{
h = xi[i + 1] - xi[i];
jerk_ij[i] = (ypp[i + 1] - ypp[i]) / h;
}
yppp[0] = jerk_ij[0];
for( int i = 1; i < N - 1; i++ )
{
yppp[i] = 0.5 * (jerk_ij[i - 1] + jerk_ij[i]);
}
yppp[N - 1] = jerk_ij[N - 2];
// Calculate Integral over areas
iy = new double[N];
iy[0] = 0; //Integration constant
for( int i = 0; i < N - 1; i++ )
{
h = xi[i + 1] - xi[i];
iy[i + 1] = h * (yi[i + 1] + yi[i]) / 2
- h * h * h / 24 * (ypp[i + 1] + ypp[i]);
}
}
else
{
yp = new double[0];
ypp = new double[0];
yppp = new double[0];
iy = new double[0];
}
}
else
throw new IndexOutOfRangeException();
}
#endregion
#region Actions/Functions
public int IndexOf(double x)
{
//Use bisection to find index
int i1 = -1;
int i2 = Xi.Length;
int im;
double x1 = Xi[0];
double xn = Xi[Xi.Length - 1];
bool ascending = (xn >= x1);
while( i2 - i1 > 1 )
{
im = (i1 + i2) / 2;
// Comparing with == keeps the bracket valid for ascending and descending tables
if( ascending == (x >= Xi[im]) ) { i1 = im; } else { i2 = im; }
}
if( (ascending && (x <= x1)) || (!ascending && (x >= x1)) )
{
return 0;
}
else if( (ascending && (x >= xn)) || (!ascending && (x <= xn)) )
{
return Xi.Length - 2; // last interval, so that xi[i + 1] stays in range
}
else
{
return i1;
}
}
public double GetIntY(double x)
{
int i = IndexOf(x);
double x1 = xi[i];
double x2 = xi[i + 1];
double y1 = yi[i];
double y2 = yi[i + 1];
double y1pp = ypp[i];
double y2pp = ypp[i + 1];
double h = x2 - x1;
double h2 = h * h;
double a = (x-x1)/h;
double a2 = a*a;
return h / 6 * (3 * a * (2 - a) * y1
+ 3 * a2 * y2 - a2 * h2 * (a2 - 4 * a + 4) / 4 * y1pp
+ a2 * h2 * (a2 - 2) / 4 * y2pp);
}
public double GetY(double x)
{
int i = IndexOf(x);
double x1 = xi[i];
double x2 = xi[i + 1];
double y1 = yi[i];
double y2 = yi[i + 1];
double y1pp = ypp[i];
double y2pp = ypp[i + 1];
double h = x2 - x1;
double h2 = h * h;
double A = 1 - (x - x1) / (x2 - x1);
double B = 1 - A;
return A * y1 + B * y2 + h2 / 6 * (A * (A * A - 1) * y1pp
+ B * (B * B - 1) * y2pp);
}
public double GetYp(double x)
{
int i = IndexOf(x);
double x1 = xi[i];
double x2 = xi[i + 1];
double y1 = yi[i];
double y2 = yi[i + 1];
double y1pp = ypp[i];
double y2pp = ypp[i + 1];
double h = x2 - x1;
double A = 1 - (x - x1) / (x2 - x1);
double B = 1 - A;
return (y2 - y1) / h + h / 6 * (y2pp * (3 * B * B - 1)
- y1pp * (3 * A * A - 1));
}
public double GetYpp(double x)
{
int i = IndexOf(x);
double x1 = xi[i];
double x2 = xi[i + 1];
double y1pp = ypp[i];
double y2pp = ypp[i + 1];
double h = x2 - x1;
double A = 1 - (x - x1) / (x2 - x1);
double B = 1 - A;
return A * y1pp + B * y2pp;
}
public double GetYppp(double x)
{
int i = IndexOf(x);
double x1 = xi[i];
double x2 = xi[i + 1];
double y1pp = ypp[i];
double y2pp = ypp[i + 1];
double h = x2 - x1;
return (y2pp - y1pp) / h;
}
public double Integrate(double from_x, double to_x)
{
if( to_x < from_x ) { return -Integrate(to_x, from_x); }
int i = IndexOf(from_x);
int j = IndexOf(to_x);
double x1 = xi[i];
double xn = xi[j];
double z = GetIntY(to_x) - GetIntY(from_x); // go to nearest nodes (k) (j)
for( int k = i + 1; k <= j; k++ )
{
z += iy[k]; // fill-in areas in-between
}
return z;
}
#endregion
#region Properties
public bool IsEmpty { get { return xi.Length == 0; } }
public double[] Xi { get { return xi; } set { xi = value; } }
public double[] Yi { get { return yi; } set { yi = value; } }
public double[] Yp { get { return yp; } set { yp = value; } }
public double[] Ypp { get { return ypp; } set { ypp = value; } }
public double[] Yppp { get { return yppp; } set { yppp = value; } }
public double[] IntY { get { return iy; } set { iy = value; } }
public int Count { get { return xi.Length; } }
public double X_min { get { return xi.Min(); } }
public double X_max { get { return xi.Max(); } }
public double Y_min { get { return yi.Min(); } }
public double Y_max { get { return yi.Max(); } }
#endregion
#region Helpers
static double[] Sequence(double x_min, double x_max, int number_of_points)
{
double[] res = new double[number_of_points];
for (int i = 0; i < number_of_points; i++)
{
res[i] = x_min + (double)i / (double)(number_of_points - 1) * (x_max - x_min);
}
return res;
}
#endregion
}
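For comparison, the simpler option mentioned above (straight lines through the points) could be sketched roughly as below. This is only an illustration and not part of the CubicSpline class: the LinearInterpolation class and its GetY method are names made up here, and the x values are assumed to be strictly ascending.

```csharp
using System;

public static class LinearInterpolation
{
    // Piecewise-linear value at t. Assumes xs is strictly ascending and
    // has the same length (at least 2) as ys.
    public static double GetY(double[] xs, double[] ys, double t)
    {
        if (xs.Length != ys.Length || xs.Length < 2)
            throw new ArgumentException("Need two arrays of equal length >= 2.");

        // Find the segment [xs[i], xs[i + 1]] that contains t.
        // BinarySearch returns the bitwise complement of the index of the
        // next larger element when t is not a node.
        int i = Array.BinarySearch(xs, t);
        if (i < 0) i = ~i - 1;

        // Clamp to the first or last segment (extrapolates outside the table).
        i = Math.Max(0, Math.Min(i, xs.Length - 2));

        double a = (t - xs[i]) / (xs[i + 1] - xs[i]);
        return ys[i] + a * (ys[i + 1] - ys[i]);
    }
}
```

With the flight data from the driver program, LinearInterpolation.GetY(flight_data[0], flight_data[1], 22) gives about 15025 ft, but the ascent rate changes abruptly at logged points, which is exactly what the spline avoids.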
You can find an approximation of a line that intersects points in 3d and 2d space using a Hough Transformation algorithm. I am only familiar with its uses in 2d, but it will still work for 3d spaces given that you know what kind of line you are looking for. There is a basic implementation description linked. You can Google for pre-made implementations, and here is a link to a 2d C implementation CImg.
The algorithm process (roughly)... First you find the equation of a curve that you think will best approximate the shape of the data (in 2d: linear, parabolic, logarithmic, exponential, etc). You take that formula and solve for one of the parameters.
y = ax + b
becomes
b = y - ax
Next, for each point you are attempting to match, you plug the point's x and y values into that equation. With 3 points, you would have 3 separate functions of b with respect to a.
(2, 3) : b = 3 - 2a
(4, 1) : b = 1 - 4a
(10, -5): b = -5 - 10a
Next, the theory is that you find all possible lines which pass through each of the points. There are infinitely many for each individual point, but when they are combined in an accumulator space only a few parameter sets fit best. In practice this is done by choosing a range for the parameters (I chose -2 <= a <= 1, 1 <= b <= 6), plugging in values for the varying parameter(s) and solving for the other. You tally up the number of intersections from each function in an accumulator. The cells with the highest values give you your parameters.
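Accumulator after processing b = 3 - 2a
      a:  -2  -1   0   1
b = 1      0   0   0   1
b = 2      0   0   0   0
b = 3      0   0   1   0
b = 4      0   0   0   0
b = 5      0   1   0   0
b = 6      0   0   0   0
Accumulator after processing b = 1 - 4a
      a:  -2  -1   0   1
b = 1      0   0   1   1
b = 2      0   0   0   0
b = 3      0   0   1   0
b = 4      0   0   0   0
b = 5      0   2   0   0
b = 6      0   0   0   0
Accumulator after processing b = -5 - 10a
      a:  -2  -1   0   1
b = 1      0   0   1   1
b = 2      0   0   0   0
b = 3      0   0   1   0
b = 4      0   0   0   0
b = 5      0   3   0   0
b = 6      0   0   0   0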
The parameter set with the highest accumulated value is (a, b) = (-1, 5), so the function that best fits the given points is y = 5 - x.
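The voting procedure above can be sketched in a few lines of C#. This is a minimal illustration under stated assumptions, not a general Hough transform: it only handles straight lines y = a*x + b on an integer grid of a and b values, and the HoughLineSketch class and BestLine method are hypothetical names introduced here.

```csharp
using System;

public static class HoughLineSketch
{
    // Votes in an integer (a, b) grid for lines y = a*x + b and returns
    // the cell with the most votes. All ranges are inclusive.
    public static (int a, int b, int votes) BestLine(
        (double x, double y)[] points, int aMin, int aMax, int bMin, int bMax)
    {
        int[,] acc = new int[aMax - aMin + 1, bMax - bMin + 1];

        foreach (var (x, y) in points)
        {
            for (int a = aMin; a <= aMax; a++)
            {
                // Solve the line equation for b at this candidate slope.
                double b = y - a * x;
                int bCell = (int)Math.Round(b);
                if (bCell < bMin || bCell > bMax) continue; // outside the grid
                acc[a - aMin, bCell - bMin]++;
            }
        }

        // Pick the accumulator cell with the highest tally.
        var best = (a: aMin, b: bMin, votes: -1);
        for (int a = aMin; a <= aMax; a++)
            for (int b = bMin; b <= bMax; b++)
                if (acc[a - aMin, b - bMin] > best.votes)
                    best = (a, b, acc[a - aMin, b - bMin]);
        return best;
    }
}
```

Called with the three points (2, 3), (4, 1), (10, -5) and the ranges -2 <= a <= 1, 1 <= b <= 6, it returns a = -1, b = 5 with 3 votes, matching the accumulator worked out by hand. Real data would need a finer grid than whole numbers.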
Best of luck.
My guess is that a serious application of this would use a Kalman filter (http://en.wikipedia.org/wiki/Kalman_filter). By the way, that probably wouldn't guarantee that the reported points were intersected either, unless you fiddled with the parameters a bit. It expects some degree of error in each data point given to it, so where it thinks the object is at time T would not necessarily be where it was reported to be at time T. Of course, you could set the error distribution to say that you were absolutely sure you knew where it was at time T.
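To make the point about the error distribution concrete, here is a toy scalar Kalman step. It is a sketch under simplifying assumptions (a single value modeled as a random walk, not a full position and velocity filter); the ScalarKalmanSketch class, the Step method and the parameter names q and r are made up for this illustration.

```csharp
using System;

public static class ScalarKalmanSketch
{
    // One predict/update step for a random-walk state model.
    // estimate, variance: current state estimate and its variance
    // q: process noise variance (assumed > 0), r: measurement noise variance
    public static (double estimate, double variance) Step(
        double estimate, double variance, double measurement, double q, double r)
    {
        // Predict: the state is assumed unchanged, but uncertainty grows by q.
        double predictedVariance = variance + q;

        // Update: the gain weighs the measurement against the prediction.
        double gain = predictedVariance / (predictedVariance + r);
        double newEstimate = estimate + gain * (measurement - estimate);
        double newVariance = (1 - gain) * predictedVariance;
        return (newEstimate, newVariance);
    }
}
```

With r = 0 the gain becomes 1 and the estimate lands exactly on the measurement, which is the "absolutely sure" case. With r > 0 the estimate only moves part of the way toward each reported point.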
Short of using a Kalman filter, I would try and turn it into an optimisation problem. Work at the 1s granularity and write down equations like
x_t' = x_t + (Vx_t + Vx_t')/2 + e,
Vx_t_reported = Vx_t + f,
Vx_t' = Vx_t + g
where e, f, and g represent the noise. Then create a penalty function such as e^2 + f^2 + g^2 + ..., or some weighted version such as 1.5e^2 + 3f^2 + 2.6g^2 + ..., according to your idea of what the errors really are and how smooth you want the answer to be, and find the values that make the penalty function as small as possible, using least squares if the equations turn out nicely.
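As a rough sketch, the penalty for one candidate trajectory could be evaluated like this. It assumes that t' means the next 1 s step, that seconds without a reported velocity are marked with double.NaN, and that the weights are passed in by the caller; the TrajectoryPenalty class and its Evaluate method are hypothetical names, and the minimisation itself is left to whatever optimiser you prefer.

```csharp
using System;

public static class TrajectoryPenalty
{
    // x[t], vx[t]: candidate position and velocity at each 1 s step.
    // vxReported[t]: reported velocity, or double.NaN where none was logged.
    // we, wf, wg: weights for the squared e, f and g residuals.
    public static double Evaluate(
        double[] x, double[] vx, double[] vxReported,
        double we, double wf, double wg)
    {
        double penalty = 0;
        for (int t = 0; t < x.Length; t++)
        {
            if (!double.IsNaN(vxReported[t]))
            {
                // f: mismatch between reported and candidate velocity
                double f = vxReported[t] - vx[t];
                penalty += wf * f * f;
            }
            if (t + 1 < x.Length)
            {
                // e: position mismatch against the averaged-velocity step
                double e = x[t + 1] - x[t] - (vx[t] + vx[t + 1]) / 2;
                // g: change in velocity between steps (smoothness)
                double g = vx[t + 1] - vx[t];
                penalty += we * e * e + wg * g * g;
            }
        }
        return penalty;
    }
}
```

A trajectory that moves at a constant 1 unit/s and agrees with every report scores 0; any deviation raises the score in proportion to the weight chosen for that kind of error.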
