C# and SIMD: High and low speedups. What is happening? - c#

Introduction of the problem
I am trying to speed up the intersection code of a (2d) ray tracer that I am writing. I am using C# and the System.Numerics library to bring the speed of SIMD instructions.
The problem is that I am getting strange results, with over-the-roof speedups and rather low speedups. My question is, why is one over-the-roof whereas the other is rather low?
Context:
The RayPack struct is a series of (different) rays, packed in Vectors of System.Numerics.
The BoundingBoxPack and CirclePack struct is a single bb / circle, packed in vectors of System.Numerics.
The CPU used is an i7-4710HQ (Haswell), with SSE 4.2, AVX(2), and FMA(3) instructions.
Running in release mode (64 bit). The project runs in .Net Framework 472. No additional options set.
Attempts
I've already tried looking up whether some operations may or may not be properly supported (Take note: these are for c++. https://fgiesen.wordpress.com/2016/04/03/sse-mind-the-gap/ or http://sci.tuomastonteri.fi/programming/sse), but it seems that is not the case because the laptop I work on supports SSE 4.2.
In the current code, the following changes are applied:
Using more proper instructions (packed min, for example).
Not using the float * vector instruction (causes a lot of additional operations, see the assembly of the original).
Code ... snippets?
Apologies for the large amount of code, but I am not sure how we can discuss this concretely without this amount of code.
Code of Ray -> BoundingBox
public bool CollidesWith(Ray ray, out float t)
{
// https://gamedev.stackexchange.com/questions/18436/most-efficient-aabb-vs-ray-collision-algorithms
// r.dir is unit direction vector of ray
float dirfracx = 1.0f / ray.direction.X;
float dirfracy = 1.0f / ray.direction.Y;
// lb is the corner of AABB with minimal coordinates - left bottom, rt is maximal corner
// r.org is origin of ray
float t1 = (this.rx.min - ray.origin.X) * dirfracx;
float t2 = (this.rx.max - ray.origin.X) * dirfracx;
float t3 = (this.ry.min - ray.origin.Y) * dirfracy;
float t4 = (this.ry.max - ray.origin.Y) * dirfracy;
float tmin = Math.Max(Math.Min(t1, t2), Math.Min(t3, t4));
float tmax = Math.Min(Math.Max(t1, t2), Math.Max(t3, t4));
// if tmax < 0, ray (line) is intersecting AABB, but the whole AABB is behind us
if (tmax < 0)
{
t = tmax;
return false;
}
// if tmin > tmax, ray doesn't intersect AABB
if (tmin > tmax)
{
t = tmax;
return false;
}
t = tmin;
return true;
}
Code of RayPack -> BoundingBoxPack
public Vector<int> CollidesWith(ref RayPack ray, out Vector<float> t)
{
// ------------------------------------------------------- \\
// compute the collision. \\
Vector<float> dirfracx = Constants.ones / ray.direction.x;
Vector<float> dirfracy = Constants.ones / ray.direction.y;
Vector<float> t1 = (this.rx.min - ray.origin.x) * dirfracx;
Vector<float> t2 = (this.rx.max - ray.origin.x) * dirfracx;
Vector<float> t3 = (this.ry.min - ray.origin.y) * dirfracy;
Vector<float> t4 = (this.ry.max - ray.origin.y) * dirfracy;
Vector<float> tmin = Vector.Max(Vector.Min(t1, t2), Vector.Min(t3, t4));
Vector<float> tmax = Vector.Min(Vector.Max(t1, t2), Vector.Max(t3, t4));
Vector<int> lessThanZeroMask = Vector.GreaterThan(tmax, Constants.zeros);
Vector<int> greaterMask = Vector.LessThan(tmin, tmax);
Vector<int> combinedMask = Vector.BitwiseOr(lessThanZeroMask, greaterMask);
// ------------------------------------------------------- \\
// Keep track of the t's that collided. \\
t = Vector.ConditionalSelect(combinedMask, tmin, Constants.floatMax);
return combinedMask;
}
Code of Ray -> Circle
public bool Intersect(Circle other)
{
// Step 0: Work everything out on paper!
// Step 1: Gather all the relevant data.
float ox = this.origin.X;
float dx = this.direction.X;
float oy = this.origin.Y;
float dy = this.direction.Y;
float x0 = other.origin.X;
float y0 = other.origin.Y;
float cr = other.radius;
// Step 2: compute the substitutions.
float p = ox - x0;
float q = oy - y0;
float r = 2 * p * dx;
float s = 2 * q * dy;
// Step 3: compute the substitutions, check if there is a collision.
float a = dx * dx + dy * dy;
float b = r + s;
float c = p * p + q * q - cr * cr;
float DSqrt = b * b - 4 * a * c;
// no collision possible! Commented out to make the benchmark more fair
//if (DSqrt < 0)
//{ return false; }
// Step 4: compute the substitutions.
float D = (float)Math.Sqrt(DSqrt);
float t0 = (-b + D) / (2 * a);
float t1 = (-b - D) / (2 * a);
float ti = Math.Min(t0, t1);
if(ti > 0 && ti < t)
{
t = ti;
return true;
}
return false;
}
Code of RayPack -> CirclePack
Original, unedited, code can be found at: https://pastebin.com/87nYgZrv
public Vector<int> Intersect(CirclePack other)
{
// ------------------------------------------------------- \\
// Put all the data on the stack. \\
Vector<float> zeros = Constants.zeros;
Vector<float> twos = Constants.twos;
Vector<float> fours = Constants.fours;
// ------------------------------------------------------- \\
// Compute whether the ray collides with the circle. This \\
// is stored in the 'mask' vector. \\
Vector<float> p = this.origin.x - other.origin.x; ;
Vector<float> q = this.origin.y - other.origin.y;
Vector<float> r = twos * p * this.direction.x;
Vector<float> s = twos * q * this.direction.y; ;
Vector<float> a = this.direction.x * this.direction.x + this.direction.y * this.direction.y;
Vector<float> b = r + s;
Vector<float> c = p * p + q * q - other.radius * other.radius;
Vector<float> DSqrt = b * b - fours * a * c;
Vector<int> maskCollision = Vector.GreaterThan(DSqrt, zeros);
// Commented out to make the benchmark more fair.
//if (Vector.Dot(maskCollision, maskCollision) == 0)
//{ return maskCollision; }
// ------------------------------------------------------- \\
// Update t if and only if there is a collision. Take \\
// note of the conditional where we compute t. \\
Vector<float> D = Vector.SquareRoot(DSqrt);
Vector<float> bMinus = Vector.Negate(b);
Vector<float> twoA = twos * a;
Vector<float> t0 = (bMinus + D) / twoA;
Vector<float> t1 = (bMinus - D) / twoA;
Vector<float> tm = Vector.ConditionalSelect(Vector.LessThan(t1, t0), t1, t0);
Vector<int> maskBiggerThanZero = Vector.GreaterThan(tm, zeros);
Vector<int> maskSmallerThanT = Vector.LessThan(tm, this.t);
Vector<int> mask = Vector.BitwiseAnd(
maskCollision,
Vector.BitwiseAnd(
maskBiggerThanZero,
maskSmallerThanT)
);
this.t = Vector.ConditionalSelect(
mask, // the bit mask that allows us to choose.
tm, // the smallest of the t's.
t); // if the bit mask is false (0), then we get our original t.
return mask;
}
Assembly code
These can be found on pastebin. Take note that there is some boilerplate assembly from the benchmark tool. You need to look at the function calls.
BoundingBox(Pack): https://pastebin.com/RYMQdZMh
Circle(Pack) Tweaked: https://pastebin.com/YZHjc1vY
Circle(Pack) Original: https://pastebin.com/87nYgZrv
Benchmarking
I've been benchmarking the situation with BenchmarkDotNet.
Results for Circle / CirclePack (updated):
Method
Mean
Error
StdDev
Intersection
9.710 ms
0.0540 ms
0.0505 ms
IntersectionPacked
3.296 ms
0.0055 ms
0.0051 ms
Results for BoundingBox / BoundingBoxPacked:
Method
Mean
Error
StdDev
Intersection
24.269 ms
0.2663 ms
0.2491 ms
IntersectionPacked
1.152 ms
0.0229 ms
0.0264 ms
Due to AVX, a speedup of roughly 6x-8x is expected. The speedup of the boundingbox is significant, whereas the speedup of the circle is rather low.
Revisiting the question at the top: Why is one speedup over-the-roof and the other rather low? And how can the lower of the two (CirclePack) become faster?
Edit(s) with regard to Peter Cordes (comments)
Made the benchmark more fair: the single ray version does not early-branch-out as soon as the ray can no longer collide. Now the speedup is roughly 2.5x.
Added the assembly code as a separate header.
With regard to the square root: This does have impact, but not as much as it seems. Removing the vector square root reduces the total time with about 0.3ms. The single ray code now always performs the square root too.
Question about FMA (Fused Multiply Addition) in C#. I think it does for scalars (see Can C# make use of fused multiply-add?), but I haven't found a similar operation within the System.Numerics.Vector struct.
About a C# instruction for packed min: Yes it does. Silly me. I even used it already.

I am not going to try to answer the question about SIMD speedup, but provide some detailed comments on poor coding in the scalar version that carried over to the vector version, in a way that doesn't fit in an SO comment.
This code in Intersect(Circle) is just absurd:
// Step 3: compute the substitutions, check if there is a collision.
float a = dx * dx + dy * dy;
sum of squares -> a is guaranteed non-negative
float b = r + s;
float c = p * p + q * q - cr * cr;
float DSqrt = b * b - 4 * a * c;
This isn't the square root of D, it's the square of D.
// no collision possible! Commented out to make the benchmark more fair
//if (DSqrt < 0)
//{ return false; }
// Step 4: compute the substitutions.
float D = (float)Math.Sqrt(DSqrt);
sqrt has a limited domain. Avoiding the call for negative input doesn't just save the average cost of the square root, it prevents you from having to handle NaNs which are very, very slow.
Also, D is non-negative, since Math.Sqrt returns either the positive branch or NaN.
float t0 = (-b + D) / (2 * a);
float t1 = (-b - D) / (2 * a);
The difference between these two is t0 - t1 = D / a. The ratio of two non-negative variables is also non-negative. Therefore t1 is never larger.
float ti = Math.Min(t0, t1);
This call always selects t1. Computing t0 and testing which is larger is a waste.
if(ti > 0 && ti < t)
{
t = ti;
return true;
}
Recalling that ti is always t1, and a is non-negative, the first test is equivalent to -b - D > 0 or b < -D.
In the SIMD version, Vector.SquareRoot documentation does not describe the behavior when inputs are negative. And Vector.LessThan does not describe the behavior when inputs are NaN.

Related

Finding the true anomaly from state vectors

I'm attempting to convert from state vectors (position and speed) into Kepler elements, however I'm running into problems where a negative velocity or position will give me wrong results when trying to calculate true anomaly.
Here are the different ways I'm trying to calculate the True Anomaly:
/// <summary>
/// https://en.wikipedia.org/wiki/True_anomaly#From_state_vectors
/// </summary>
public static double TrueAnomaly(Vector4 eccentVector, Vector4 position, Vector4 velocity)
{
var dotEccPos = Vector4.Dot(eccentVector, position);
var talen = eccentVector.Length() * position.Length();
talen = dotEccPos / talen;
talen = GMath.Clamp(talen, -1, 1);
var trueAnomoly = Math.Acos(talen);
if (Vector4.Dot(position, velocity) < 0)
trueAnomoly = Math.PI * 2 - trueAnomoly;
return trueAnomoly;
}
//sgp = standard gravitational parameter
public static double TrueAnomaly(double sgp, Vector4 position, Vector4 velocity)
{
var H = Vector4.Cross(position, velocity).Length();
var R = position.Length();
var q = Vector4.Dot(position, velocity); // dot product of r*v
var TAx = H * H / (R * sgp) - 1;
var TAy = H * q / (R * sgp);
var TA = Math.Atan2(TAy, TAx);
return TA;
}
public static double TrueAnomalyFromEccentricAnomaly(double eccentricity, double eccentricAnomaly)
{
var x = Math.Sqrt(1 - Math.Pow(eccentricity, 2)) * Math.Sin(eccentricAnomaly);
var y = Math.Cos(eccentricAnomaly) - eccentricity;
return Math.Atan2(x, y);
}
public static double TrueAnomalyFromEccentricAnomaly2(double eccentricity, double eccentricAnomaly)
{
var x = Math.Cos(eccentricAnomaly) - eccentricity;
var y = 1 - eccentricity * Math.Cos(eccentricAnomaly);
return Math.Acos(x / y);
}
Edit: another way of doing it which Spectre pointed out:
public static double TrueAnomaly(Vector4 position, double loP)
{
return Math.Atan2(position.Y, position.X) - loP;
}
Positions are all relative to the parent body.
These functions all agree if position.x, position.y and velocity.y are all positive.
How do I fix these so that I get a consistent results when position and velocity are negitive?
Just to clarify: My angles appear to be sort of correct, just pointing in the wrong quadrant depending on the position and or velocity vectors.
Yeah so I was wrong, the above all do return the correct values after all.
So I found an edge case where most of the above calculations fail.
Given position and velocity:
pos = new Vector4() { X = -0.208994076275941, Y = 0.955838328099748 };
vel = new Vector4() { X = -2.1678187689294E-07, Y = -7.93096769486992E-08 };
I get some odd results, ie ~ -31.1 degrees, when I think it should return ` 31.1 (non negative). one of them returns ~ 328.8.
However testing with this position and velocity the results apear to be ok:
pos = new Vector4() { X = -0.25, Y = 0.25 };
vel = new Vector4() { X = Distance.KmToAU(-25), Y = Distance.KmToAU(-25) };
See my answer for extra code on how I'm testing and the math I'm using for some of the other variables.
I'm going around in circles on this one. this is a result of a bug in my existing code that shows up under some conditions but not others.
I guess the real question now is WHY am I getting different results with position/velocity above that don't match to my expectations or each other?
Assuming 2D case... I am doing this differently:
compute radius of semi axises and rotation
so you need to remember whole orbit and find 2 most distant points on it that is major axis a. The minor axis b usually is 90 deg from major axis but to be sure just fins 2 perpendicularly most distant points on your orbit to major axis. So now you got both semi axises. The initial rotation is computed from the major axis by atan2.
compute true anomaly E
so if center is x0,y0 (intersection of a,b or center point of both) initial rotation is ang0 (angle of a) and your point on orbit is x,y then:
E = atan2(y-y0,x-x0) - ang0
However in order to match Newton/D'Alembert physics to Kepler orbital parameters you need to boost the integration precision like I did here:
Is it possible to make realistic n-body solar system simulation in matter of size and mass?
see the [Edit3] Improving Newton D'ALembert integration precision even more in there.
For more info and equations see:
Solving Kepler's equation
[Edit1] so you want to compute V I see it like this:
As you got your coordinates relative to parent you can assume they are already in focal point centered so no need for x0,y0 anymore. Of coarse if you want high precision and have more than 2 bodies (focal mass + object + proximity object(s) like moons) then the parent mass will no longer be in focal point of orbit but close to it ... and to remedy you need to use real focal point position so x0,y0 again... So how to do it:
compute center point (cx,cy) and a,b semi axises
so its the same as in previous text.
compute focal point (x0,y0) in orbit axis aligned coordinates
simple:
x0 = cx + sqrt( a^2 + b^2 );
y0 = cy;
initial angle ang0 of a
let xa,ya be the intersection of orbit and major axis a on the side with bigger speeds (near parent object focus). Then:
ang0 = atan2( ya-cy , xa-cx );
and finally the V fore any of yours x,y
V = atan2( y-y0 , x-x0 ) - ang0;
Ok so on further testing it appears my original calcs do all return the correct values, however when I was looking at the outputs I was not taking the LoP into account and basically not recognizing that 180 is essentially the same angle as -180.
(I was also looking at the output in radians and just didn't see what should have been obvious)
Long story short, I have a bug I thought was in this area of the code and got lost in the weeds.
Seems I was wrong above. see OP for edge case.
Here's some code I used to test these,
I used variations of the following inputs:
pos = new Vector4() { X = 0.25, Y = 0.25 };
vel = new Vector4() { X = Distance.KmToAU(-25), Y = Distance.KmToAU(25) };
And tested them with the following
double parentMass = 1.989e30;
double objMass = 2.2e+15;
double sgp = GameConstants.Science.GravitationalConstant * (parentMass + objMass) / 3.347928976e33;
Vector4 ev = OrbitMath.EccentricityVector(sgp, pos, vel);
double e = ev.Length();
double specificOrbitalEnergy = Math.Pow(vel.Length(), 2) * 0.5 - sgp / pos.Length();
double a = -sgp / (2 * specificOrbitalEnergy);
double ae = e * a;
double aop = Math.Atan2(ev.Y, ev.X);
double eccentricAnomaly = OrbitMath.GetEccentricAnomalyFromStateVectors(pos, a, ae, aop);
double aopD = Angle.ToDegrees(aop);
double directAngle = Math.Atan2(pos.Y, pos.X);
var θ1 = OrbitMath.TrueAnomaly(sgp, pos, vel);
var θ2 = OrbitMath.TrueAnomaly(ev, pos, vel);
var θ3 = OrbitMath.TrueAnomalyFromEccentricAnomaly(e, eccentricAnomaly);
var θ4 = OrbitMath.TrueAnomalyFromEccentricAnomaly2(e, eccentricAnomaly);
var θ5 = OrbitMath.TrueAnomaly(pos, aop);
double angleΔ = 0.0000001; //this is the "acceptable" amount of error, really only the TrueAnomalyFromEccentricAnomaly() calcs needed this.
Assert.AreEqual(0, Angle.DifferenceBetweenRadians(directAngle, aop - θ1), angleΔ);
Assert.AreEqual(0, Angle.DifferenceBetweenRadians(directAngle, aop - θ2), angleΔ);
Assert.AreEqual(0, Angle.DifferenceBetweenRadians(directAngle, aop - θ3), angleΔ);
Assert.AreEqual(0, Angle.DifferenceBetweenRadians(directAngle, aop - θ4), angleΔ);
Assert.AreEqual(0, Angle.DifferenceBetweenRadians(directAngle, aop - θ5), angleΔ);
and the following to compare the angles:
public static double DifferenceBetweenRadians(double a1, double a2)
{
return Math.PI - Math.Abs(Math.Abs(a1 - a2) - Math.PI);
}
And eccentricity Vector found thus:
public static Vector4 EccentricityVector(double sgp, Vector4 position, Vector4 velocity)
{
Vector4 angularMomentum = Vector4.Cross(position, velocity);
Vector4 foo1 = Vector4.Cross(velocity, angularMomentum) / sgp;
var foo2 = position / position.Length();
return foo1 - foo2;
}
And EccentricAnomaly:
public static double GetEccentricAnomalyFromStateVectors(Vector4 position, double a, double linierEccentricity, double aop)
{
var x = (position.X * Math.Cos(-aop)) - (position.Y * Math.Sin(-aop));
x = linierEccentricity + x;
double foo = GMath.Clamp(x / a, -1, 1); //because sometimes we were getting a floating point error that resulted in numbers infinatly smaller than -1
return Math.Acos(foo);
}
Thanks to Futurogogist and Spektre for their help.
I am assuming you are working in two dimensions?
Two dimensional vectors of position p and velocity v. The constant K is the the product of the gravitational constant and the mass of the gravity generating body. Calculate the eccentricity vector
eccVector = (dot(v, v)*p - dot(v, p)*v) / K - p / sqrt(dot(p, p));
eccentricity = sqrt(dot(eccVector, eccVector));
eccVector = eccVector / eccentricity;
b = { - eccVector.y, eccVector.x}; //unit vector perpendicular to eccVector
r = sqrt(dot(p, p));
cos_TA = dot(p, eccVector) / r; \\ cosine of true anomaly
sin_TA = dot(p, b) / r; \\ sine of true anomaly
if (sin_TA >= 0) {
trueAnomaly = arccos(cos_TA);
}
else if (sin_TA < 0){
trueAnomaly = 2*pi - arccos(cos_TA);
}

Calculating Lean Angle with core motion of iOS device

I am trying to calculate the lean angle/inclination of my iOS device.
I did a lot of research and found a way how to give me the most accurate lean angle/inclincation. I used quaternion to calculate it.
This is the code that I use to calculate it.
public void CalculateLeanAngle ()
{
motionManager = new CMMotionManager ();
motionManager.DeviceMotionUpdateInterval = 0.02; // 50 Hz
if (motionManager.DeviceMotionAvailable) {
motionManager.StartDeviceMotionUpdates (CMAttitudeReferenceFrame.XArbitraryZVertical, NSOperationQueue.CurrentQueue, (data, error) => {
CMQuaternion quat = motionManager.DeviceMotion.Attitude.Quaternion;
double x = quat.x;
double y = quat.y;
double w = quat.w;
double z = quat.z;
double degrees = 0.0;
//Roll
double roll = Math.Atan2 (2 * y * w - 2 * x * z, 1 - 2 * y * y - 2 * z * z);
Console.WriteLine("Roll: " + Math.Round(-roll * 180.0/Constants.M_PI));
degrees = Math.Round (-applyKalmanFiltering (roll) * 180.0 / Constants.M_PI);
string degreeStr = string.Concat (degrees.ToString (), "°");
this.LeanAngleLbl.Text = degreeStr;
});
}
public double applyKalmanFiltering (double yaw)
{
// kalman filtering
if (motionLastYaw == 0) {
motionLastYaw = yaw;
}
float q = 0.1f; // process noise
float r = 0.1f; // sensor noise
float p = 0.1f; // estimated error
float k = 0.5f; // kalman filter gain
double x = motionLastYaw;
p = p + q;
k = p / (p + r);
x = x + k * (yaw - x);
p = (1 - k) * p;
motionLastYaw = x;
return motionLastYaw;
}
This works perfect when you walk and tilt your device. But when I drive my car something happens that the quaternion isn't giving me the correct lean angle/inclincation. It just gives me 180°.. Or suddenly it shows me a totally wrong lean angle/inclination.. Like when I am standing still (at trafic lights) it shows me 23°... Then after driving a bit it works again or it shows again 180°.
Could it be that the quaternion is effected by the acceleration of my car? So that because my car is driving at a certain speed it isn't giving me the correct value?
Does anyone have any solution for this?
I would like to calculate my lean angle/inclincation when I drive my bike/car. So I really want to know how to calulate the right lean angle/inclincation independent if I drive my car/bike or not.
Thanks in advance!

Cubic Bezier reverse GetPoint equation: float for Vector <=> Vector for float

Is it possible to get float t back given the resulting value and the four points?
If so, how?
public static Vector3 GetPoint (Vector3 p0, Vector3 p1, Vector3 p2, Vector3 p3, float t) {
t = Mathf.Clamp01(t);
float oneMinusT = 1f - t;
return
oneMinusT * oneMinusT * oneMinusT * p0 +
3f * oneMinusT * oneMinusT * t * p1 +
3f * oneMinusT * t * t * p2 +
t * t * t * p3;
}
Code from this Tutorial by Jasper Flick
It is, and involves implementing root finding for third degree functions. One direct way of doing that is to implement Cardano's Algorithm for finding the roots for a polynomial of degree three - a JavaScript implementation of that can be found here. Depending on the curve's parameters, you will get up to three equally correcet answers, so depending on what you were trying to find the t value for, you'll have to do more work to find out which of those up-to-three values you need.
// Not in every toolbox, so: how to implement the cubic root
// equivalent of the sqrt function (note that there are actually
// three roots: one real, two complex, and we don't care about the latter):
function crt(v) { if (v<0) return -pow(-v,1/3); return pow(v,1/3); }
// Cardano's algorithm, based on
// http://www.trans4mind.com/personal_development/mathematics/polynomials/cubicAlgebra.htm
function cardano(curve, line) {
// align curve with the intersecting line, translating/rotating
// so that the first point becomes (0,0), and the last point
// ends up lying on the line we're trying to use as root-intersect.
var aligned = align(curve, line),
// rewrite from [a(1-t)^3 + 3bt(1-t)^2 + 3c(1-t)t^2 + dt^3] form...
pa = aligned[0].y,
pb = aligned[1].y,
pc = aligned[2].y,
pd = aligned[3].y,
// ...to [t^3 + at^2 + bt + c] form:
d = ( -pa + 3*pb - 3*pc + pd),
a = ( 3*pa - 6*pb + 3*pc) / d,
b = (-3*pa + 3*pb) / d,
c = pa / d,
// then, determine p and q:
p = (3*b - a*a)/3,
p3 = p/3,
q = (2*a*a*a - 9*a*b + 27*c)/27,
q2 = q/2,
// and determine the discriminant:
discriminant = q2*q2 + p3*p3*p3,
// and some reserved variables for later
u1,v1,x1,x2,x3;
// If the discriminant is negative, use polar coordinates
// to get around square roots of negative numbers
if (discriminant < 0) {
var mp3 = -p/3,
mp33 = mp3*mp3*mp3,
r = sqrt( mp33 ),
t = -q/(2*r),
// deal with IEEE rounding yielding <-1 or >1
cosphi = t<-1 ? -1 : t>1 ? 1 : t,
phi = acos(cosphi),
crtr = crt(r),
t1 = 2*crtr;
x1 = t1 * cos(phi/3) - a/3;
x2 = t1 * cos((phi+tau)/3) - a/3;
x3 = t1 * cos((phi+2*tau)/3) - a/3;
return [x1, x2, x3];
}
else if(discriminant === 0) {
u1 = q2 < 0 ? crt(-q2) : -crt(q2);
x1 = 2*u1-a/3;
x2 = -u1 - a/3;
return [x1,x2];
}
// one real root, and two imaginary roots
else {
var sd = sqrt(discriminant),
tt = -q2+sd;
u1 = crt(-q2+sd);
v1 = crt(q2+sd);
x1 = u1 - v1 - a/3;
return [x1];
}
}
Solving the following cubic polynomial should reveal your original t:
(p3 - (3 * p2) + (3 * p1) - p0) * t^3
+ ((3 * p2) - (6 * p1) + (3 * p0)) * t^2
+ ((3 * p1) - (3 * p0)) * t
+ p0
= 0
I put it in standard form so you could easily get the roots from the coefficients.

Storing 2 Single Floats in One

I've read several links discussing storing 2 or 3 floats in one float. Here's an example:
Storing two float values in a single float variable
and another:
http://uncommoncode.wordpress.com/2012/11/07/float-packing-in-shaders-encoding-multiple-components-in-one-float/
and yet another:
decode rgb value to single float without bit-shift in glsl
I've seen others but all of them use the same principle. If you want to encode x and y, they multiply y by some factor and then add x to it. Well this makes since on paper, but I don't understand how in the world it can work when stored to a floating value. Floating values only have 7 significant digits. If you add a big number and a small number, the small number is just truncated and lost. The precision only shows the value of the big number.
Since everyone seems to prescribe the same method, I tried it myself and it did exactly what I thought it would do. When I decoded the numbers, the number that wasn't multiplied turned out as 0.0. It was completely lost in the encoded float.
Here's an example of some MaxScript I tried to test it:
cp = 256.0 * 256.0
scaleFac = 16777215
for i = 1 to 20 do (
for j = 1 to 20 do (
x = (i as float / 20.01f) as float;
y = (j as float / 20.01f) as float;
xScaled = x * scaleFac;
yScaled = y * scaleFac;
f = (xScaled + yScaled * cp) as float
print ("x[" + xScaled as string + "] y[" + yScaled as string + "]" + " e[" + f as string + "]")
dy = floor(f / cp)
dx = (f - dy * cp)
print ("x[" + dx as string + "] y[" + dy as string + "]" + " e[" + f as string + "]")
)
)
dx is 0.0 everytime. Can anyone shed some light on this? NOTE: It doesn't matter whether I make cp = 128, 256, 512 or whatever. It still gives me the same types of results.
This method works for storing two integers. You're effectively converting your floating point numbers to large integers by multiplying by scaleFac, which is good, but it would be better to make it explicit with int(). Then you need to make sure of two things: cp is greater than the largest number you're working with (scaleFac), and the square of cp is small enough to fit into a floating point number without truncation (about 7 digits for a single precision float).
Here is a working code in C to pack two floats into one float and unpack them.
You should change scaleFactor and cp parameters as according to your possible value ranges (yourBiggestNumber * scaleFactor < cp). It is a precision battle. Try printing a few results to find good values for your case. The example below allows floats in [0 to 1) range.
#include <math.h>
/* yourBiggestNumber * scaleFactor < cp */
double scaleFactor = 65530.0;
double cp = 256.0 * 256.0;
/* packs given two floats into one float */
float pack_float(float x, float y) {
int x1 = (int) (x * scaleFactor);
int y1 = (int) (y * scaleFactor);
float f = (y1 * cp) + x1;
return f;
}
/* unpacks given float to two floats */
int unpack_float(float f, float* x, float* y){
double dy = floor(f / cp);
double dx = f - (dy * cp);
*y = (float) (dy / scaleFactor);
*x = (float) (dx / scaleFactor);
return 0;
}
It will work only if your individual floats are small enough to be packed into one float location.
So you can pack 2 numbers by "dividing" this into two to store 2 numbers that can be represented by half the space.
Here's the code I use for packing and unpacking floats. It works by packing first float (0..1) into the first four bytes of a 8-bit (0..256) number, and the next float into the remaining 4 bits. The resulting numbers have 16 possible combinations each (2^4). In some cases this is good enough:
private float PackFloatsInto8Bits( float v1, float v2 )
{
var a = Mathf.Round( v1 * 15f );
var b = Mathf.Round( v2 * 15f );
var bitShiftVector = new Vector2( 1f/( 255f/16f ), 1f/255f );
return Vector2.Dot( new Vector2( a, b ), bitShiftVector );
}
private Vector2 UnpackFloatsFrom8Bits( float input )
{
float temp = input * 15.9375f;
float a = Mathf.Floor(temp) / 15.0f;
float b = Frac( temp ) * 1.0667f;
return new Vector2(a, b);
}

Smallest difference between two angles?

I'm trying to calculate the smallest difference between two angles.
This is my current code (a slight variation of something I found online):
float a1 = MathHelper.ToDegrees(Rot);
float a2 = MathHelper.ToDegrees(m_fTargetRot);
float dif = (float)(Math.Abs(a1 - a2);
if (dif > 180)
dif = 360 - dif;
dif = MathHelper.ToRadians(dif);
It works fine except for in cases at the edge of a circle. For example if the current angle is 355 and the target angle is 5 it calculates the difference is -350 rather than 10 since 365 degrees is equal to 5 degrees.
Any ideas on what I can do to make this work?
You basically had it. Just take the dif modulus 360 before checking to see if greater than 180:
float a1 = MathHelper.ToDegrees(Rot);
float a2 = MathHelper.ToDegrees(m_fTargetRot);
float dif = (float)Math.Abs(a1 - a2) % 360;
if (dif > 180)
dif = 360 - dif;
dif = MathHelper.ToRadians(dif);
Edit: #Andrew Russell made a great point in comments to your question and the solution below takes advantage of the MathHelper.WrapAngle method as he suggested:
diff = Math.Abs(MathHelper.WrapAngle(a2 - a1));
You would expand the check for out of bound angles:
if (dif < 0) dif = dif + 360;
if (dif > 180) dif = 360 - dif;
I never like handling the zero-wrapping with case statements. Instead, I use the definition of the dot product to compute the (unsigned) angle between two angles:
vec(a) . vec(b) = ||a|| ||b|| cos(theta)
We're just going to make a and b unit vectors, so ||a|| == ||b|| == 1.
Since vec(x) = [cos(x),sin(x)], we get:
unsigned_angle_theta(a,b) = acos(cos(a)cos(b) + sin(a)sin(b))
(n.b. all angles in radians)
You can normalize the result to be 0 <= theta < 360:
while(theta < 0) { theta += 360; }
If you want to keep the answer in radians (recommended):
const Double TwoPi = 2 * Math.Pi;
while(theta < 0) { theta += TwoPi; }
We can use Euler's formula: exp(iA) = cos A + i sin A.
In the case of the difference between two angles this becomes:
exp(i(A-B))
Using the laws of exponents:
= exp(iA).exp(-iB).
-iB is the conjugate of iB thus:
= exp(iA).exp(conjugate(iB)).
The complex exponent can be calcuated by Taylors series:
taylor_e(P={cplx,_,_}) ->
taylor_e(
#{sum => to_complex(1),
term => 0,
term_value => to_complex(1),
min_terms => 3,
quadrature => 1,
error_term => 1.0e-4,
domain => P}
);
taylor_e(P=#{sum := Sum,
term := Term,
term_value := TermValue0,
min_terms := MinTerms,
domain := Dom,
quadrature := Q,
error_term := ErrorTerm
})
when ((Term =< MinTerms) or (abs(1-Q) > ErrorTerm)) and
(Term < 20) ->
NewTerm = Term+1,
TermValue1 = scalar_divide(multiply(TermValue0,Dom),NewTerm),
PartialSum = add(Sum,TermValue1),
taylor_e(P#{sum := PartialSum,
term := Term+1,
term_value := TermValue1,
quadrature := quadrance(PartialSum)
});
taylor_e(#{sum := Result}) ->
Result.
The angle difference is the the argument (direction) of the resulting complex number and is retrieved by atan2.
Of course you will need some basic complex number routines. This method does not have a discontinuity around 0/360 degrees, and the sign of the result gives the direction of turning. Where this is a difference between some reference direction (say in an autopilot) it only needs calculating once and then storing until a new course is chosen. The deviations from the course would need to be calculated from every sample however.

Categories

Resources