I was working on an open source app for a Kinect camera,
and I ran into a problem while reading the source.
By the way, the idea of the project is to control PowerPoint using your hands; you can find the source code here.
The author uses this code:
Skeleton closestSkeleton = skeletons.Where(s => s.TrackingState == SkeletonTrackingState.Tracked)
                                    .OrderBy(s => s.Position.Z * Math.Abs(s.Position.X))
                                    .FirstOrDefault();
Can anyone help me figure out what s => s.Position.Z * Math.Abs(s.Position.X)
means conceptually? I know it's a lambda expression, so I only need to understand why this quantity is used.
It's a distance metric, used to determine the Skeleton closest to the Kinect sensor.
In the Skeleton Space, Z is the distance from the Kinect sensor (see here).
And if you think of the room as divided into a left half and a right half by a line running straight out from the Kinect sensor, then X is how far away something is from that line: how far to the left or to the right.
This is also why the absolute value of X is used: the code looks at how far away the Skeleton is from that hypothetical dividing line.
So this code takes how far away from the sensor a body is (Z), then multiplies it by how far to the left or right it is (X). It is a somewhat primitive determination of distance. (One would have expected the Pythagorean theorem to be used, but maybe that was considered too slow?)
The code takes the FirstOrDefault Skeleton, where these Skeletons are ordered by this distance metric.
s => s.Position.Z * Math.Abs(s.Position.X) is in the OrderBy statement, serving as the quantity by which to order all detected bodies. It makes the sort take the sideways (X) offset into account, as a rough stand-in for radial distance, rather than using only the orthogonal Z separation.
Consider two objects at the same z coordinate, and the camera at the origin. The closest one is the one with a smaller horizontal (x) distance.
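To see how this ordering behaves, here is a small Python sketch (not the Kinect SDK; the Skeleton tuple and the sample positions are made up for illustration) that compares the Z * |X| product from the question with a plain Pythagorean distance in the X/Z plane. It suggests the product favours bodies near the center line even when they stand farther back.

```python
import math
from typing import NamedTuple


class Skeleton(NamedTuple):
    """Hypothetical stand-in for a tracked Kinect skeleton (X/Z only)."""
    name: str
    x: float  # metres left/right of the sensor's center line
    z: float  # metres in front of the sensor
    tracked: bool = True


def closest_by_product(skeletons):
    """Same ordering as the C# snippet: smallest Z * |X| wins."""
    tracked = [s for s in skeletons if s.tracked]
    return min(tracked, key=lambda s: s.z * abs(s.x), default=None)


def closest_by_euclid(skeletons):
    """Alternative: straight-line distance in the X/Z plane."""
    tracked = [s for s in skeletons if s.tracked]
    return min(tracked, key=lambda s: math.hypot(s.x, s.z), default=None)


people = [
    Skeleton("near_but_off_axis", x=1.0, z=1.0),  # product 1.0, distance ~1.41
    Skeleton("far_but_centered", x=0.1, z=4.0),   # product 0.4, distance ~4.00
]

print(closest_by_product(people).name)
print(closest_by_euclid(people).name)
```

The two metrics pick different people here, so the product is probably better read as "who is most in front of the sensor" than as a true distance.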
I need a function that, given the Vector3 for a, b and c, will give me a new Vector3: the rotation of the triangle. Basically, for point d, if I want to move it out, adjacent to the triangle, I just have to multiply the distance I want to move it by the Vector3 rotation and add the old position to get the new location.
The vector you want is called the unit normal vector. "Unit" means the length is 1 (so that you can just multiply by distance), and "normal" is the name of the vector that's perpendicular to a surface.
To get it, take the cross-product of any two edges of your triangle, and normalize the result. Look at this question for details on how to do this mathematically.
Note: "Normalizing" a vector means to keep the direction the same, but change the length to 1. It doesn't directly relate to a "normal vector".
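A minimal Python sketch of the recipe above, using plain tuples rather than any particular Vector3 class (the function names are my own): cross two edges, normalize, then push a point along the result.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Keep the direction, scale the length to 1."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("degenerate triangle: edges are parallel or zero-length")
    return tuple(c / length for c in v)


def unit_normal(a, b, c):
    """Unit normal of triangle abc: cross product of two edges, normalized."""
    return normalize(cross(sub(b, a), sub(c, a)))


def offset_point(d, normal, distance):
    """Move point d by `distance` along the unit normal."""
    return tuple(d[i] + distance * normal[i] for i in range(3))


n = unit_normal((0, 0, 0), (1, 0, 0), (0, 1, 0))
print(n, offset_point((0.2, 0.2, 0.0), n, 3.0))
```

Note that the winding order of a, b, c decides which side the normal points to; swapping two vertices flips it.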
I am trying to minimize the difference between sets of square markers in 3d space with a set of unknown parameters.
I have a model set of these square markers (represented by 3d position and rotation) which should at the end of optimization match up with a set of observed square markers.
I am using Levenberg–Marquardt to optimize the set of unknown parameters. These parameters alter the position and rotation of the model 3d markers until they match (more or less) the observed 3d marker positions.
The observed 3d markers come from a computer vision marker detection algorithm. It gives the id of the markers seen in each frame and the transformation of each marker from the camera (using Coplanar POSIT). Each 'frame' can only see a small number of the markers in the total set, and there will also be inaccuracies in the transformation.
I have thought of how to construct my minimization function and I thought to try to compare the relative rotations and minimize the difference between the rotations in each iteration of the LM optimisation.
Essentially:
foreach (Marker m1 in markers)
{
    foreach (Marker m2 in markers)
    {
        Vector3 eulerRotation = getRotation(m1, m2);
        ObservedMarker observed1 = getMatchingObserved(m1);
        ObservedMarker observed2 = getMatchingObserved(m2);
        Vector3 eulerRotationObserved = getRotation(observed1, observed2);
        double diffX = Math.Abs(eulerRotation.X - eulerRotationObserved.X);
        double diffY = Math.Abs(eulerRotation.Y - eulerRotationObserved.Y);
        double diffZ = Math.Abs(eulerRotation.Z - eulerRotationObserved.Z);
    }
}
Where diffX, diffY and diffZ are the values to be minimized.
I am using the following to calculate the angles:
Vector3 axis = Vector3.Cross(getNormal(m1), getNormal(m2));
axis.Normalize();
double angle = Math.Acos(Vector3.Dot(getNormal(m1), getNormal(m2)));
Vector3 modelRotation = calculateEulerAngle(axis, angle);
getNormal(Marker m) calculates the normal to the plane that the square marker lies on.
I am sure I am doing something wrong here though. Throwing this all into the LM optimiser (I am using ALGLib) doesn't seem to do anything: it goes through 1 iteration and finishes without changing any of the unknown parameters (initially all 0).
I am thinking that something is wrong with the function I am trying to minimize over. It seems sometimes the angle calculated (3rd line) returns NaN (I am currently setting this case to return diffX, diffY, diffZ as 0). Is it even valid to compare the euler angles as above?
Any help would be greatly appreciated.
Further information:
Program is written in C#, I am using XNA as well.
Each model marker is represented by its four corners in 3D coordinates.
All the model markers are in the same coordinate space.
Each observed marker is given as four corners, expressed as translations from the camera position in camera coordinate space.
If m1 and m2 markers are the same marker id or if either m1 or m2 is not observed, I set all the diffs to 0 (no difference).
At first I thought this might be a typo, but then I realized that this could be a bug, having been a victim of similar cases myself in the past.
Shouldn't diffY and diffZ be:
double diffY = Math.Abs(eulerRotation.Y - eulerRotationObserved.Y);
double diffZ = Math.Abs(eulerRotation.Z - eulerRotationObserved.Z);
I don't have enough reputation to post this as a comment, hence posting it as an answer!
Any luck with this? Is it correct to assume that you want to minimize the "sum" of all diffs over all marker combinations? I think if you want to use LM you should not use Math.Abs.
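On the Math.Abs point: Levenberg–Marquardt minimizes a sum of squared residuals, so the sign is squared away anyway, and taking the absolute value only adds a kink at zero. Below is a rough Python sketch (not ALGLIB code; the function names are my own) of signed, wrapped angle residuals. It also clamps the dot product before acos, which is one common cause of the NaN described in the question: in .NET, Math.Acos returns NaN when rounding pushes the argument slightly outside [-1, 1], and the normals need to be unit length for the dot product to be a cosine at all.

```python
import math


def angle_between(n1, n2):
    """Angle between two unit vectors; the dot product is clamped to [-1, 1]
    so rounding error cannot push acos out of its domain."""
    dot = sum(a * b for a, b in zip(n1, n2))
    return math.acos(max(-1.0, min(1.0, dot)))


def wrap_angle(a):
    """Wrap an angle in radians into the range [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def residuals(model_angles, observed_angles):
    """Signed residuals for a least-squares solver (no abs): the solver
    squares them itself."""
    return [wrap_angle(m - o) for m, o in zip(model_angles, observed_angles)]


print(angle_between((1.0, 0.0, 0.0), (1.0000001, 0.0, 0.0)))
print(residuals([0.1, 3.1], [0.3, -3.1]))
```

Wrapping matters because 3.1 and -3.1 radians are nearly the same orientation; without it the solver would see a difference of 6.2.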
One alternative would be to formulate your objective function manually and use another optimizer. I have recently ported two non-linear optimizers to C# which do not even require you to compute derivatives:
COBYLA2, which supports non-linear constraints but requires more iterations.
BOBYQA, which is limited to variable bound constraints but provides a considerably more efficient iteration scheme.
I'm trying to let the user draw a paddle that they can then use to hit a ball. However, I cannot seem to get the ball to bounce correctly because the x and y components of the ball's velocity are not lined up with the wall. How can I get around this?
I tried the advice given by Gareth Rees here, but apparently I don't know enough about vectors to be able to follow it. For example, I don't know what exactly you store in a vector - I know it's a value with direction, but do you store the 2 points it's between, the slope, the angle?
What I really need is given the angle of the wall and the x and y velocities as the ball hits, to find the new x and y velocities afterwards.
Gareth Rees got the formula correct, but I find the pictures and explanation here a little more clear. That is, the basic formula is:
Vnew = -2*(V dot N)*N + V
where
V = Incoming Velocity Vector
N = The Normal Vector of the wall
Since you're not familiar with vector notation, here's what you need to know for this formula: Vectors are basically just x,y pairs, so V = (v.x, v.y) and N = (n.x, n.y). Planes are best described by the normal to the plane, that is, a vector of unit length that is perpendicular to the plane. Then a few formulas: b*V = (b*v.x, b*v.y); V dot N = v.x*n.x + v.y*n.y, that is, it's a scalar; and A + B = (a.x+b.x, a.y+b.y). Finally, to find a unit vector N from an arbitrary vector M, use N = M/sqrt(M dot M).
If the surface is curved, use the normal at the point of contact.
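Here is the formula above as a short Python sketch, with 2D vectors as (x, y) tuples. The wall_normal helper is my own addition for the "given the angle of the wall" part of the question; it assumes the angle is measured in radians from the x axis to the wall's direction.

```python
import math


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def unit(m):
    """N = M / sqrt(M dot M)."""
    length = math.sqrt(dot(m, m))
    return (m[0] / length, m[1] / length)


def wall_normal(angle):
    """Unit normal of a wall whose direction is (cos(angle), sin(angle))."""
    return (-math.sin(angle), math.cos(angle))


def reflect(v, n):
    """Vnew = -2*(V dot N)*N + V, with N a unit normal."""
    k = 2.0 * dot(v, n)
    return (v[0] - k * n[0], v[1] - k * n[1])


# Ball falling onto a flat floor: the y velocity flips, x is unchanged.
print(reflect((3.0, -4.0), wall_normal(0.0)))
# Ball moving right into a 45 degree wall: it leaves moving straight up.
print(reflect((1.0, 0.0), wall_normal(math.pi / 4)))
```

It does not matter which of the two possible normals you pick (N or -N), because N appears twice in the formula and the signs cancel.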
Let's say I have a data structure like the following:
Camera {
    double x, y, z;
    /** ideally the camera angle is positioned to aim at the 0,0,0 point */
    double angleX, angleY, angleZ;
}
SomePointIn3DSpace {
    double x, y, z;
}
ScreenData {
    /** Convert from some point 3d space to 2d space, end up with x, y */
    int x_screenPositionOfPt, y_screenPositionOfPt;
    double zFar = 100;
    int width = 640, height = 480;
}
...
Without screen clipping or much of anything else, how would I calculate the screen x,y position of a given 3D point in space? I want to project that 3D point onto the 2D screen.
Camera.x = 0
Camera.y = 10;
Camera.z = -10;
/** ideally, I want the camera to point at the ground at 3d space 0,0,0 */
Camera.angleX = ???;
Camera.angleY = ????
Camera.angleZ = ????;
SomePointIn3DSpace.x = 5;
SomePointIn3DSpace.y = 5;
SomePointIn3DSpace.z = 5;
ScreenData.x and y are the screen position of the 3D point in space. How do I calculate those values?
I could possibly use the equations found here, but I don't understand how the screen width/height comes into play. Also, I don't understand from the wiki entry what the viewer's position is versus the camera position.
http://en.wikipedia.org/wiki/3D_projection
The 'way it's done' is to use homogeneous transformations and coordinates. You take a point in space and:
Position it relative to the camera using the model matrix.
Project it either orthographically or in perspective using the projection matrix.
Apply the viewport transformation to place it on the screen.
This gets pretty vague, but I'll try and cover the important bits and leave some of it to you. I assume you understand the basics of matrix math :).
Homogeneous Vectors, Points, Transformations
In 3D, a homogeneous point would be a column matrix of the form [x, y, z, 1]. The final component is 'w', a scaling factor, which for vectors is 0: this has the effect that you can't translate vectors, which is mathematically correct. We won't go there, we're talking points.
Homogeneous transformations are 4x4 matrices, used because they allow translation to be represented as a matrix multiplication rather than an addition, which is nice and quick for your video card. They are also convenient because we can represent successive transformations by multiplying them together. We apply transformations to points by performing transformation * point.
There are 3 primary homogeneous transformations:
Translation,
Rotation, and
Scaling.
There are others, notably the 'look at' transformation, which are worth exploring. However, I just wanted to give a brief list and a few links. The successive translations, scalings and rotations applied to points are collectively the model transformation matrix, which places them in the scene relative to the camera. It's important to realise that what we're doing is akin to moving objects around the camera, not the other way around.
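To make the three primary transformations concrete, here is a small Python sketch with 4x4 matrices as nested lists (row-major; the helper names are my own, not from any library). It composes scale, rotate and translate into one model matrix and shows that a point (w = 1) is translated while a direction vector (w = 0) is not.

```python
import math


def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat_vec(m, p):
    """Apply a 4x4 matrix to a homogeneous column [x, y, z, w]."""
    return [sum(m[r][c] * p[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def scaling(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


# The rightmost matrix is applied first: scale, then rotate, then translate.
model = mat_mul(translation(5.0, 0.0, 0.0),
                mat_mul(rotation_z(math.pi / 2), scaling(2.0, 2.0, 2.0)))

print(mat_vec(model, [1.0, 0.0, 0.0, 1.0]))  # a point: ends up near (5, 2, 0)
print(mat_vec(model, [1.0, 0.0, 0.0, 0.0]))  # a vector: scaled and rotated only
```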
Orthographic and Perspective
To transform from world coordinates into screen coordinates, you would first use a projection matrix, which commonly comes in two flavors:
Orthographic, commonly used for 2D and CAD.
Perspective, good for games and 3D environments.
An orthographic projection matrix is constructed as follows:
Where parameters include:
Top: The Y coordinate of the top edge of visible space.
Bottom: The Y coordinate of the bottom edge of the visible space.
Left: The X coordinate of the left edge of the visible space.
Right: The X coordinate of the right edge of the visible space.
I think that's pretty simple. What you establish is an area of space that is going to appear on the screen, which you can clip against. It's simple here, because the visible area of space is a rectangle. Clipping in perspective is more complicated because the area which appears on screen, the viewing volume, is a frustum.
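The orthographic matrix itself is not reproduced in the text above, so here is a Python sketch of one common form. I am assuming the OpenGL glOrtho convention, which also takes near and far distances in addition to the four edges listed; treat those two parameters and the exact layout as my assumption rather than what the original figure showed.

```python
def orthographic(left, right, bottom, top, near, far):
    """Orthographic projection in the glOrtho convention (row-major lists).
    Maps the visible box to the cube [-1, 1] on every axis."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mat_vec(m, p):
    """Apply a 4x4 matrix to a homogeneous column [x, y, z, w]."""
    return [sum(m[r][c] * p[c] for c in range(4)) for r in range(4)]


ortho = orthographic(left=-4.0, right=4.0, bottom=-3.0, top=3.0, near=1.0, far=10.0)
# The top-right corner of the near plane lands on the corner of the unit cube.
print(mat_vec(ortho, [4.0, 3.0, -1.0, 1.0]))
```

Anything that falls outside [-1, 1] after this multiplication is outside the visible rectangle, which is what makes clipping straightforward in the orthographic case.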
If you're having a hard time with the Wikipedia article on perspective projection, here's the code to build a suitable matrix, courtesy of geeks3D:
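#include <math.h>

/* pi / 360: turns a full field of view in degrees into its half-angle in radians */
#define PI_OVER_360 0.00872664626f

/* Fills m (16 floats, column major) with a perspective projection matrix. */
void BuildPerspProjMat(float *m, float fov, float aspect,
                       float znear, float zfar)
{
    /* Half-extent of the near plane */
    float xymax = znear * tan(fov * PI_OVER_360);
    float ymin = -xymax;
    float xmin = -xymax;

    float width = xymax - xmin;
    float height = xymax - ymin;

    float depth = zfar - znear;
    float q = -(zfar + znear) / depth;
    float qn = -2 * (zfar * znear) / depth;

    float w = 2 * znear / width;
    w = w / aspect;
    float h = 2 * znear / height;

    /* First column */
    m[0] = w;
    m[1] = 0;
    m[2] = 0;
    m[3] = 0;

    /* Second column */
    m[4] = 0;
    m[5] = h;
    m[6] = 0;
    m[7] = 0;

    /* Third column */
    m[8] = 0;
    m[9] = 0;
    m[10] = q;
    m[11] = -1;

    /* Fourth column */
    m[12] = 0;
    m[13] = 0;
    m[14] = qn;
    m[15] = 0;
}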
Variables are:
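fov: Field of view in degrees (the code multiplies it by pi/360 to get the half-angle in radians); 45 degrees is a good value.
aspect: Ratio of width to height (the x scale is divided by it).
znear, zfar: used for clipping, I'll ignore these.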
and the matrix generated is column major, indexed as follows in the above code:
0 4 8 12
1 5 9 13
2 6 10 14
3 7 11 15
Viewport Transformation, Screen Coordinates
Both of these transformations require another matrix to put things in screen coordinates, called the viewport transformation. That's described here, I won't cover it (it's dead simple).
Thus, for a point p, we would:
Perform model transformation matrix * p, resulting in pm.
Perform projection matrix * pm, resulting in pp.
Clip pp against the viewing volume.
Perform viewport transformation matrix * pp, resulting in ps: the point on screen.
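The steps above can be sketched end to end in Python. This is a simplified illustration, not production code: the matrix is the same perspective matrix as the C function but written row-major, the "clipping" just rejects points outside the viewing volume, and I have written out the divide by w (the perspective divide) explicitly, a step the list does not mention but which the homogeneous coordinates need before the viewport mapping. The viewport step assumes pixel (0, 0) is the top-left corner.

```python
import math


def mat_vec(m, p):
    """Apply a 4x4 row-major matrix to a homogeneous column [x, y, z, w]."""
    return [sum(m[r][c] * p[c] for c in range(4)) for r in range(4)]


def perspective(fov_deg, aspect, znear, zfar):
    """Perspective matrix (camera looks down -z); aspect is width / height."""
    f = 1.0 / math.tan(math.radians(fov_deg) / 2.0)
    depth = zfar - znear
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, -(zfar + znear) / depth, -2.0 * zfar * znear / depth],
        [0.0, 0.0, -1.0, 0.0],
    ]


def to_screen(point, model, projection, width, height):
    """Return (x, y) in pixels, or None if the point is not visible."""
    pm = mat_vec(model, [point[0], point[1], point[2], 1.0])  # model transform
    pp = mat_vec(projection, pm)                              # projection
    w = pp[3]
    if w <= 0.0:
        return None                                           # behind the camera
    ndc = [pp[0] / w, pp[1] / w, pp[2] / w]                   # perspective divide
    if any(abs(c) > 1.0 for c in ndc):
        return None                                           # outside the volume
    sx = (ndc[0] + 1.0) * 0.5 * width                         # viewport mapping
    sy = (1.0 - ndc[1]) * 0.5 * height                        # flip y: 0 is the top
    return (sx, sy)


# Hypothetical model matrix: push the scene 5 units in front of the camera.
model = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, -5.0],
         [0.0, 0.0, 0.0, 1.0]]
proj = perspective(90.0, 640.0 / 480.0, 1.0, 100.0)
print(to_screen((0.0, 0.0, 0.0), model, proj, 640, 480))
```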
Summary
I hope that covers most of it. There are holes in the above and it's vague in places, so post any questions below. This subject is usually worthy of a whole chapter in a textbook; I've done my best to distill the process, hopefully to your advantage!
I linked to this above, but I strongly suggest you read this, and download the binary. It's an excellent tool to further your understanding of these transformations and how they get points onto the screen:
http://www.songho.ca/opengl/gl_transform.html
As far as actual work, you'll need to implement a 4x4 matrix class for homogeneous transformations as well as a homogeneous point class you can multiply against it to apply transformations (remember, [x, y, z, 1]). You'll need to generate the transformations as described above and in the links. It's not all that difficult once you understand the procedure. Best of luck :).
@BerlinBrown just as a general comment, you ought not to store your camera rotation as X,Y,Z angles, as this can lead to ambiguity.
For instance, x = 60 degrees is the same as x = -300 degrees. When using x, y and z the number of ambiguous possibilities is very high.
Instead, try using two points in 3D space, x1,y1,z1 for camera location and x2,y2,z2 for camera "target". The angles can be backward computed to/from the location/target but in my opinion this is not recommended. Using a camera location/target allows you to construct a "LookAt" vector which is a unit vector in the direction of the camera (v'). From this you can also construct a LookAt matrix which is a 4x4 matrix used to project objects in 3D space to pixels in 2D space.
Please see this related question, where I discuss how to compute a vector R, which is in the plane orthogonal to the camera.
Given the vector from your camera to the target, v = x i + y j + z k
Normalise the vector: v' = (x i + y j + z k) / sqrt(x^2 + y^2 + z^2)
Let U be the global world up vector, U = (0, 0, 1)
Then we can compute R, the horizontal vector perpendicular to the camera's view direction: R = v' ^ U,
where ^ is the cross product, given by
a ^ b = (a2b3 - a3b2)i + (a3b1 - a1b3)j + (a1b2 - a2b1)k
This will give you a vector that looks like this.
This could be of use for your question: once you have the LookAt vector v' and the orthogonal vector R, you can start to project from the point in 3D space onto the camera's plane.
Basically all these 3D manipulation problems boil down to transforming a point in world space to local space, where the local x,y,z axes are in orientation with the camera. Does that make sense? So if you have a point, Q=x,y,z and you know R and v' (camera axes) then you can project it to the "screen" using simple vector manipulations. The angles involved can be found out using the dot product operator on Vectors.
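As a rough Python illustration of the idea (names are my own; the world up vector defaults to the (0, 0, 1) used above, so pass a different one if your world is y-up), this builds the camera axes from a location and a target and expresses a world point Q in those axes. The third axis, the camera's own up, is not spelled out above; I take it as R ^ v'.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        raise ValueError("zero-length vector (is the view direction parallel to up?)")
    return (v[0] / length, v[1] / length, v[2] / length)


def camera_axes(location, target, world_up=(0.0, 0.0, 1.0)):
    """Return (right, up, forward): R, the camera's own up, and v'."""
    forward = normalize(sub(target, location))   # v'
    right = normalize(cross(forward, world_up))  # R = v' ^ U
    cam_up = cross(right, forward)               # completes the local frame
    return right, cam_up, forward


def to_camera_space(q, location, axes):
    """Coordinates of world point q along the camera's local axes."""
    right, cam_up, forward = axes
    rel = sub(q, location)
    return (dot(rel, right), dot(rel, cam_up), dot(rel, forward))


axes = camera_axes(location=(0.0, -10.0, 0.0), target=(0.0, 0.0, 0.0))
print(to_camera_space((2.0, 0.0, 3.0), (0.0, -10.0, 0.0), axes))
```

The third coordinate is the depth along the view direction; dividing the first two by it gives a simple perspective projection onto the camera's plane.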
Following the wikipedia, first calculate "d":
http://upload.wikimedia.org/wikipedia/en/math/6/0/b/60b64ec331ba2493a2b93e8829e864b6.png
In order to do this, build up those matrices in your code. The mappings from your examples to their variables:
θ = Camera.angle*
a = SomePointIn3DSpace
c = Camera.x | y | z
Or, just do the equations separately without using matrices, your choice:
http://upload.wikimedia.org/wikipedia/en/math/1/c/8/1c89722619b756d05adb4ea38ee6f62b.png
Now we calculate "b", a 2D point:
http://upload.wikimedia.org/wikipedia/en/math/2/5/6/256a0e12b8e6cc7cd71fa9495c0c3668.png
In this case ex and ey are the viewer's position. I believe most graphics systems use half the screen size (0.5) so that (0, 0) is the center of the screen by default, but you could use any value (play around). ez is where the field of view comes into play; that's the one thing you were missing. Choose a fov angle and calculate ez as:
ez = 1 / tan(fov / 2)
Finally, to get bx and by to actual pixels, you have to scale by a factor related to the screen size. For example, if b maps from (0, 0) to (1, 1) you could just scale x by 1920 and y by 1080 for a 1920 x 1080 display. That way any screen size will show the same thing. There are of course many other factors involved in an actual 3D graphics system but this is the basic version.
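A small Python sketch of these last two steps, under my own simplifying assumptions: d is already the camera-space point with dz > 0 in front of the camera, ex = ey = 0 so that b is centered on zero, and the same ez is used for both axes (so a non-square screen would stretch the image unless you also correct for the aspect ratio). With ex = ey = 0, b runs from -1 to 1 across the field of view rather than 0 to 1, so the pixel mapping shifts and halves it first.

```python
import math


def project_to_pixels(d, fov, width, height):
    """Map camera-space point d = (dx, dy, dz) to pixel coordinates.
    fov is the field of view in radians; pixel (0, 0) is the top-left corner."""
    ez = 1.0 / math.tan(fov / 2.0)
    bx = d[0] * ez / d[2]   # -1 .. 1 across the field of view
    by = d[1] * ez / d[2]
    px = (bx + 1.0) * 0.5 * width
    py = (1.0 - by) * 0.5 * height  # screen y grows downwards
    return px, py


print(project_to_pixels((0.0, 0.0, 5.0), math.pi / 2, 1920, 1080))
print(project_to_pixels((1.0, 0.0, 2.0), math.pi / 2, 1920, 1080))
```

A narrower fov gives a larger ez, which pushes points away from the center of the screen: that is the zoom effect.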
Converting a point in 3D space into a 2D point on a screen is done with a matrix. Using a matrix to calculate the screen position of your point saves you a lot of work.
When working with cameras you should consider using a look-at matrix and multiplying it with your projection matrix.
Assuming the camera is at (0, 0, 0) and pointed straight ahead, the equations would be:
ScreenData.x = SomePointIn3DSpace.x / SomePointIn3DSpace.z * constant;
ScreenData.y = SomePointIn3DSpace.y / SomePointIn3DSpace.z * constant;
where "constant" is some positive value. Setting it to the screen width in pixels usually gives good results. If you set it higher then the scene will look more "zoomed-in", and vice-versa.
If you want the camera to be at a different position or angle, then you will need to move and rotate the scene so that the camera is at (0, 0, 0) and pointed straight ahead, and then you can use the equations above.
You are basically computing the point of intersection between a line that goes through the camera and the 3D point, and a vertical plane that is floating a little bit in front of the camera.
You might be interested in just seeing how GLUT does it behind the scenes. All of these methods have similar documentation that shows the math that goes into them.
The first three lectures from UCSD might be very helpful, and contain several illustrations on this topic, which as far as I can see is what you are really after.
Run it thru a ray tracer:
Ray Tracer in C# - Some of the objects he has will look familiar to you ;-)
And just for kicks a LINQ version.
I'm not sure what the greater purpose of your app is (you should tell us, it might spark better ideas), but while it is clear that projection and ray tracing are different problem sets, they have a ton of overlap.
If your app is just trying to draw the entire scene, this would be great.
Solving problem #1: Obscured points won't be projected.
Solution: Though I didn't see anything about opacity or transparency on the blog page, you could probably add these properties and code to process one ray that bounced off (as normal) and one that continued on (for the 'transparency').
Solving problem #2: Projecting a single pixel will require a costly full-image tracing of all pixels.
Obviously if you just want to draw the objects, use the ray tracer for what it's for! But if you want to look up thousands of pixels in the image, from random parts of random objects (why?), doing a full ray-trace for each request would be a huge performance dog.
Fortunately, with more tweaking of his code, you might be able to do one ray trace up front (with transparency), and cache the results until the objects change.
If you're not familiar with ray tracing, read the blog entry. I think it explains how things really work: backwards from each 2D pixel, to the objects, then to the lights, which determines the pixel value.
You can add code so that, as intersections with objects are found, you build lists indexed by the intersected points of the objects, with each item being the current 2D pixel being traced.
Then when you want to project a point, go to that object's list, find the nearest point to the one you want to project, and look up the 2d pixel you care about. The math would be far more minimal than the equations in your articles. Unfortunately, using for example a dictionary of your object+point structure mapping to 2d pixels, I am not sure how to find the closest point on an object without running through the entire list of mapped points. Although that wouldn't be the slowest thing in the world and you could probably figure it out, I just don't have the time to think about it. Anyone?
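For what it's worth, the brute-force version of that lookup is only a few lines. This Python sketch is purely illustrative: the hit records (object id, 3D intersection point, 2D pixel) are a hypothetical format that you would have to emit from the ray tracer yourself, and the nearest-point search is the plain linear scan described above, nothing smarter.

```python
import math


def build_cache(hits):
    """Group recorded ray hits by object.
    hits: iterable of (object_id, point3d, pixel) tuples (hypothetical format)."""
    cache = {}
    for object_id, point, pixel in hits:
        cache.setdefault(object_id, []).append((point, pixel))
    return cache


def lookup_pixel(cache, object_id, query):
    """Pixel of the recorded hit on object_id nearest to the 3D point query,
    or None if the object was never hit (fully hidden or off screen)."""
    entries = cache.get(object_id)
    if not entries:
        return None
    _, pixel = min(entries, key=lambda e: math.dist(e[0], query))
    return pixel


hits = [
    ("sphere", (0.0, 0.0, 5.0), (320, 240)),
    ("sphere", (0.5, 0.0, 5.1), (352, 240)),
    ("plane", (0.0, -1.0, 4.0), (320, 300)),
]
cache = build_cache(hits)
print(lookup_pixel(cache, "sphere", (0.45, 0.0, 5.0)))
```

If the linear scan ever becomes the bottleneck, a spatial index such as a k-d tree per object would be the usual replacement.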
good luck!
"Also, I don't understand in the wiki entry what is the viewer's position vers the camera position" ... I'm 99% sure this is the same thing.
You want to transform your scene with a matrix similar to OpenGL's gluLookAt and then calculate the projection using a projection matrix similar to OpenGL's gluPerspective.
You could try to just calculate the matrices and do the multiplication in software.