I am trying to segment arms from a Kinect depth image in my app.
I tried using joint positions to get the vector between the elbow and wrist/hand-tip, created a rotated 2D bounding rectangle between these two joints, and then removed all pixels outside the rectangle. The problem is that this rectangle's width changes with the distance from the sensor, and the region can become trapezoidal (e.g. when the hand is closer to the camera), so at best it lets me discard parts of the image before doing the actual processing.
When the hand is near the body (like my left arm below), I need to detect the edge of the hand - presumably by checking the depth gradient. But I couldn't find a flood fill algorithm which "stops" at gradients.
Is there perhaps a better approach? Any algorithm ideas would be welcome.
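One way to make a flood fill "stop" at gradients is to grow only into neighbors whose depth differs from the current pixel by less than a threshold. Below is a minimal sketch along those lines, assuming a flat ushort[] depth frame in millimeters, a seed pixel taken from the hand or wrist joint, and a tolerance value that would need tuning:

```csharp
using System.Collections.Generic;

static class ArmSegmentation
{
    // Flood fills from 'seed', but only crosses into neighbors whose depth
    // differs by less than 'maxStepMm', so the fill stops at depth edges
    // (e.g. where the hand lies in front of the body).
    public static bool[] FloodFillDepth(ushort[] depth, int width, int height,
                                        int seedX, int seedY, ushort maxStepMm = 30)
    {
        var visited = new bool[depth.Length];
        var queue = new Queue<int>();
        int seed = seedY * width + seedX;
        visited[seed] = true;
        queue.Enqueue(seed);

        int[] dx = { 1, -1, 0, 0 };
        int[] dy = { 0, 0, 1, -1 };

        while (queue.Count > 0)
        {
            int p = queue.Dequeue();
            int x = p % width, y = p / width;

            for (int i = 0; i < 4; i++)
            {
                int nx = x + dx[i], ny = y + dy[i];
                if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;

                int n = ny * width + nx;
                if (visited[n] || depth[n] == 0) continue;   // 0 = no depth reading

                // Stop at large depth gradients.
                int diff = depth[n] - depth[p];
                if (diff < 0) diff = -diff;
                if (diff > maxStepMm) continue;

                visited[n] = true;
                queue.Enqueue(n);
            }
        }
        return visited;   // true = pixel belongs to the arm region
    }
}
```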
Related
I have to detect all the points of a white polygon on a black background in C#. Here is an image of several examples. I wouldn't think it is too difficult, but I am unable to detect this properly with all the variations. My code is too long to post here, but basically I go along each side and look for where it changes from black to white. Should I use OpenCV? I was hoping for a simple algorithm I could implement in C#. Any suggestions? Thank you.
In your case I would do this:
Pre-process the image
Remove color noise if present (JPEG distortion etc.) and binarize the image.
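As a trivial illustration of the binarize step, a fixed-threshold pass over a grayscale buffer might look like this (the 128 cutoff is an arbitrary assumption; an adaptive threshold such as Otsu would be more robust):

```csharp
static class Binarizer
{
    // Fixed-threshold binarization of a grayscale buffer: 255 = white, 0 = black.
    public static byte[] Binarize(byte[] gray, byte threshold = 128)
    {
        var result = new byte[gray.Length];
        for (int i = 0; i < gray.Length; i++)
            result[i] = gray[i] >= threshold ? (byte)255 : (byte)0;
        return result;
    }
}
```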
Select circumference pixels
Simply loop through all pixels and set each white pixel that has at least one black neighbor to a distinct color representing your circumference ROI mask, or add the pixel position to a list of points instead.
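A minimal sketch of that selection pass, assuming the binarized image is held in a 2D bool array (true = white); pixels on the image border are treated as having a black neighbor:

```csharp
using System.Collections.Generic;
using System.Drawing;

static class Circumference
{
    // Collects every white pixel that has at least one black 4-neighbor.
    public static List<Point> SelectCircumference(bool[,] white)
    {
        int w = white.GetLength(0), h = white.GetLength(1);
        var points = new List<Point>();

        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
            {
                if (!white[x, y]) continue;

                bool onBorder =
                    x == 0 || y == 0 || x == w - 1 || y == h - 1 ||   // image edge counts as black
                    !white[x - 1, y] || !white[x + 1, y] ||
                    !white[x, y - 1] || !white[x, y + 1];

                if (onBorder) points.Add(new Point(x, y));
            }
        return points;
    }
}
```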
Apply connected components analysis
You need to find out the order of the points (how they are connected together). The easiest way is to flood fill the ROI from the first found pixel until the whole ROI is filled, remembering the order in which points were filled (similar to A*). At some point there should be two distinct paths, and they should join at the end. Identify these two points and construct the circumference point order (by reversing one half and handling the shared part if present).
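A hedged sketch of the fill itself, assuming the circumference pixels from the previous step were collected into a HashSet; it only records the fill order, while splitting the two fronts and re-joining them into a single circumference order is left out:

```csharp
using System.Collections.Generic;
using System.Drawing;

static class CircumferenceOrder
{
    // Flood fills the circumference ROI from 'start', returning pixels in the
    // order they were reached. The two fronts described above will appear
    // interleaved here; splitting and re-joining them is not shown.
    public static List<Point> FillOrder(HashSet<Point> roi, Point start)
    {
        var order = new List<Point>();
        var visited = new HashSet<Point> { start };
        var queue = new Queue<Point>();
        queue.Enqueue(start);

        while (queue.Count > 0)
        {
            Point p = queue.Dequeue();
            order.Add(p);

            // 8-connected neighbors, since the circumference can run diagonally.
            for (int dx = -1; dx <= 1; dx++)
                for (int dy = -1; dy <= 1; dy++)
                {
                    if (dx == 0 && dy == 0) continue;
                    var n = new Point(p.X + dx, p.Y + dy);
                    if (roi.Contains(n) && visited.Add(n))
                        queue.Enqueue(n);
                }
        }
        return order;
    }
}
```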
Find vertexes
If you compute the angle change between consecutive pixels, the change should be near zero on straight lines and much bigger near vertexes. Threshold that and you have your vertexes. To make this robust, compute the slope angle from pixels a bit more distant than the closest ones. Thresholding the angle change against a sliding average also often gives more stable results.
So find out how far apart the pixels should be when computing the angle so the noise stays small while vertexes still produce big peaks, and also find a threshold value that is safely above any noise.
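A sketch of that vertex test, assuming the points are already in circumference order; both the pixel distance k and the angle threshold are placeholder values to tune as described above:

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

static class VertexDetection
{
    // Marks points where the slope angle, measured over points k apart,
    // changes by more than 'thresholdDeg'. Both k and the threshold must be
    // tuned so noise stays below the vertex peaks.
    public static List<Point> FindVertexes(List<Point> contour, int k = 5, double thresholdDeg = 30)
    {
        var vertexes = new List<Point>();
        int n = contour.Count;

        for (int i = 0; i < n; i++)
        {
            Point prev = contour[(i - k + n) % n];
            Point curr = contour[i];
            Point next = contour[(i + k) % n];

            double a1 = Math.Atan2(curr.Y - prev.Y, curr.X - prev.X);
            double a2 = Math.Atan2(next.Y - curr.Y, next.X - curr.X);

            // Wrap the angle difference into [-180, 180] degrees.
            double change = (a2 - a1) * 180.0 / Math.PI;
            while (change > 180) change -= 360;
            while (change < -180) change += 360;

            if (Math.Abs(change) > thresholdDeg)
                vertexes.Add(curr);
        }
        return vertexes;
    }
}
```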
This can also be done with a Hough transform and/or the find-contours functions present in many CV libraries. Another option is to regress/fit lines to the point list directly and compute their intersections, which can provide sub-pixel precision.
For more info see related QAs:
Backtracking in A star
Finding holes in 2d point sets
growth fill
My current project has required me to learn face detection/tracking and image processing. Given my experience in C#, I chose Emgu CV as my library of choice for face detection and tracking. From what I've learned so far, I can do face detection and tracking, and basic image processing.
My goal is to be able to place virtual hair on the detected face. What I want to achieve is similar to this video: http://www.youtube.com/watch?v=BdPmECfUFcI.
What I would like to know is the technique(s) to use for handling hair placement for different kinds of hairstyles on the detected face. In what image format do I store the hair?
After watching the video I noticed it treats the head as a flat rectangle rather than as a rectangular prism (a 3D object), so it doesn't use perspective transformations, and I will not consider them either. This is a limitation, but it serves as a decent first step for doing such placements. Note that it is not simply a matter of taking perspective into consideration; your face tracking algorithm also needs to handle more complicated configurations (the eyes might not be fully visible, for example).
So, the first thing you want is a bounding rectangle aligned to the angle the eyes make with the x axis, illustrated in the right figure below (the red segment indicates the connection between the eyes). The left figure shows a typical axis-aligned bounding box, which doesn't serve this problem.
The problem is also simplified if you consider the head symmetric, so you know the top middle point in the figure above is the middle of the top of the head. Also, since a typical head is likely to be wider at the top than at the bottom, you end up with something like the following figure, where the width of the rectangle is close to the width of the forehead. You could also consider a bounding rectangle over only the upper half of the head, for example.
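As a small hedged sketch, the alignment angle of that rectangle can be taken from the two eye centers reported by the tracker (the point type and names here are illustrative):

```csharp
using System;
using System.Drawing;

static class EyeAlignment
{
    // Returns the rotation angle (in degrees) of the line connecting the eyes
    // relative to the x axis. The eye centers are assumed to come from the tracker.
    public static double EyeAngleDegrees(PointF leftEye, PointF rightEye)
    {
        double dy = rightEye.Y - leftEye.Y;
        double dx = rightEye.X - leftEye.X;
        return Math.Atan2(dy, dx) * 180.0 / Math.PI;
    }
}
```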
Now all that is left is positioning some object in this rectangle. For that, you need to augment the description of the object so it is not purely pixels. We can define an "entrance width" (EW) and an "entrance middle point" (EM). The EW establishes the width needed in the other rectangle (the head one) to position the object: if EW is smaller than the needed value, you upscale the object, and you downscale it when EW is larger. Note that the full width of the head's rectangle is usually an overestimation for positioning this object, so you can experiment with percentages of that width. The EM value tells you how the object will be positioned over the head. In the following figure, EW is the horizontal blue dashed line, and EM is the middle point on it. The vertical blue line indicates how far above EM you want to move the object relative to the top segment of the head's rectangle.
The only other special thing this object needs is a value that is treated as background. Then, when painting the object, it is easy to decide whether to make a point fully transparent (the background value) or fully opaque (anything else). That is the sketch I had in mind of what basically needs to be done.
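Putting those pieces together, a hedged GDI+ sketch of the placement might look like this; the EW/EM values, the background key color, and the 90% width fraction are all assumptions to be measured or tuned per hairstyle image:

```csharp
using System.Drawing;

static class HairOverlay
{
    // Draws 'hair' over 'frame' so that the hair's entrance width (EW) matches
    // a chosen fraction of the head rectangle's width, and the entrance middle
    // point (EM) lands at the middle of the rectangle's top edge.
    public static void Draw(Bitmap frame, Bitmap hair, Rectangle headRect,
                            int ew, Point em, Color background)
    {
        // Scale so the hair's EW covers ~90% of the head rectangle width
        // (the percentage is a guess to experiment with, as discussed above).
        float scale = headRect.Width * 0.9f / ew;

        using (Bitmap keyed = (Bitmap)hair.Clone())
        {
            keyed.MakeTransparent(background);   // background value -> fully transparent

            int w = (int)(keyed.Width * scale);
            int h = (int)(keyed.Height * scale);

            // Position so the scaled EM sits on the middle of the rectangle's top edge.
            int x = headRect.Left + headRect.Width / 2 - (int)(em.X * scale);
            int y = headRect.Top - (int)(em.Y * scale);

            using (Graphics g = Graphics.FromImage(frame))
                g.DrawImage(keyed, new Rectangle(x, y, w, h));
        }
    }
}
```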
Kinect: How to draw bones with a PNG picture instead of DrawLine?
I want the result to look like this: http://www.hotzehwc.com/Resource-Center/Wellness-101/skeleton2.aspx
I will get the joints positions from Kinect.
JointA.x;
JointA.y;
JointB.x;
JointB.y;
The joint positions will change, so the PNG connecting two joints needs to be resized and rotated.
Any sample code to make this easier?
Ideally you would want to use DrawLine and the other built-in drawing functions, so that you can scale your bones appropriately. It's just a lot harder to get them to look right at first.
Using images, you would want to cut them up into their individual pieces. The Kinect provides a series of joints, and the lines connecting them are the bones. First check out the SkeletonBasics-WPF example from the SDK Toolkit, supplied by Microsoft -- it will show you how they construct bones between the joints.
Now, you want to cut up your skeleton image so that you have one bone per image. Create an Image object in your XAML for each image. Figure out where the joints belong in your images -- the elbow, for example, will be close to the bottom of the humerus image, but might be a few pixels into the image, and will be towards the middle (width-wise).
When you get the joint positions from the skeleton, translate the appropriate coordinates from the images into those positions. It is going to be a lot of math! You'll take the joints for a given bone and then calculate how to translate the bone image to the correct position and angle.
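A hedged sketch of that math in WPF, assuming the two joints have already been mapped to 2D screen coordinates, the bone images are laid out on a Canvas, and each bone PNG is drawn so the bone runs horizontally starting at the left edge (names are illustrative):

```csharp
using System;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Media;

static class BoneRenderer
{
    // Stretches and rotates 'boneImage' so it spans from jointA to jointB.
    public static void PlaceBone(Image boneImage, Point jointA, Point jointB)
    {
        double dx = jointB.X - jointA.X;
        double dy = jointB.Y - jointA.Y;
        double length = Math.Sqrt(dx * dx + dy * dy);
        double angle = Math.Atan2(dy, dx) * 180.0 / Math.PI;

        // Scale the image so its width matches the joint-to-joint distance,
        // then rotate it; both transforms use the left-center point as origin.
        var transform = new TransformGroup();
        transform.Children.Add(new ScaleTransform(length / boneImage.Source.Width, 1.0));
        transform.Children.Add(new RotateTransform(angle));
        boneImage.RenderTransform = transform;
        boneImage.RenderTransformOrigin = new Point(0, 0.5);

        // Place the left-center of the image at jointA on the Canvas
        // (ActualHeight assumes layout has already run).
        Canvas.SetLeft(boneImage, jointA.X);
        Canvas.SetTop(boneImage, jointA.Y - boneImage.ActualHeight / 2);
    }
}
```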
Is it possible to retrieve the texture coordinates of an object, for example through hit testing?
As an example: I use a 1920x1080 texture on a simple plane, and I want to get the coordinates 1920, 1080 if I click in the right bottom. (The model is in reality slightly more complex, so trying to calculate the position via math isn't as easy)
When the math does not work out for some reason, I have used the following graphic hit-test: assign unique colors to each texel of your plane, render one frame to an offscreen surface with lighting and effects disabled, then read the pixel color under the cursor and translate its value back into coordinates. This is quite efficient on complex models when you don't need to do such lookups too often (say, in games), because reading pixels back stalls the graphics hardware pipeline and hurts performance. Also, this potentially works with any projection: orthographic or perspective.
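As a hedged sketch of the color-coding part for the 1920x1080 plane from the question (the packing scheme is just one possible choice; rendering the picking frame itself is not shown):

```csharp
using System.Windows.Media;

static class ColorPicking
{
    // Packs a texel coordinate (u: 0..1919, v: 0..1079) into a 24-bit RGB value
    // for the picking texture, and unpacks the color read back under the cursor.
    public static Color Encode(int u, int v)
    {
        int id = v * 1920 + u;                 // unique index per texel, fits in 24 bits
        return Color.FromRgb((byte)(id >> 16), (byte)(id >> 8), (byte)id);
    }

    public static (int u, int v) Decode(Color c)
    {
        int id = (c.R << 16) | (c.G << 8) | c.B;
        return (id % 1920, id / 1920);
    }
}
```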
Here's the setup: This is for an ecommerce art site where some paintings are canvas transfers. The painting wraps around the sides and top and bottom of the canvas. We have high-res images of the entire painting, but what we want to display is a quasi-3D representation of the image in which you can see how the sides of the painting wrap around the canvas. Here's a rough sketch of what I'm talking about:
My question is, how can I rotate an image in 3D space? The approach I think I'd like to take is to cut off a portion of the top and side of the image, rotate them in 3D, and then stitch them back onto the top and side to give it the 3D look. How do I go about doing that? It can be done using any .NET technology (GDI+, WPF etc.).
In WPF, using the Viewport3D class, you can create a cuboid which is 8x5x1 units. Create the image as a texture and then apply it to the front face (8x5), the side faces (5x1), and the top and bottom faces (8x1) using texture coordinates. The front face coordinates should be (1/9, 1/6), (8/9, 1/6), (1/9, 5/6) and (8/9, 5/6); the sides run from the nearest edge to those coordinates, e.g. the left side uses (0, 1/6), (1/9, 1/6), (0, 5/6) and (1/9, 5/6).
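A minimal sketch of the front face only, assuming the cuboid is centered at the origin; the vertex ordering and material setup are illustrative rather than the only possibility:

```csharp
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Media3D;

static class CanvasModel
{
    // Builds the 8x5 front face of the 8x5x1 cuboid, mapping it to the
    // central (1/9..8/9, 1/6..5/6) region of the painting texture.
    public static GeometryModel3D FrontFace(ImageSource painting)
    {
        var mesh = new MeshGeometry3D();
        mesh.Positions.Add(new Point3D(-4, -2.5, 0.5));   // bottom-left
        mesh.Positions.Add(new Point3D( 4, -2.5, 0.5));   // bottom-right
        mesh.Positions.Add(new Point3D( 4,  2.5, 0.5));   // top-right
        mesh.Positions.Add(new Point3D(-4,  2.5, 0.5));   // top-left

        // Texture coordinates: (0,0) is the top-left of the image in WPF.
        mesh.TextureCoordinates.Add(new Point(1.0 / 9, 5.0 / 6));
        mesh.TextureCoordinates.Add(new Point(8.0 / 9, 5.0 / 6));
        mesh.TextureCoordinates.Add(new Point(8.0 / 9, 1.0 / 6));
        mesh.TextureCoordinates.Add(new Point(1.0 / 9, 1.0 / 6));

        // Two counter-clockwise triangles form the front quad.
        mesh.TriangleIndices.Add(0); mesh.TriangleIndices.Add(1); mesh.TriangleIndices.Add(2);
        mesh.TriangleIndices.Add(0); mesh.TriangleIndices.Add(2); mesh.TriangleIndices.Add(3);

        var material = new DiffuseMaterial(new ImageBrush(painting));
        return new GeometryModel3D(mesh, material);
    }
}
```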
Edit:
If you then want to be able to perform rotations on the 3D canvas model you can follow the advice here:
How can I do 3D transformation in WPF?
It looks like you don't need to do real 3D, only to fake it.
Chop off four strips along the top, bottom, left and right of the image. Toss the bottom and right strips (going by your sketch in the question). Scale and shear the remaining strips (I'm not expert enough at .NET/WPF to know how, but it can do it). The top strip would be scaled vertically by a factor of 0.5 (a guess - choose it to fit the desired final 3D-looking image) and sheared horizontally. The result is composited onto the output image as the top side of the canvas. The left strip would be scaled horizontally and sheared vertically.
If the end user is to view the 3D canvas from different angles interactively, this method is probably faster than rendering an honest 3D model, which would have to do texture mapping and rasterize the model into a final image - which amounts to doing the same math. The fun part is figuring out how to adjust the scaling and shearing parameters.
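A hedged GDI+ sketch of the scale-and-shear idea for the top strip; the 0.5 vertical scale and the shear factor are placeholders to adjust until the result looks right:

```csharp
using System.Drawing;
using System.Drawing.Drawing2D;

static class FakeCanvas3D
{
    // Draws 'topStrip' onto 'output' scaled to half height and sheared
    // horizontally, approximating the top side of the wrapped canvas.
    public static void DrawTopStrip(Bitmap output, Bitmap topStrip)
    {
        using (Graphics g = Graphics.FromImage(output))
        {
            var m = new Matrix();
            m.Shear(-0.3f, 0f);     // horizontal shear; tune to taste
            m.Scale(1f, 0.5f);      // squash vertically by the guessed factor
            g.Transform = m;
            g.DrawImage(topStrip, 0, 0);
        }
    }
}
```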
This page might be educational: http://www.idomaths.com/linear_transformation.php
and this could be a useful reference: http://en.csharp-online.net/GDIplus_Graphics_Transformation%E2%80%94Image_Transformation
I don't have any experience with this kind of thing. But when I saw this question, the first thing that came to my mind was the funny Unicornify for SO.
In this making-of article by balpha, he explains how the 2D unicorn sphere is rotated in 3D space.
But the code is written in Python. If you are interested, you can take a look at it, though I am not exactly sure it would help you.
The brute force approach (which might be the easiest) is to map the u,v texture coordinates for each of the three faces onto three billboards representing three sides of the canvas (a billboard is just two triangles that make a rectangle). Then rotate the whole canvas (all three billboards) using matrix transforms. Tada!
Alternately, you can move the 3-space camera position with a transform rather than the canvas. Six of one, half a dozen of the other - as they say.
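For the WPF route above, rotating the whole canvas (rather than the camera) could be as simple as this minimal sketch, assuming the three billboards have been collected into a Model3DGroup:

```csharp
using System.Windows.Media.Media3D;

static class CanvasRotation
{
    // Rotates the group holding the three billboards around the vertical (Y) axis.
    public static void RotateCanvas(Model3DGroup canvasGroup, double angleDegrees)
    {
        canvasGroup.Transform = new RotateTransform3D(
            new AxisAngleRotation3D(new Vector3D(0, 1, 0), angleDegrees));
    }
}
```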