We're building a sign language translator with a Kinect 1.0 device for our undergraduate final-year project.
So far we have recognized gestures in 2D using the skeleton APIs in the Kinect SDK and applied the DTW algorithm to them.
We have also tracked fingers and counted how many fingers are shown in the frame by contouring and applying a convex hull to the contour. We used C# and Emgu CV for this.
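For context, the DTW step is just the standard dynamic-programming distance between two joint trajectories; a stripped-down sketch (illustrative only, not our actual project code):

```csharp
using System;

// Minimal DTW distance between two 2D trajectories (e.g. a tracked hand joint
// over time). Each gesture is a sequence of (X, Y) positions.
static class Dtw
{
    public static double Distance((double X, double Y)[] a, (double X, double Y)[] b)
    {
        int n = a.Length, m = b.Length;
        var cost = new double[n + 1, m + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                cost[i, j] = double.PositiveInfinity;
        cost[0, 0] = 0;

        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
            {
                double dx = a[i - 1].X - b[j - 1].X;
                double dy = a[i - 1].Y - b[j - 1].Y;
                double d = Math.Sqrt(dx * dx + dy * dy);
                // Each cell accumulates the cheapest alignment path so far.
                cost[i, j] = d + Math.Min(cost[i - 1, j - 1],
                                 Math.Min(cost[i - 1, j], cost[i, j - 1]));
            }

        return cost[n, m]; // lower = more similar gesture trajectories
    }
}
```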
Now we're stuck on how to transform the data into 3D coordinates. What I don't understand is:
What will the 3D visualization look like? For now we just use the depth stream and apply a skin classifier to it, showing the skin parts as white pixels and everything else as black, and we show the contoured and convex-hulled area in the color stream. For 3D, will we use the same depth and color streams? If so, how do we transform the data and coordinates into 3D?
For gestures that involve touching the nose with a finger, how do I isolate the contoured area so that it does not include the whole face, and tell which finger touches which side of the nose? Is this where 3D comes in?
Which APIs and libraries are available to help us in C#?
Extracted Fingers after Contouring and Convex Hull
The Kinect supports creating a depth map using an infrared laser projector: it projects an infrared grid and measures the distance for each point in the grid. It seems you are already using the depth information from this grid.
For converting to 3D you should indeed use the depth information. Some basic trigonometry will transform the depth map into 3D (x, y, z) coordinates, and the color stream from the camera can then be mapped onto those points.
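A minimal sketch of that trigonometry, using rough placeholder intrinsics for the 640x480 Kinect 1.0 depth stream (the Kinect SDK also provides coordinate-mapping helpers that do this conversion for you, so treat this purely as an illustration of the math):

```csharp
// Back-projection of a depth pixel (u, v, depth in millimetres) into
// camera-space metres, using the pinhole model. Fx/Fy/Cx/Cy are *approximate*
// intrinsics for the 640x480 Kinect 1.0 depth stream; calibrate (or use the
// SDK's mapping helpers) if you need accuracy.
static class DepthTo3D
{
    const double Fx = 571.0, Fy = 571.0; // focal length in pixels (rough)
    const double Cx = 320.0, Cy = 240.0; // principal point (rough)

    public static (double X, double Y, double Z) ToCameraSpace(int u, int v, int depthMm)
    {
        double z = depthMm / 1000.0;   // metres
        double x = (u - Cx) * z / Fx;  // basic trigonometry of the pinhole model
        double y = -(v - Cy) * z / Fy; // flip so +Y points up
        return (x, y, z);
    }
}
```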
Detecting whether a finger is touching the nose is a difficult problem. Because the Kinect's grid density is not very high, 3D probably won't help you much here. I would suggest using edge detection (e.g. the Canny algorithm) together with contour recognition on the camera images to detect whether a finger is in front of the face. Testing whether the finger actually touches the nose, rather than just being close to it, is the real challenge.
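A rough Emgu CV sketch of that idea, written against the CvInvoke-style API (older Emgu 2.x versions expose the same operations as methods on Image&lt;Gray, byte&gt;); the face rectangle is assumed to come from your own head/face tracking, and the thresholds are placeholders:

```csharp
using System.Collections.Generic;
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;
using Emgu.CV.Util;

// Run Canny on the colour frame, extract contours, and keep only the contours
// whose bounding box overlaps the face region, so a fingertip in front of the
// face can be examined separately from the face outline itself.
static class FingerOnFaceSketch
{
    public static List<Point[]> ContoursNearFace(Image<Bgr, byte> frame, Rectangle faceRegion)
    {
        var gray = new Mat();
        CvInvoke.CvtColor(frame, gray, ColorConversion.Bgr2Gray);

        var edges = new Mat();
        CvInvoke.Canny(gray, edges, 80, 160);   // thresholds need tuning

        var all = new VectorOfVectorOfPoint();
        CvInvoke.FindContours(edges, all, null,
            RetrType.External, ChainApproxMethod.ChainApproxSimple);

        var near = new List<Point[]>();
        for (int i = 0; i < all.Size; i++)
        {
            Rectangle box = CvInvoke.BoundingRectangle(all[i]);
            if (box.IntersectsWith(faceRegion))
                near.Add(all[i].ToArray());
        }
        return near;
    }
}
```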
Related
I have an idea for a 2D game that uses 3D models cut at the center of the Z axis. However, I have found no way to properly show just the intersection.
Setting the camera's near and far clipping planes extremely close together leaves only a thin outline of the model, since there's no way to render its inside. I could use ray marching, but that would limit me to primitive objects and seems like overkill for what I need. I've found a lot of cross-section shaders online, but they require the back of the camera frustum not to be culled.
Below is a visual example of what I need (demonstrated with Blender's boolean tool rather than a shader). Any help is appreciated, as I've been looking for a way to do this for months.
Before Intersection vs After Intersection
I'm the developer of a game that uses gesture recognition with the HTC Vive room-scale VR headset, and I'm trying to improve the accuracy of that recognition.
(The game, for context: http://store.steampowered.com//app/488760 . It's a game where you cast spells by drawing symbols in the air.)
Currently I'm using the $1 ("one dollar") recognizer for 2D gesture recognition, and an orthographic camera tied to the player's horizontal rotation to flatten the gesture the player draws in space.
However, I'm sure there must be better approaches to the problem!
The gestures have to be represented in 2D in the instructions, so ideally I'd like to:
Find the optimal vector on which to flatten the gesture (see the sketch after this list).
Flatten it into 2D space.
Use the best gesture recognition algorithm to recognise which gesture it is.
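For concreteness, here is the kind of flattening I mean for the first two steps, sketched with Unity types (the "optimal vector" is taken to be the normal of the best-fit plane through the stroke; names are illustrative, not our shipped code):

```csharp
using System.Collections.Generic;
using UnityEngine;

// Flatten a 3D gesture stroke onto its best-fit plane before feeding it to a
// 2D recognizer such as $1. The plane normal is the direction of smallest
// variance of the points (smallest-eigenvalue eigenvector of the covariance),
// found here with a simple power iteration.
public static class GestureFlattener
{
    public static List<Vector2> Flatten(IList<Vector3> points)
    {
        // 1. Centre the points.
        Vector3 centroid = Vector3.zero;
        foreach (var p in points) centroid += p;
        centroid /= points.Count;

        // 2. Build the 3x3 covariance matrix.
        float xx = 0, xy = 0, xz = 0, yy = 0, yz = 0, zz = 0;
        foreach (var p in points)
        {
            Vector3 d = p - centroid;
            xx += d.x * d.x; xy += d.x * d.y; xz += d.x * d.z;
            yy += d.y * d.y; yz += d.y * d.z; zz += d.z * d.z;
        }

        // 3. Power-iterate on (trace * I - C): its dominant eigenvector is the
        //    smallest-variance direction of C, i.e. the plane normal.
        float trace = xx + yy + zz;
        Vector3 r0 = new Vector3(trace - xx, -xy, -xz);
        Vector3 r1 = new Vector3(-xy, trace - yy, -yz);
        Vector3 r2 = new Vector3(-xz, -yz, trace - zz);
        Vector3 normal = Vector3.up;
        for (int i = 0; i < 20; i++)
        {
            normal = new Vector3(Vector3.Dot(r0, normal),
                                 Vector3.Dot(r1, normal),
                                 Vector3.Dot(r2, normal)).normalized;
        }

        // 4. Build an in-plane basis and project every point onto it.
        Vector3 u = Vector3.Cross(normal, Vector3.up);
        if (u.sqrMagnitude < 1e-6f) u = Vector3.Cross(normal, Vector3.right);
        u.Normalize();
        Vector3 v = Vector3.Cross(normal, u);

        var flat = new List<Vector2>(points.Count);
        foreach (var p in points)
        {
            Vector3 d = p - centroid;
            flat.Add(new Vector2(Vector3.Dot(d, u), Vector3.Dot(d, v)));
        }
        return flat;
    }
}
```

The recognizer's own resampling, rotation, and scale normalisation would then run on the flattened 2D points as before.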
It would be really good to get close to 100% accuracy under all circumstances. Currently, for example, the game tends to get confused when players try to draw a circle in the heat of battle, and it assumes they're drawing a Z shape instead.
All suggestions welcomed. Thanks in advance.
Believe it or not, I found this post two months ago and decided to test my VR/AI skills by preparing a Unity package intended for recognising magic gestures in VR. Now I'm back with a complete VR demo: https://ravingbots.itch.io/vr-magic-gestures-ai
The recognition system tracks a gesture vector and then projects it onto a 2D grid. You can also set up a 3D grid very easily if you want the system to work with 3D shapes, but don't forget to provide a proper training set capturing a large number of shape variations.
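To illustrate the projection-onto-a-grid idea (this is not the package's actual code, just a hedged sketch): normalise the flattened stroke to its bounding box and mark the grid cells its samples fall into, which gives a fixed-size input for a classifier trained on many shape variations.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Illustrative only: rasterise a normalised 2D stroke onto an N x N occupancy
// grid. Only the sample points are marked; a fuller version would also fill
// the cells along each segment between consecutive samples.
public static class StrokeGrid
{
    public static float[,] Rasterise(IList<Vector2> stroke, int n = 16)
    {
        // Bounding box makes the grid translation- and scale-invariant.
        Vector2 min = stroke[0], max = stroke[0];
        foreach (var p in stroke)
        {
            min = Vector2.Min(min, p);
            max = Vector2.Max(max, p);
        }
        Vector2 size = Vector2.Max(max - min, new Vector2(1e-5f, 1e-5f));

        var grid = new float[n, n];
        foreach (var p in stroke)
        {
            int x = Mathf.Clamp((int)((p.x - min.x) / size.x * (n - 1)), 0, n - 1);
            int y = Mathf.Clamp((int)((p.y - min.y) / size.y * (n - 1)), 0, n - 1);
            grid[x, y] = 1f; // mark cells the stroke passes through
        }
        return grid;
    }
}
```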
Of course, the package is universal, and you can use it for non-magical applications as well. The code is well documented; the online documentation, rendered to a PDF, runs to 1000+ pages: https://files.ravingbots.com/docs/vr-magic-gestures-ai/
The package was tested with the HTC Vive. Support for Gear VR and other VR devices is being added progressively.
It seems to me that this plugin, Gesture Recognizer 3.0, could give you good insight into what steps to take:
Gesture Recognizer 3.0
Also, I found this JavaScript gesture recognition library on GitHub:
Jester
Hope it helps.
Personally, I recommend AirSig.
It covers more features like authentication using controllers.
The Vive version and Oculus version are free.
"It would be really good to get close to 100% accuracy under all circumstances." My experience is its built-in gestures is over 90% accuracy, signature part is over 96%. Hope it fits your requirement.
In my (limited) experience of 3D programming, we usually set up a 3D model with materials and a texture, then set up the lights and the camera, and finally get a 2D view through the camera.
But I need to reverse this procedure: given a 2D view image, a camera setup, and a 3D model without a texture, I want to find the texture for the model such that it reproduces that 2D view. To simplify things, we ignore the lights and materials and assume they are uniform.
Although it is not easy, I think I can write a program to do this. But are there any existing wheels out there so I don't have to reinvent them? (C#, WPF 3D, or OpenCV)
The Helix Toolkit for WPF has an interesting example called "ContourDemo". If you download the whole source you get a very comprehensive example app showcasing its capabilities.
This particular example uses a number of helper methods to generate a contour mesh from a given 3D model file (.3ds, .obj, .stl).
With some extension this could possibly be the basis for reverse-calculating the UV mapping.
Even if nothing in it performs the core requirement (extracting the texture) out of the box, it is a great toolkit for displaying your original files and any outputs you generate.
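If you do end up writing the core step yourself, the essence of "reversing" the render is to project each mesh vertex through the known camera and use the result as that vertex's texture coordinate, with the captured 2D view applied as the texture. A hedged sketch using WPF Media3D types (illustrative names; it only textures surfaces visible from that camera, and it assumes fovDegrees is the horizontal field of view, as used by WPF's PerspectiveCamera):

```csharp
using System;
using System.Windows;
using System.Windows.Media.Media3D;

// Assign texture coordinates by projecting each vertex through the camera
// that produced the 2D view. With that view image applied as the texture,
// the model reproduces the view from the same camera. Occluded surfaces and
// vertices behind the camera (zc <= 0) would need extra handling.
static class ViewProjectionUv
{
    public static void AssignUvFromCamera(
        MeshGeometry3D mesh,
        Point3D cameraPos, Vector3D lookDir, Vector3D up,
        double fovDegrees, double aspect) // aspect = image width / height
    {
        // Build an orthonormal camera basis.
        Vector3D forward = lookDir; forward.Normalize();
        Vector3D right = Vector3D.CrossProduct(forward, up); right.Normalize();
        Vector3D trueUp = Vector3D.CrossProduct(right, forward);

        double tanHalfFov = Math.Tan(fovDegrees * Math.PI / 360.0);

        mesh.TextureCoordinates.Clear();
        foreach (Point3D p in mesh.Positions)
        {
            Vector3D d = p - cameraPos;
            double zc = Vector3D.DotProduct(d, forward);   // depth along view
            double xc = Vector3D.DotProduct(d, right);
            double yc = Vector3D.DotProduct(d, trueUp);

            // Perspective divide into [0,1] texture space (u right, v down).
            double u = 0.5 + xc / (zc * tanHalfFov) * 0.5;
            double v = 0.5 - yc * aspect / (zc * tanHalfFov) * 0.5;
            mesh.TextureCoordinates.Add(new Point(u, v));
        }
    }
}
```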
I have a project idea to check web usability using eye tracking. For that I need to predict the point of focus on the screen (i.e. pixel coordinates) at a fixed time interval (0.5 seconds).
Here is some additional information:
I intended to use OpenCV or Emgu CV, but it is causing me some trouble because of my inexperience with OpenCV.
I am planning to "flatten" the eye so it appears to move on a plane. The obvious choice is to calibrate the camera to try to remove the radial distortion.
During the calibration process the user looks at the corners of a grid on the screen. The moments of the pupil are stored in a Mat for each position. So I end up with an image of dots corresponding to the eye positions when looking at the corners of the grid on the screen.
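The mapping step I have in mind afterwards is something like this crude baseline (placeholder names; a homography or a second-order polynomial fit over all the grid points would be more accurate than interpolating between two calibrated corners):

```csharp
using System;
using System.Drawing;

// Crude gaze mapping: given the pupil centre recorded while the user looked
// at the top-left and bottom-right screen corners, linearly interpolate any
// new pupil position into screen pixels. Depending on camera placement the
// pupil may move opposite to the gaze direction, so axes may need flipping.
static class GazeMap
{
    public static Point ToScreen(
        PointF pupil,
        PointF pupilTopLeft, PointF pupilBottomRight,
        Size screen)
    {
        // Normalise the pupil position inside the calibrated rectangle.
        float nx = (pupil.X - pupilTopLeft.X) /
                   (pupilBottomRight.X - pupilTopLeft.X);
        float ny = (pupil.Y - pupilTopLeft.Y) /
                   (pupilBottomRight.Y - pupilTopLeft.Y);

        // Clamp and scale to screen pixels.
        nx = Math.Max(0f, Math.Min(1f, nx));
        ny = Math.Max(0f, Math.Min(1f, ny));
        return new Point((int)(nx * screen.Width), (int)(ny * screen.Height));
    }
}
```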
Is there any article or example I can refer to in order to get a good idea of this scenario and of gaze prediction with OpenCV?
Thanks!
Different methods of camera calibration are possible, including ones similar to your corner-dots approach.
There is thesis work on eye gaze using C++ and OpenCV that should certainly help you; you can find some OpenCV-based C++ scripts there as well.
FYI: some works claim to do eye-gaze tracking without calibration:
"Calibration-Free Gaze Tracking: An Experimental Analysis" by Maximilian Möllers
Free head motion eye gaze tracking without calibration
[I am restricted to posting fewer than 2 reference links]
To get precise eye locations, you first need to calibrate your camera, using the chessboard approach or some other tool, and then undistort the image if it is not already rectified.
OpenCV already ships with an eye detector (a Haar cascade classifier; see the haarcascade_eye.xml file), so you can locate and track the eye easily.
Beyond that, it is just math to map the detected eye to the location it is looking at.
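For the detection part, a minimal Emgu CV sketch (the cascade file name is the one bundled in OpenCV's data folder; exact method signatures vary slightly between Emgu versions, and pupil extraction inside the returned rectangle, e.g. by thresholding, would follow):

```csharp
using System.Drawing;
using Emgu.CV;
using Emgu.CV.Structure;

// Locate the eye region in a grayscale frame with the Haar cascade that
// ships with OpenCV. Parameters (scale factor, neighbours, minimum size)
// are starting points and usually need tuning per camera.
static class EyeLocator
{
    static readonly CascadeClassifier EyeCascade =
        new CascadeClassifier("haarcascade_eye.xml");

    public static Rectangle? FindEye(Image<Gray, byte> grayFrame)
    {
        Rectangle[] eyes = EyeCascade.DetectMultiScale(
            grayFrame, 1.1, 4, new Size(30, 30));
        return eyes.Length > 0 ? eyes[0] : (Rectangle?)null;
    }
}
```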
I am currently working on a Kinect virtual jewel shop app, in which the user can choose jewels and see how they look.
The app started with 2D images, which do not look realistic.
So can anyone suggest ideas for the following questions?
How can we make the 2D images look realistic without moving to 3D?
I chose 3D for fitting the jewels to the neck, since we can skew and rotate the images in 3D. Can the same thing be accomplished in 2D? If so, how?
If we go forward with 3D, is there any tool available to convert the 2D images into 3D?
I am completely new to 3D objects; please tell me how we can position a 3D object using the skeleton data. Are there standard formats available for 3D?
The current application is in WPF 4.0, so how can we use a 3D object in WPF?
I don't think it can look realistic without 3D. But you can generate a 3D mesh with the shape of the jewel
(see this to get the idea: http://fenicsproject.org/_images/hollow_cylinder.png )
and put your 2D images on the mesh as a texture.
Check Riemer's tutorials:
http://www.riemers.net/eng/Tutorials/XNA/Csharp/Series2/Textures.php
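For the WPF 4.0 side of the question, a minimal sketch of a textured mesh (a flat quad here; a real jewel mesh from a modelling tool is textured the same way, and "jewel.png" is just a placeholder path):

```csharp
using System;
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Media.Media3D;

// Build a simple quad mesh and apply a 2D jewel image as its texture.
static class JewelMeshFactory
{
    public static GeometryModel3D CreateTexturedQuad()
    {
        var mesh = new MeshGeometry3D
        {
            Positions = new Point3DCollection
            {
                new Point3D(-0.5, -0.5, 0), new Point3D(0.5, -0.5, 0),
                new Point3D(0.5, 0.5, 0),   new Point3D(-0.5, 0.5, 0)
            },
            TriangleIndices = new Int32Collection { 0, 1, 2, 0, 2, 3 },
            TextureCoordinates = new PointCollection
            {
                new Point(0, 1), new Point(1, 1), new Point(1, 0), new Point(0, 0)
            }
        };

        var brush = new ImageBrush(
            new BitmapImage(new Uri("jewel.png", UriKind.Relative)));
        return new GeometryModel3D(mesh, new DiffuseMaterial(brush));
    }
}
```

Wrap the returned GeometryModel3D in a ModelVisual3D inside a Viewport3D, and drive its Transform3D from the Kinect neck/shoulder joints to keep the jewel attached to the user.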