I have a set of 3D points mapped onto [0, 1] segments. These points represent simple gestures like circles, waving, etc. Now I want to use Hidden Markov Models to recognize my gestures. The first step is to extract features from the (X, Y, Z) data. I searched for something useful and found a couple of examples: SIFT, SURF, some kind of Fast Fourier Transform, etc.
I'm not sure which one I should use in my project. I want to recognize gestures using data from the Kinect controller, so I don't need to track joints algorithmically.
I had to implement an HMM for gesture recognition a year or two ago for a paper on different machine learning methods. I came across the Accord.NET Framework, which implements many of the methods I was looking into, including HMMs. It's fairly easy to use, and its creator is active on the forums.
To train the HMM I created a Kinect application that would start recording a gesture once a body part had been stationary for 3 seconds; it would then record all the points to an output file until that part was stationary for 3 seconds again. I then selected the best attempts at the gestures I wanted to train and used them as my training set.
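In case it's useful, here is a minimal sketch of the kind of start/stop check I mean. The window length and jitter radius are values I made up, and System.Numerics.Vector3 is just a stand-in for whatever joint type your Kinect SDK gives you:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;   // Vector3

// Decides when a tracked point (e.g. a hand joint) has been "stationary"
// long enough to toggle gesture recording on or off.
class StationaryDetector
{
    private readonly Queue<(Vector3 Position, DateTime Time)> _window =
        new Queue<(Vector3 Position, DateTime Time)>();
    private readonly TimeSpan _holdTime;
    private readonly float _radius;

    public StationaryDetector(TimeSpan holdTime, float radius)
    {
        _holdTime = holdTime;   // e.g. TimeSpan.FromSeconds(3)
        _radius = radius;       // e.g. 0.05f metres of allowed jitter
    }

    // Call once per skeleton frame with the joint position. Returns true when
    // the joint has stayed within `_radius` of its average position for at
    // least `_holdTime`.
    public bool Update(Vector3 position, DateTime timestamp)
    {
        _window.Enqueue((position, timestamp));

        // Trim history, but keep enough samples to span the full hold time.
        while (_window.Count > 1 &&
               timestamp - _window.ElementAt(1).Time >= _holdTime)
        {
            _window.Dequeue();
        }

        // Not enough history yet to cover the hold time.
        if (timestamp - _window.Peek().Time < _holdTime)
            return false;

        Vector3 mean = new Vector3(
            _window.Average(s => s.Position.X),
            _window.Average(s => s.Position.Y),
            _window.Average(s => s.Position.Z));

        return _window.All(s => Vector3.Distance(s.Position, mean) <= _radius);
    }
}
```

Once Update returns true you start buffering points; the next time it returns true after movement has resumed, you stop and write the buffered points to the output file.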
If you are new to Kinect gesture recognition and don't need to use an HMM, I would suggest looking into template matching, as it's a lot simpler and I found it can be very effective for simple gestures.
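For what it's worth, template matching for gestures can be as simple as resampling the recorded path and each stored template to the same number of points, normalizing position and scale, and taking the smallest average point-to-point distance. A rough sketch of that idea (not the code from my paper; the point count and normalization are my own choices, and the path is assumed to have at least two points):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

static class SimpleTemplateMatcher
{
    // Resample a gesture path to `n` points spaced evenly along its length.
    public static List<Vector3> Resample(IReadOnlyList<Vector3> path, int n)
    {
        // Cumulative arc length at each input point.
        var cumulative = new float[path.Count];
        for (int i = 1; i < path.Count; i++)
            cumulative[i] = cumulative[i - 1] + Vector3.Distance(path[i - 1], path[i]);

        float total = cumulative[path.Count - 1];
        var result = new List<Vector3>(n);
        int j = 1;

        for (int k = 0; k < n; k++)
        {
            float target = total * k / (n - 1);
            while (j < path.Count - 1 && cumulative[j] < target)
                j++;

            float segment = cumulative[j] - cumulative[j - 1];
            float t = segment > 0 ? (target - cumulative[j - 1]) / segment : 0f;
            result.Add(Vector3.Lerp(path[j - 1], path[j], t));
        }
        return result;
    }

    // Translate to the centroid and scale to unit radius so that position
    // and size do not affect the comparison.
    public static List<Vector3> Normalize(List<Vector3> points)
    {
        Vector3 centroid = new Vector3(
            points.Average(p => p.X),
            points.Average(p => p.Y),
            points.Average(p => p.Z));

        float radius = points.Max(p => Vector3.Distance(p, centroid));
        if (radius <= 0f) radius = 1f;

        return points.Select(p => (p - centroid) / radius).ToList();
    }

    // Lower score = better match; compare against every stored template.
    public static float Distance(List<Vector3> a, List<Vector3> b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Count; i++)
            sum += Vector3.Distance(a[i], b[i]);
        return sum / a.Count;
    }
}
```

At runtime you Resample and Normalize the candidate gesture and every template, then pick the template with the smallest Distance (possibly with a rejection threshold).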
I'm working on a similar problem. So far the best material I have found is the Kinect Toolbox from David Catuhe. It has some basic code for gesture recognition, Kinect data recording, and replay.
You can start reading here: http://blogs.msdn.com/b/eternalcoding/archive/2011/07/04/gestures-and-tools-for-kinect.aspx
Have you considered a trained Support Vector Machine?
See the LibSVM library: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
The idea would be to define your gesture as an N-dimensional training problem, then train one classifier per gesture (multi-class SVM). Once trained, you map any user gesture to an N-dimensional vector and attempt to classify it with the trained model.
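As a concrete example of the "map any user gesture to an N-dimensional vector" step (just the vectorization, not tied to LibSVM's API; the sample count and origin normalization are my own choices):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

static class GestureVectorizer
{
    // Turn a variable-length 3D gesture into a fixed-length feature vector
    // (x1, y1, z1, x2, y2, z2, ...) that an SVM can consume.
    public static double[] ToFeatureVector(IReadOnlyList<Vector3> gesture, int samplePoints = 16)
    {
        var resampled = new List<Vector3>(samplePoints);
        for (int k = 0; k < samplePoints; k++)
        {
            // Nearest-index resampling; arc-length resampling works better,
            // but this keeps the example short.
            int i = (int)((long)k * (gesture.Count - 1) / (samplePoints - 1));
            resampled.Add(gesture[i]);
        }

        // Translate so the first point is the origin, which removes the
        // dependence on where in space the gesture was performed.
        Vector3 origin = resampled[0];
        return resampled
            .SelectMany(p => new double[] { p.X - origin.X, p.Y - origin.Y, p.Z - origin.Z })
            .ToArray();
    }
}
```

Each training example then becomes a (label, feature vector) pair that you can hand to whichever SVM trainer you use.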
I'm the developer on a game which uses gesture recognition with the HTC Vive roomscale VR headset, and I'm trying to improve the accuracy of our gesture recognition.
(The game, for context: http://store.steampowered.com//app/488760 . It's a game where you cast spells by drawing symbols in the air.)
Currently I'm using the $1 recognizer for 2D gesture recognition, and an orthographic camera tied to the player's horizontal rotation to flatten the gesture the player draws in space.
However, I'm sure there must be better approaches to the problem!
I have to represent the gestures in 2D in the instructions, so ideally I'd like to:
Find the optimal plane (or projection direction) on which to flatten the gesture.
Flatten it into 2D space (see the sketch after this list for one way to do these first two steps).
Use the best gesture recognition algorithm to recognise what gesture it is.
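Roughly speaking, my current flattening is equivalent to something like the sketch below: build a 2D basis from the player's horizontal forward direction and world up, then express each gesture point in that basis (plain System.Numerics here rather than engine types, and it assumes the forward direction isn't vertical). A "truly optimal" plane could instead come from a PCA of the gesture points, using the least-variance direction as the plane normal.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

static class GestureFlattener
{
    // Project a 3D gesture onto the vertical plane facing the player.
    // `forward` is the player's horizontal look direction (Y component ignored).
    public static List<Vector2> Flatten(IReadOnlyList<Vector3> points, Vector3 forward)
    {
        // Orthonormal 2D basis for the projection plane: `right` runs
        // left-to-right across the player's view, `up` is world up.
        Vector3 up = Vector3.UnitY;
        Vector3 flatForward = Vector3.Normalize(new Vector3(forward.X, 0f, forward.Z));
        Vector3 right = Vector3.Normalize(Vector3.Cross(up, flatForward));

        // Use the gesture's centroid as the plane origin so the result is centred.
        Vector3 centroid = new Vector3(
            points.Average(p => p.X),
            points.Average(p => p.Y),
            points.Average(p => p.Z));

        return points
            .Select(p => p - centroid)
            .Select(d => new Vector2(Vector3.Dot(d, right), Vector3.Dot(d, up)))
            .ToList();
    }
}
```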
It would be really good to get close to 100% accuracy under all circumstances. Currently, for example, the game tends to get confused when players try to draw a circle in the heat of battle, and it assumes they're drawing a Z shape instead.
All suggestions welcomed. Thanks in advance.
Believe it or not, I found this post two months ago and decided to test my VR/AI skills by preparing a Unity package intended for recognizing magic gestures in VR. Now I'm back with a complete VR demo: https://ravingbots.itch.io/vr-magic-gestures-ai
The recognition system tracks a gesture vector and then projects it onto a 2D grid. You can also set up a 3D grid very easily if you want the system to work with 3D shapes, but don't forget to provide a proper training set capturing a large number of shape variations.
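To give an idea of what "projects it onto a 2D grid" can look like, here is a generic sketch (not the package's actual code): scale the gesture to fit an N x N grid and mark every cell that a sample point falls into, which gives a fixed-size input you can feed to a classifier.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

static class GestureRasterizer
{
    // Rasterize a 2D gesture into an n x n grid of 0/1 cells.
    // The gesture is scaled to fit the grid, so absolute size does not matter.
    public static float[,] ToGrid(IReadOnlyList<Vector2> points, int n)
    {
        float minX = float.MaxValue, minY = float.MaxValue;
        float maxX = float.MinValue, maxY = float.MinValue;
        foreach (var p in points)
        {
            minX = Math.Min(minX, p.X); maxX = Math.Max(maxX, p.X);
            minY = Math.Min(minY, p.Y); maxY = Math.Max(maxY, p.Y);
        }

        float spanX = Math.Max(maxX - minX, 1e-6f);
        float spanY = Math.Max(maxY - minY, 1e-6f);

        var grid = new float[n, n];
        foreach (var p in points)
        {
            int cx = Math.Min(n - 1, (int)((p.X - minX) / spanX * n));
            int cy = Math.Min(n - 1, (int)((p.Y - minY) / spanY * n));
            grid[cy, cx] = 1f;   // mark the cell containing this sample point
        }
        return grid;
    }
}
```

With denser sampling of the stroke (or by rasterizing the line segments between samples), the grid covers the drawn path more faithfully.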
Of course, the package is universal, and you can use it for a non-magical application as well. The code is well documented. Online documentation rendered to a PDF has 1000+ pages: https://files.ravingbots.com/docs/vr-magic-gestures-ai/
The package was tested with the HTC Vive. Support for Gear VR and other VR devices is being added progressively.
It seems to me this plugin called Gesture Recognizer 3.0 could give you great insight into what steps you should take:
Gesture Recognizer 3.0
Also, I found this JavaScript gesture recognition library on GitHub:
Jester
Hope it helps.
Personally I recommend AirSig
It covers more features like authentication using controllers.
The Vive version and Oculus version are free.
"It would be really good to get close to 100% accuracy under all circumstances." My experience is its built-in gestures is over 90% accuracy, signature part is over 96%. Hope it fits your requirement.
These days I am trying to develop a few algorithms in C#: Self-Organizing Map, Particle Swarm Optimization, and Glowworm Swarm Optimization. I know how the algorithms work, but there is an issue I am not sure how to approach.
Agents in the search space that try to find the best solution have coordinates (x and y). I don't know how I should represent the positions of the agents visually in a form on each iteration. One option may be to use charts in C# and plot points, updating the position of each agent (point) in the chart on every iteration. Another way may be to use the drawing classes in C# and draw circles or points in a panel based on the x and y coordinates. Which .NET classes should I use to represent points in a search space visually (in 2D)?
I hope you understood me and thank you for reading this post.
If your design variables are N-dimensional with N > 3, it is not easy to visualize the entire domain of interest. You can project the N-dimensional data down to 2D or 3D to get a "section" of the field.
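To illustrate the second option from the question (the System.Drawing classes), a minimal Windows Forms sketch might look like this: keep the agent positions in a list, draw them in a panel's Paint event, and call Invalidate() after each iteration of PSO/GSO to trigger a redraw. The normalization to [0, 1] is my own assumption; scale however your search space is defined.

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Windows.Forms;

public class SwarmForm : Form
{
    // Agent positions in problem space, assumed normalized to [0, 1] here.
    private readonly List<PointF> _agents = new List<PointF>();
    private readonly Panel _canvas = new Panel { Dock = DockStyle.Fill, BackColor = Color.White };

    public SwarmForm()
    {
        Controls.Add(_canvas);
        _canvas.Paint += (s, e) =>
        {
            foreach (var a in _agents)
            {
                // Map problem-space coordinates to pixel coordinates.
                float x = a.X * _canvas.ClientSize.Width;
                float y = a.Y * _canvas.ClientSize.Height;
                e.Graphics.FillEllipse(Brushes.DarkBlue, x - 3, y - 3, 6, 6);
            }
        };
    }

    // Call this once per iteration of the optimizer with the new positions.
    public void UpdateAgents(IEnumerable<PointF> positions)
    {
        _agents.Clear();
        _agents.AddRange(positions);
        _canvas.Invalidate();   // schedule a repaint
    }
}
```

The Chart control (System.Windows.Forms.DataVisualization) is the other option mentioned and works too; direct drawing just gives you more control over markers and animation.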
I'm working on a small WPF desktop app to track a robot. I have a Kinect for Windows on my desk, and I was able to get the basic features working and run the depth camera stream and the RGB camera stream.
What I need is to track a robot on the floor, but I have no idea where to start. I found out that I should use Emgu CV (an OpenCV wrapper).
What I want to do is track the robot and find its location using the depth camera. Basically, it's localization of the robot using stereo triangulation. Then, using TCP over Wi-Fi, I will send the robot commands to move it from one place to another, using both the RGB and depth cameras. The RGB camera will also be used to map the objects in the area so that the robot can take the best path and avoid obstacles.
The problem is that I have never worked with computer vision before; this is actually my first project in the area. I'm not tied to a deadline, and I'm more than willing to learn all the related material to finish this project.
I'm looking for details, explanations, hints, links, or tutorials to achieve this.
Thanks.
Robot localization is a very tricky problem, and I myself have been struggling with it for months now. I can tell you what I have achieved, but you have a number of options:
Optical-flow-based odometry (also known as visual odometry):
Extract keypoints (features) from one image (I used Shi-Tomasi corners, i.e. cvGoodFeaturesToTrack)
Do the same for a consecutive image
Match these features (I used Lucas-Kanade)
Extract depth information from Kinect
Calculate transformation between two 3D point clouds.
What the above algorithm does is estimate the camera motion between two frames, which tells you the position of the robot.
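For step 5, the standard approach is the Kabsch/Horn method: subtract the centroids of the two matched point sets, build the 3x3 cross-covariance matrix, and recover the rotation from its SVD. A sketch of the data flow is below; the Svd3x3 delegate is a hypothetical placeholder, since in C# you would plug in an SVD from Accord.NET or Math.NET rather than write one yourself.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

static class RigidTransform
{
    // Hypothetical helper: returns U, S, V such that M = U * S * V^T.
    // Plug in Accord.NET's or Math.NET's SVD here.
    public delegate (float[,] U, float[,] S, float[,] V) Svd3x3(float[,] m);

    // Estimate rotation R and translation t so that b[i] ≈ R * a[i] + t,
    // where a and b are matched 3D points from consecutive Kinect frames.
    public static (float[,] R, Vector3 t) Estimate(
        IReadOnlyList<Vector3> a, IReadOnlyList<Vector3> b, Svd3x3 svd)
    {
        Vector3 ca = Centroid(a);
        Vector3 cb = Centroid(b);

        // Cross-covariance matrix H = sum (a_i - ca) * (b_i - cb)^T
        var h = new float[3, 3];
        for (int i = 0; i < a.Count; i++)
        {
            Vector3 pa = a[i] - ca;
            Vector3 pb = b[i] - cb;
            float[] va = { pa.X, pa.Y, pa.Z };
            float[] vb = { pb.X, pb.Y, pb.Z };
            for (int r = 0; r < 3; r++)
                for (int c = 0; c < 3; c++)
                    h[r, c] += va[r] * vb[c];
        }

        // R = V * U^T (Kabsch). A full implementation also flips the sign of
        // the last column of V when det(R) < 0, to avoid a reflection.
        var (u, _, v) = svd(h);
        var rot = Multiply(v, Transpose(u));

        // t = cb - R * ca
        return (rot, cb - Apply(rot, ca));
    }

    static Vector3 Centroid(IReadOnlyList<Vector3> pts) => new Vector3(
        pts.Average(p => p.X), pts.Average(p => p.Y), pts.Average(p => p.Z));

    static float[,] Transpose(float[,] m)
    {
        var t = new float[3, 3];
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                t[c, r] = m[r, c];
        return t;
    }

    static float[,] Multiply(float[,] x, float[,] y)
    {
        var m = new float[3, 3];
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                for (int k = 0; k < 3; k++)
                    m[r, c] += x[r, k] * y[k, c];
        return m;
    }

    static Vector3 Apply(float[,] m, Vector3 p) => new Vector3(
        m[0, 0] * p.X + m[0, 1] * p.Y + m[0, 2] * p.Z,
        m[1, 0] * p.X + m[1, 1] * p.Y + m[1, 2] * p.Z,
        m[2, 0] * p.X + m[2, 1] * p.Y + m[2, 2] * p.Z);
}
```

Accumulating these frame-to-frame transforms gives you the robot's trajectory, although the error grows over time, which is why people combine visual odometry with a global method like the one below.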
Monte Carlo Localization: This is rather simpler, but you should also use wheel odometry with it.
Check this paper out for a C#-based approach.
The method above uses probabilistic models to determine the robot's location.
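To make that concrete, a bare-bones particle filter looks roughly like the sketch below. The motion noise values are arbitrary, and the likelihood function is a placeholder you would replace with your own Kinect/depth measurement model.

```csharp
using System;
using System.Linq;

class ParticleFilter
{
    // Each particle is a pose hypothesis (x, y, heading) with a weight.
    struct Particle { public double X, Y, Theta, Weight; }

    private Particle[] _particles;
    private readonly Random _rng = new Random();

    public ParticleFilter(int count, double worldWidth, double worldHeight)
    {
        // Start with particles spread uniformly over the map.
        _particles = Enumerable.Range(0, count).Select(_ => new Particle
        {
            X = _rng.NextDouble() * worldWidth,
            Y = _rng.NextDouble() * worldHeight,
            Theta = _rng.NextDouble() * 2 * Math.PI,
            Weight = 1.0 / count
        }).ToArray();
    }

    // 1) Predict: move every particle by the odometry reading plus noise.
    public void Predict(double forward, double turn)
    {
        for (int i = 0; i < _particles.Length; i++)
        {
            var p = _particles[i];
            p.Theta += turn + Gaussian(0, 0.05);
            p.X += (forward + Gaussian(0, 0.02)) * Math.Cos(p.Theta);
            p.Y += (forward + Gaussian(0, 0.02)) * Math.Sin(p.Theta);
            _particles[i] = p;
        }
    }

    // 2) Update: weight each particle by how well the real sensor reading
    //    matches what that particle would expect to see. `likelihood(x, y, theta)`
    //    is a placeholder for your depth-based measurement model.
    public void Update(Func<double, double, double, double> likelihood)
    {
        double total = 0;
        for (int i = 0; i < _particles.Length; i++)
        {
            _particles[i].Weight = likelihood(_particles[i].X, _particles[i].Y, _particles[i].Theta);
            total += _particles[i].Weight;
        }
        for (int i = 0; i < _particles.Length; i++)
            _particles[i].Weight /= total;
    }

    // 3) Resample: draw a new set of particles proportional to weight, so
    //    unlikely hypotheses die out and likely ones are duplicated.
    public void Resample()
    {
        var next = new Particle[_particles.Length];
        for (int i = 0; i < next.Length; i++)
        {
            next[i] = _particles[_particles.Length - 1];   // fallback for rounding
            double r = _rng.NextDouble();
            double cumulative = 0;
            foreach (var p in _particles)
            {
                cumulative += p.Weight;
                if (cumulative >= r) { next[i] = p; break; }
            }
            next[i].Weight = 1.0 / next.Length;
        }
        _particles = next;
    }

    // The weighted mean of the particles is the estimated robot position.
    public (double X, double Y) Estimate() =>
        (_particles.Sum(p => p.X * p.Weight), _particles.Sum(p => p.Y * p.Weight));

    private double Gaussian(double mean, double stdDev)
    {
        // Box-Muller transform.
        double u1 = 1.0 - _rng.NextDouble();
        double u2 = _rng.NextDouble();
        return mean + stdDev * Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Sin(2.0 * Math.PI * u2);
    }
}
```

Each cycle is Predict (wheel odometry), then Update (Kinect observation), then Resample; the weighted mean of the particles is your pose estimate.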
The sad part is that even though libraries exist in C++ to do what you need very easily, wrapping them for C# is a herculean task. If, however, you can code a wrapper, then 90% of your work is done; the key libraries to use are PCL and MRPT.
The last option (which is by far the easiest, but the most inaccurate) is to use KinectFusion, built into the Kinect SDK 1.7. But my experience with it for robot localization has been very bad.
You must read SLAM for Dummies; it will make Monte Carlo Localization much clearer.
The hard reality is that this is very tricky, and you will most probably end up doing much of it yourself. I hope you dive into this vast topic and learn some awesome stuff.
For further information, or for wrappers that I have written, just comment below... :-)
Best
Not sure if it would help you or not, but I put together a Python module that might.
http://letsmakerobots.com/node/38883#comments
I've been asked to make a simple card game in WPF designed to work with Microsoft Pixel Sense Tables (Version 1).
I've had experience in creating user applications in WPF and even for the Surface but I am quite new to building games. I've had a look at XNA and I am going through the documentation as we speak.
The card game has three different stages, and I am looking at completing Stage 1 for now. Stage 1 involves creating two sets of cards. One set of cards has a word on each card, and the other set has a phrase related to one of those words. The cards are jumbled, and the students then have to match the right pairs.
Now, I know part of the job has been simplified thanks to the ScatterView control in the Surface SDK. I have also decided that the best approach is to create a UserControl which is then added to a ScatterViewItem at runtime. The words and the associated phrases are stored in a MySQL database or an Access database.
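For concreteness, the setup I have in mind is roughly the sketch below. CardControl, Text, and IsWord are my own placeholder names for the UserControl, and I'm quoting the Surface SDK types from memory, so treat the details as approximate.

```csharp
using System.Windows.Controls;
using Microsoft.Surface.Presentation.Controls;

// Placeholder for my card UserControl (the real one has XAML behind it).
public class CardControl : UserControl
{
    public string Text { get; set; }
    public bool IsWord { get; set; }
}

public static class CardFactory
{
    // Wrap each card control in a ScatterViewItem and add it to the ScatterView.
    public static ScatterViewItem AddCard(ScatterView scatterView, string text, bool isWord)
    {
        var item = new ScatterViewItem
        {
            Content = new CardControl { Text = text, IsWord = isWord },
            Width = 200,
            Height = 120
        };
        scatterView.Items.Add(item);
        return item;
    }
}
```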
But this is where I am having trouble. When the students pair up two matching cards, how do I attach the cards to each other? I looked at the ScatterPuzzle sample included in the SDK, but it seems a bit too complex for this game at this stage. I wanted the cards to resemble two torn pieces of paper which, when attached, form one bigger piece of paper, but again I am not quite sure how to achieve this.
I like to mess around with AI and wanted to try my hand at face recognition. The first step is to find the faces in the photographs. How is this usually done? Do you use convolution with sample images, or statistics-based methods? How do you find the bounding box for a face? My goal is to pick out the pictures of my kids from all of my digital photos.
Thanks in advance.
Have a look at http://www.face-rec.org/algorithms/ - you'll see that there are a variety of ways of going about this.
Assuming you want to code the whole thing yourself, you'll need to look into Bayesian frameworks, neural networks, and possibly more mathematical approaches like Linear Discriminant Analysis (LDA) and the coolly named Elastic Bunch Graph Matching.
However, it's worth noting that so many people around the world have already coded this that there are now ready-to-use, open-source, off-the-shelf apps, APIs, and libraries that you can simply call, or neural networks you can plug in - for example, TiNA.
Do a good bit of reading - it's a fascinating topic - and then decide whether you want to reinvent the wheel (hey, it's fun to code, but it may not be what you want to focus on) or whether you'll inherit and extend some library or API.
Enjoy!
Try this:
OpenCV
This should help you out with face detection and object recognition projects.
OpenCV for C#: OpenCvSharp
Sample code for face detection:
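For example, detecting faces with a Haar cascade in OpenCvSharp looks roughly like the sketch below. The cascade XML file ships with OpenCV (adjust the path to wherever you keep it), and the exact method overloads can differ between OpenCvSharp versions.

```csharp
using OpenCvSharp;

class FaceDetectDemo
{
    static void Main()
    {
        // Haar cascade shipped with OpenCV; adjust the path to your install.
        var cascade = new CascadeClassifier("haarcascade_frontalface_default.xml");

        using (var image = Cv2.ImRead("photo.jpg"))
        using (var gray = new Mat())
        {
            Cv2.CvtColor(image, gray, ColorConversionCodes.BGR2GRAY);
            Cv2.EqualizeHist(gray, gray);   // helps with uneven lighting

            // Returns one bounding box per detected face.
            Rect[] faces = cascade.DetectMultiScale(gray, 1.1, 4);

            foreach (var face in faces)
                Cv2.Rectangle(image, face, Scalar.Red, 2);

            Cv2.ImWrite("faces.jpg", image);
        }
    }
}
```

Recognition (telling whose face it is) is a separate step on top of this; detection just gives you the bounding boxes.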
You can try ASM or AAM (Active Shape Models / Active Appearance Models):
http://code.google.com/p/aam-opencv/
or one of the face APIs:
http://www.seeingmachines.com/product/faceapi/
http://www.luxand.com/facesdk/
http://betaface.com/
I have an OpenCV Face Detection and Face Recognition (Haar Face Detection + Histogram Equalization + Eigenfaces) tutorial and free source code that you could try: http://www.shervinemami.info/faceRecognition.html