So, recently I managed to land myself a Kinect v2 project (hurray!) on my Industrial Placement, which is supposed to detect whether a person is wearing the correct PPE (or personal protective equipment, to you and me).
Included in this scope is detecting if the person is wearing the correct:
Hazard Protection Suit
Boots
Gloves
Hat
A beard mesh! (Only if they have a beard)
I have only recently begun working with the Kinect v2 Sensor so you will have to forgive any ignorance on my part. I am new to this whole game but have worked through a fair few examples online. I guess I am just looking for advice/sources on how best to solve the problem.
Sources online seem scarce for trying to detect if a person is WEARING something. There are a couple of parts to this project:
Split the human up into components (hands, feet, head, body) while retaining colour. This part, I imagine, would be best done with the Kinect SDK.
Give a percentage likelihood that the person's hand part is wearing gloves, head part is wearing a hat, etc. I've seen some ideas for this involving computer vision packages such as OpenCV.
I'm looking for advice concerning both parts of this project. Feel free to ignore the waffling below, but I thought it best to post some of my own ideas first.
Idea (and worked example) 1 - Point clouds
I have done some preliminary projects involving basic detection of humans. In fact, grabbing the VS project from this source: http://laht.info/record-3d-video-with-kinect-v2-and-play-it-back-in-the-browser/ I have managed to create a .ply file out of human beings. The problem here would be trying to split the .ply 3D image up to identify the suit, hat, gloves, boots (and I guess) beard. I imagine this is a fairly complex task.
Of the three ideas, however, I'm coming back around to this one. Splitting up a 3D .ply image and trying to detect the PPE in it could turn out to be more accurate. I would need advice here.
Idea 2 - Blob Detection
Using this: http://channel9.msdn.com/coding4fun/kinect/Kinect--OpenCV--WPF--Blob-Tracking I pondered whether there would be a good way of splitting a human up into a colour picture of their "hand part", "head part", "body part", etc. with the Kinect v2 SDK. Maybe I could then use a computer vision library like OpenCV to test whether the colours are similar, or whether the logo on the suit can be detected. Unfortunately, there is currently no logo on the gloves or boots, so this might not give a very reliable result.
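As a rough illustration of that colour-comparison step, here is a minimal pure-Python sketch (no Kinect SDK or OpenCV; the pixel data, reference glove colour, and tolerance are all made-up assumptions) that scores how closely a body-part region's average colour matches a known glove colour. A real system would compare histograms in a lighting-robust space such as HSV rather than raw mean RGB:

```python
import math

def mean_color(pixels):
    """Average (r, g, b) over a list of pixel tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def glove_likelihood(hand_pixels, glove_rgb, tolerance=120.0):
    """Crude likelihood (0..1) that the hand region matches the glove colour.

    Distance in RGB space is mapped linearly onto [0, 1]; anything farther
    than `tolerance` scores 0. Values here are illustrative only.
    """
    dist = math.dist(mean_color(hand_pixels), glove_rgb)
    return max(0.0, 1.0 - dist / tolerance)

# A patch of mostly-orange pixels against an orange reference glove:
orange_patch = [(250, 120, 20), (245, 130, 25), (240, 118, 30)]
print(glove_likelihood(orange_patch, (245, 125, 25)))  # close to 1.0
# A blue patch against the same reference scores 0:
print(glove_likelihood([(0, 0, 255)] * 3, (245, 125, 25)))  # 0.0
```

The same scoring idea extends to the suit and boots, each with its own reference colour.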
Idea 3 - Change the PPE
This isn't ideal, but if I can't do it another way, I could perhaps insist that logos be put on the gloves and boots. In this worst-case scenario, I guess I would just be trying to detect some text in 3D space.
Summary
I'd be grateful for any advice on how to begin tackling this problem. Even initial thoughts might spark something :)
Peace!
Benjamin Biggs
I'm a college student who has learned to program. I have a game that is just wrapping up, and my graphics and design team (freshmen) is spotty. I'm planning for the worst and would like recommendations for animation and design software that a programmer can pick up and use in no more than 8-10 hours of learning. If you could post a couple below with a brief description of what each can do, that would be great. My specifications are below.
Working on Unity
Broke... Like College students are. I can put down a little money though
I will be creating a lot of sprite 2D images
I will be making somewhere around 2-5 animations depending on how well I pick up on it.
I do have Blender, but I cannot figure out how to color things in it. I barely figured out how to design an explosion.
Using a graphics tool by itself won't make your design and art better. You have to be an artist, or learn to be one. So you can go one of two ways:
1) Try to become an artist (it's a hard, long road full of risks). You need a lot of time practicing with ANY tools (pen and paper are good for a start). Once you are able to create amazing things, you will find the right tools very quickly.
2) Focus on programming or game design and use simplified art styles (naive painting, hand-drawn pen work, flat design), or even placeholder sprites made of simple geometry. When your game is almost finished, you can hire an artist to redraw your art. It is much easier when you have the full list of required images, and even placeholders at the right ratio and resolution: the artist just redraws them, you drop them into your game, and it's ready for production.
What am I trying to say? Don't try to compete with average graphics. Today's market is full of high-quality games with amazing visuals. Impress gamers with your cool ideas and gameplay, and maybe they will forgive the graphics. Either way, you have several options:
Make your game with placeholder images first, then see what to do next (best option)
Use free images under a Creative Commons license (it's difficult to gather a full set of images in a consistent style)
Use primitive graphics that gamers will forgive (your game must be very good)
Hire a professional artist (best option if you have money and a finished game)
Regarding animations: it's fairly easy to avoid drawing each frame. You can transform your objects in Unity so that your scene looks lively using standard Unity features.
Sorry that I haven't suggested a concrete tool, but I want to save your time and energy and encourage you to make cool games rather than fight with tools that are of little use to a non-artist.
We are developing applications for a SUR40 touch table with XNA and WPF, but the vendor-provided byte tags, once printed, have not worked well on the surface.
In detail: the tags flicker when moved or rotated around the table top (touch surface), and at some angles tag recognition flickers even at rest. I have searched several forums but found few clear answers on this issue.
I'll raise some questions here in case someone who has experienced this can give good advice:
What are the most important factors for tag recognition: flatness of the tag, tag size, tag material, background scene, or something else?
How can we standardize the production of tags?
We have tried quite a few approaches (soft thick double-sided tape, a hard rubber plate, acrylic pieces), but all of them are fuzzy manual solutions in which we cannot tell why one piece works well while others do not.
I appreciate all your help, as I know this issue sits in a fairly obscure corner of programming.
P.S. The byte tags are here:
http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11029
Take a look at this guide for printing http://msdn.microsoft.com/en-us/library/ee804885(v=surface.10).aspx
And this thread for troubleshooting http://social.msdn.microsoft.com/Forums/en-US/surfaceappdevelopment/thread/78a9e60d-9f99-400f-9ba9-273843414314/
Tag tracking is known to have degraded between Surface v1 and Surface v2 but those 2 links have tips that can help you hopefully get some better performance.
As for serializing production, I'm not really sure what you mean. There are only 256 possible values, so it doesn't take more than 4 sheets of paper to print a complete set, which you can then cut up and use however you need.
I'm working on small WPF desktop app to track a robot. I have a Kinect for Windows on my desk and I was able to do the basic features and run the Depth camera stream and the RGB camera stream.
What I need is to track a robot on the floor, but I have no idea where to start. I found out that I should use Emgu CV (an OpenCV wrapper).
What I want to do is track the robot and find its location using the depth camera; basically, it's localization of the robot using stereo triangulation. Then, using TCP over Wi-Fi, I'll send the robot commands to move it from one place to another, using both the RGB and depth cameras. The RGB camera will also be used to map the objects in the area so that the robot can take the best path and avoid them.
The problem is that I have never worked with computer vision before; this is actually my first attempt at it. I'm not working to a deadline, and I'm more than willing to learn everything related to finishing this project.
I'm looking for details, explanation, hints, links or tutorials to achieve my need.
Thanks.
Robot localization is a very tricky problem, and I have been struggling with it myself for months now. I can tell you what I have achieved, but you have a number of options:
Optical flow based odometry (also known as visual odometry):
Extract keypoints or features from one image (I used Shi-Tomasi, i.e. cvGoodFeaturesToTrack)
Do the same for a consecutive image
Match these features across the two frames (I used Lucas-Kanade)
Extract depth information from Kinect
Calculate transformation between two 3D point clouds.
What the above algorithm does is estimate the camera motion between the two frames, which tells you the position of the robot.
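To make the last step concrete for a floor-bound robot (planar motion), here is a hedged pure-Python sketch of the closed-form least-squares estimate of 2D rotation and translation between two matched point sets. The point data is made up; a full 3D version would use the Kabsch/Horn SVD solution, typically via PCL or OpenCV rather than hand-rolled code:

```python
import math

def estimate_planar_motion(prev_pts, curr_pts):
    """Estimate the 2D rigid motion (rotation + translation) mapping
    prev_pts onto curr_pts, given matched (x, z) ground-plane points.

    Closed-form least-squares (Procrustes) solution: in 2D the optimal
    rotation angle falls out of a single atan2 over dot/cross sums.
    """
    n = len(prev_pts)
    cx_p = sum(x for x, _ in prev_pts) / n
    cz_p = sum(z for _, z in prev_pts) / n
    cx_c = sum(x for x, _ in curr_pts) / n
    cz_c = sum(z for _, z in curr_pts) / n

    # Accumulate dot and cross terms of the centred point pairs.
    s_dot = s_cross = 0.0
    for (xp, zp), (xc, zc) in zip(prev_pts, curr_pts):
        ax, az = xp - cx_p, zp - cz_p
        bx, bz = xc - cx_c, zc - cz_c
        s_dot += ax * bx + az * bz
        s_cross += ax * bz - az * bx

    theta = math.atan2(s_cross, s_dot)         # rotation angle
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_c - (c * cx_p - s * cz_p)          # translation after rotation
    tz = cz_c - (s * cx_p + c * cz_p)
    return theta, tx, tz

# Rotate three points by 30 degrees and shift them; the estimate recovers it.
pts = [(1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
a = math.radians(30)
moved = [(math.cos(a) * x - math.sin(a) * z + 0.5,
          math.sin(a) * x + math.cos(a) * z - 0.25) for x, z in pts]
theta, tx, tz = estimate_planar_motion(pts, moved)
print(round(math.degrees(theta), 1), round(tx, 2), round(tz, 2))  # 30.0 0.5 -0.25
```

With noisy real matches you would wrap this in RANSAC to reject bad feature correspondences.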
Monte Carlo Localization: This is rather simpler, but you should also use wheel odometry with it.
Check out this paper for a C#-based approach.
The method above uses probabilistic models to determine the robot's location.
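To give a feel for what Monte Carlo Localization does, here is a toy pure-Python particle filter on a 1D circular corridor. The motion and sensor models are made-up stand-ins for real wheel odometry and Kinect measurements, so treat it as a sketch of the predict/update/resample cycle, not a usable localizer:

```python
import math
import random

def mcl_step(particles, control, measurement, world_size=10.0, noise=0.3):
    """One predict/update/resample cycle of Monte Carlo Localization on a
    1D circular corridor. `measurement` is a noisy reading of the robot's
    position, standing in for a real range-sensor model."""
    # Predict: apply the wheel-odometry control with some motion noise.
    moved = [(p + control + random.gauss(0, noise)) % world_size
             for p in particles]
    # Update: weight each particle by how well it explains the measurement.
    weights = [math.exp(-((p - measurement) ** 2) / (2 * noise ** 2))
               for p in moved]
    total = sum(weights) or 1e-12
    weights = [w / total for w in weights]
    # Resample: draw particles proportionally to their weights.
    return random.choices(moved, weights=weights, k=len(particles))

random.seed(1)
particles = [random.uniform(0, 10) for _ in range(500)]
true_pos = 2.0
for _ in range(10):
    true_pos = (true_pos + 0.5) % 10   # robot drives 0.5 units per step
    particles = mcl_step(particles, 0.5, true_pos + random.gauss(0, 0.1))
estimate = sum(particles) / len(particles)
print(round(estimate, 1))  # converges near the true final position of 7.0
```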
The sad part is that even though C++ libraries exist to do what you need very easily, wrapping them for C# is a herculean task. If you can code a wrapper, however, then 90% of your work is done; the key libraries to use are PCL and MRPT.
The last option (by far the easiest, but the most inaccurate) is to use KinectFusion, built into Kinect SDK 1.7. But my experiences with it for robot localization have been very bad.
You should read SLAM for Dummies; it will make things about Monte Carlo Localization very clear.
The hard reality is that this is very tricky, and you will most probably end up doing it yourself. I hope you dive into this vast topic and learn some awesome stuff.
For further information, or for wrappers that I have written, just comment below... :-)
Best
Not sure if this would help you or not... but I put together a Python module that might.
http://letsmakerobots.com/node/38883#comments
I am currently using EmguCV (an OpenCV C# wrapper) successfully to detect faces in real time (webcam). I get around 7 FPS.
Now I'm looking to improve the performances (and save CPU cycles), and I'm looking for options, here are my ideas:
Detect the face, pick out features of the face, and try to find those features in the next frames (using the SURF algorithm), so that this becomes "face detection + tracking". If they are not found, fall back to face detection.
Detect the face; in the next frame, try to detect the face in an ROI where the face previously was (i.e. look for the face in a smaller part of the image). If the face is not found there, look for it in the whole image again.
Side idea: if no face has been detected for 2-3 frames, and there is no movement in the image, don't try to detect any more faces until movement is detected.
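The side idea can be sketched in a few lines. This is illustrative pure Python with made-up frame data and threshold; in EmguCV you would compute the differencing with an AbsDiff call on whole frames instead of a Python loop:

```python
def motion_score(prev_frame, frame):
    """Mean absolute per-pixel difference between two grayscale frames
    (each a flat list of 0-255 values). A cheap global-motion measure."""
    return sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)

def should_run_detector(prev_frame, frame, misses, motion_threshold=4.0):
    """Skip the expensive face detector once it has missed a few frames
    in a row and the scene is essentially static."""
    if misses >= 3 and motion_score(prev_frame, frame) < motion_threshold:
        return False
    return True

static = [100] * 64                    # an unchanged 8x8 frame
moved = [100] * 32 + [160] * 32        # half the pixels brightened
print(should_run_detector(static, static, misses=3))  # False: no motion
print(should_run_detector(static, moved, misses=3))   # True: motion
```

The threshold would need tuning against your camera's noise floor so sensor noise alone doesn't count as motion.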
Do you have any suggestions for me ?
Thanks.
All the solutions you describe seem smart and reasonable. However, if you use Haar cascades for face detection, you might try training a cascade with fewer stages. Although 20 stages are recommended for face detection, 10-15 might be enough, and that would noticeably improve performance. Information on creating your own cascades can be found in Tutorial: OpenCV haartraining (Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features).
Again, using SURF is a good idea. You can also try P-N learning: Bootstrapping binary classifiers by structural constraints. There are interesting videos on YouTube presenting this method; try to find them.
As for the SURF algorithm, you could try it, but I am not sure it provides relevant features on a face: maybe around the eyes, or if you are close and there are skin irregularities, or perhaps in the hair if the resolution is high enough. Moreover, SURF is not really fast, and I would avoid doing more computation if you want to save CPU time.
The ROI is a good idea; you could choose it using the CAMShift algorithm. It won't save a lot of CPU, but you could try it, as CAMShift is a very lightweight algorithm. Again, I am not sure it will make a big difference, but you have the right idea in your second point: minimize the zone to search.
The side idea seems quite good to me. You could try to detect motion (global motion, for instance), and if there isn't much, don't try to re-detect what you have already detected. You could do that with motion templates, since you know the silhouette from mean-shift or face detection.
A very simple, lightweight but un-robust template match between frame n-1 and frame n could also give you a coefficient that measures a sort of similarity between the two frames; you can say that below a certain threshold you re-activate face detection. Why not? It should take five minutes to implement if the C# wrapper has an equivalent of the matchTemplate() function.
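For reference, the kind of coefficient a matchTemplate() call with a normalized-correlation method produces can be illustrated with a tiny pure-Python normalized cross-correlation between two equal-size frames. The frame data and the 0.9 threshold here are made-up assumptions:

```python
import math

def ncc(frame_a, frame_b):
    """Normalized cross-correlation coefficient between two equal-size
    grayscale frames (flat lists of intensities): 1.0 for identical
    frames, lower as they diverge, -1.0 for perfectly inverted ones."""
    n = len(frame_a)
    ma = sum(frame_a) / n
    mb = sum(frame_b) / n
    num = sum((a - ma) * (b - mb) for a, b in zip(frame_a, frame_b))
    da = math.sqrt(sum((a - ma) ** 2 for a in frame_a))
    db = math.sqrt(sum((b - mb) ** 2 for b in frame_b))
    return num / (da * db) if da and db else 0.0

prev = [10, 20, 30, 40, 50, 60]
same = list(prev)
changed = [60, 50, 40, 30, 20, 10]
print(round(ncc(prev, same), 6))     # 1.0
print(round(ncc(prev, changed), 6))  # -1.0 (completely reversed)

# Gate the detector on the similarity coefficient:
redetect = ncc(prev, changed) < 0.9  # True -> run face detection again
```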
I'll come back here if I have better (deeper) ideas, but for now I've just come home from work and it's hard to think any more...
Julien,
This is not a perfect answer, but just a suggestion.
In my digital image processing class in the last semester of my B.Tech in CS, I learned about bit-plane slicing, and how an image reconstructed from just its MSB plane retains almost 70% of the useful image information. So you would be working with almost the original image, but at one-eighth the size of the original.
Although I haven't implemented it in my own project, I was wondering about it as a way to speed up face detection, because later on, eye detection and pupil and eye-corner detection also take up a lot of computation time and make the whole program slow.
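Bit-plane slicing itself is nearly a one-liner. This illustrative pure-Python sketch (assuming a flat-list representation of an 8-bit grayscale image) keeps only the most significant bit plane:

```python
def msb_plane(image, bit=7):
    """Keep only the given bit plane of an 8-bit grayscale image
    (flat list of 0-255 values). The MSB plane (bit 7) alone preserves
    the coarse light/dark structure of the image, at 1 bit per pixel
    instead of 8."""
    mask = 1 << bit
    return [p & mask for p in image]

img = [12, 130, 200, 64, 255, 90]
print(msb_plane(img))  # [0, 128, 128, 0, 128, 0]
```

Whether a Haar cascade trained on full grayscale still detects reliably on such a binarized image would need testing; it may hurt accuracy even if it saves memory bandwidth.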
I like to mess around with AI and wanted to try my hand at face recognition. The first step is to find the faces in the photographs. How is this usually done? Do you use convolution with one or more sample images, or statistics-based methods? How do you find the bounding box for the face? My goal is to classify the pictures of my kids out of all my digital photos.
Thanks in advance.
Have a look at http://www.face-rec.org/algorithms/ - you'll see that there are a variety of ways of going about this.
Assuming you want to code the whole thing yourself, you'll need to look into Bayesian frameworks, neural networks, and possibly mathematical approaches like Linear Discriminant Analysis (LDA) and the coolly-named Elastic Bunch Graph Matching.
However, it's worth noting that so many people around the world have coded this already that there are now ready-to-use, open-source, off-the-shelf apps, APIs, and libraries that you can simply call, or neural networks you can plug in, for example TiNA.
Do a good bit of reading: it's a fascinating topic. Then decide whether you want to reinvent the wheel (hey, it's fun to code, but it may not be what you want to focus on) or whether you'll inherit and extend some library or API.
Enjoy!
Try this:
OpenCV
This should help you out with face detection and object recognition projects
OpenCV for C#: OpenCvSharp
Sample code for face detection
You can try ASM or AAM (Active Shape Models / Active Appearance Models):
http://code.google.com/p/aam-opencv/
or a commercial face API:
http://www.seeingmachines.com/product/faceapi/
http://www.luxand.com/facesdk/
http://betaface.com/
I have an OpenCV face detection and face recognition tutorial (Haar face detection + histogram equalization + Eigenfaces) with free source code that you could try: http://www.shervinemami.info/faceRecognition.html