I recently joined a project where I need to build a vehicle-based computer vision system. What sort of special functionality does a camera need to be able to capture images while traveling at varying speeds? For example, how high a frame rate is required, and what exposure duration and shutter speed? Do you think that webcams (even high-end ones) will be able to achieve it? The project requires the camera to be programmable in C# ...
Thank you very much in advance!
Unless video is capable of producing high-quality, low-blur images, I would go with a camera with a really fast shutter speed and a very short exposure duration. As for frame rate, following Seth's math, 44 centimeters is roughly a little more than a foot, which should be decent for calculations.
Reaction time for a human to respond to someone hitting the brakes in front of them is 1.5 seconds. If you can determine that they hit their brake lights within 1/30th of a second, and it takes you 1 second to calculate and apply the brakes, you have already beaten a human's reaction time.
How fast your shutter speed needs to be depends on how fast your vehicle is moving. A faster shutter speed reduces motion blur, giving a more accurate picture to analyze.
Try different speeds (getting a camera where this value is configurable might help).
I'm not sure that's an answerable question. It sounds like the sort of thing that the Darpa Grand Challenge hopes to determine :)
With regard to frame rate: if your vehicle is going 30 miles per hour, a 30 FPS webcam will capture one frame for every 44 centimeters the vehicle travels. Whether or not that's "enough" depends on what you're planning to do with the image.
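For what it's worth, the arithmetic behind that 44 cm figure is just a unit conversion; a quick sanity check in plain C#, no camera API involved:

    // 30 mph converted to meters per second, divided by the frame rate.
    double mph = 30.0, fps = 30.0;
    double metersPerSecond = mph * 0.44704;        // 1 mph = 0.44704 m/s
    double metersPerFrame = metersPerSecond / fps; // ~0.447 m, i.e. ~44.7 cm per frame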
Not sure about out-of-the-box C# programmability, but a specific webcam-style camera to consider would be the PS3 Eye.
It was specially engineered for motion capture and (as I understand it) is capable of higher-quality images at high frame rates than the majority of the competition. Windows drivers are available for it, and that opens the door to creating a C# wrapper.
Here is the product page; note the 120 fps upper-end spec (not sure that the Windows drivers run at this rate, but obviously the hardware is capable of it).
One note on shutter speed: images taken at a high frame rate in low light will likely be underexposed and unusable. If you need this to work in varying light conditions, then the frame rate will likely either need to be fixed at the low end of your acceptable range, or will need to self-adjust based on available light.
These guys, Mobileye, develop such commercial systems for lane departure warning and vehicle and pedestrian detection.
If you go to "Manufacturer Products -> Development and Evaluation Platforms -> Cameras", you can see what they use as cameras and also their processing platforms.
30 fps should be sufficient for the applications mentioned above.
If money isn't an issue, take a look at cameras from companies like Opeton and others. You can control every aspect of every image capture, including capture time, image size, and more.
My iPhone can take pictures out the side of a car that are fairly blur-free... past 10-20 feet. Inside of that, things are simply moving too fast; the shutter speed would need to be higher to avoid blurring them.
Start with a middle-of-the-road webcam, and move up as necessary? A laptop and a ride in your car while capturing still images would probably give you an idea of how well it works.
I am working on removing the background and leaving only the bodies with the Kinect V2 and C#/WPF in real time.
Removing the background works fine, but the edges of the bodies are very rough, with jaggies along them.
I need to smooth the edges in real-time (30 frames per second). I would appreciate any advice on that.
I am able to select the edges (similar to Photoshop's magic wand).
I tried to use something like Gaussian blur, but it seems to be too slow for a real-time application. Probably I am missing something because it seems to be a standard problem for many applications like games etc. Thank you!
You probably need to look into implementations of depth image enhancement or smoothing that fill holes around the edges of the silhouettes. For starters, maybe you can look into Kinect Depth Smoothing. This should work in real time since it's just based on calculating modes. For more accurate implementations, there are research papers that address the same issue, such as the ones below:
[a] Chen, L., Lin, H., Li, S., "Depth image enhancement for Kinect using region growing and bilateral filter," Pattern Recognition (ICPR), 2012 21st International Conference on, 3070–3073 (2012).
[b] Le, A. V., Jung, S.-W., Won, C. S., "Directional joint bilateral filter for depth images," Sensors 14(7), 11362–11378, Multidisciplinary Digital Publishing Institute (2014).
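For a quick experiment before diving into the papers, here is a rough, unoptimized sketch of the mode-based hole filling idea mentioned above. It assumes you have the raw ushort depth frame; the neighborhood radius is an arbitrary choice on my part.

    using System.Collections.Generic;
    using System.Linq;

    // Fill zero-depth ("hole") pixels with the most common non-zero depth value
    // in a small neighborhood. Plain C#; depth is the raw ushort frame buffer.
    static ushort[] FillDepthHoles(ushort[] depth, int width, int height, int radius = 2)
    {
        var result = (ushort[])depth.Clone();
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                if (depth[y * width + x] != 0) continue;          // only fill holes
                var counts = new Dictionary<ushort, int>();
                for (int dy = -radius; dy <= radius; dy++)
                {
                    for (int dx = -radius; dx <= radius; dx++)
                    {
                        int nx = x + dx, ny = y + dy;
                        if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                        ushort d = depth[ny * width + nx];
                        if (d == 0) continue;
                        int c;
                        counts.TryGetValue(d, out c);
                        counts[d] = c + 1;
                    }
                }
                if (counts.Count > 0)                             // mode of the neighborhood
                    result[y * width + x] = counts.OrderByDescending(kv => kv.Value).First().Key;
            }
        }
        return result;
    }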
I'm not familiar with app development, so I want to gain a better understanding of how animations are implemented in apps, as I have different ideas.
I have designed 4 images using Adobe Illustrator: 3 of the images are planets and 1 image is a rocket. In my app I would like the rocket to randomly move to a planet when a button is clicked. The ideas I have to implement this are:
Programmatically move the rocket pixel by pixel towards the randomly selected planet
Or
Create 3 different animations of the rocket moving to each planet and when the random planet has been selected play the correct rocket animation.
Which one is the best approach for what I am trying to achieve? If you have a better one, please share it, as I do not know how to go about doing this.
Thanks!
It's a trade-off between download size / performance / versatility / maintainability / implementation time.
Option 1 (moving the rocket programmatically) will use more processor time, result in a smaller program size, and be more versatile (you could allow the user to control the rocket and detect when they arrive at a target planet); see the sketch below.
Option 2 (pre-made animations) will use less processor time, be simpler to implement, and result in a larger program size. You'd need to remake the animations when adding a new planet, so it doesn't scale well (what happens when you have 20 planets, or different screen resolutions?).
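A minimal, framework-agnostic sketch of option 1, assuming your UI or game framework gives you a per-frame callback with the elapsed time; the Vector2 type and the speed value here are placeholders, not any particular API:

    using System;

    struct Vector2
    {
        public float X, Y;
        public Vector2(float x, float y) { X = x; Y = y; }
    }

    class Rocket
    {
        public Vector2 Position;
        public float Speed = 200f;   // pixels per second (arbitrary)

        // Call once per frame with the elapsed time and the randomly chosen planet's position.
        public void MoveToward(Vector2 target, float deltaSeconds)
        {
            float dx = target.X - Position.X;
            float dy = target.Y - Position.Y;
            float distance = (float)Math.Sqrt(dx * dx + dy * dy);
            if (distance < 1f) return;   // close enough: arrived at the planet

            float step = Math.Min(Speed * deltaSeconds, distance);
            Position = new Vector2(Position.X + dx / distance * step,
                                   Position.Y + dy / distance * step);
        }
    }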
I'm making a war game for iOS using MonoTouch and C#. I'm running into some problems with the audio sound effects.
Here's what I require: The ability to play many sound effects simultaneously (possibly up to 10-20 at once) and the ability to adjust volume (for example, if the user zooms in on the battlefield the gun shot volume gets louder).
Here are my problems:
With AVAudioPlayer, I can adjust volume, but I can only play one sound per thread. So if I want to play multiple sounds I have to have dozens and dozens of threads going just in case they overlap... This is a war game; picture 20 soldiers on the battlefield. Each soldier would have a "sound thread" to play gunfire sounds when they shoot, because it is possible that every soldier could happen to fire at the exact same time. I don't have a problem with making lots of threads, but my game already has dozens of threads running all the time, and adding dozens more could get me into trouble... right? So I'd rather not go down this road of adding dozens more threads unless I have to...
With SystemSound, I can play as many sounds as I want in the same thread, but I can't adjust the volume... So my workaround here is, for every sound effect I have, to save it something like 4 times at 4 different volumes. That is a big pain... Any way to adjust volume with SystemSounds?
Both of these answer some of my requirements, but neither seems to be a seamless fit. Should I just go the AVAudioPlayer multi-threading nightmare road? Or the SystemSound multi-file-with-different-volume-levels nightmare road? Or is there a better way to do this?
Thanks in advance.
Finally found the solution to my problem. AVAudioPlayer IS capable of playing multiple sounds at once, but only with certain file formats... The details are available in this link. The reason I couldn't play my sound effects simultaneously was that the file format was compressed, and the iPhone only has one hardware decompressor.
http://brainwashinc.wordpress.com/2009/08/14/iphone-playing-2-sounds-at-once/
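For anyone hitting the same wall, here is a minimal sketch of the approach, assuming MonoTouch's AVAudioPlayer API and uncompressed .wav assets (the file names are just placeholders):

    using MonoTouch.AVFoundation;
    using MonoTouch.Foundation;

    // Each uncompressed sound gets its own player; they can all mix in software.
    var gunShot   = AVAudioPlayer.FromUrl(NSUrl.FromFilename("Sounds/gunshot.wav"));
    var explosion = AVAudioPlayer.FromUrl(NSUrl.FromFilename("Sounds/explosion.wav"));

    gunShot.PrepareToPlay();
    explosion.PrepareToPlay();

    // Volume is per player, so it can track the camera zoom level.
    gunShot.Volume   = 0.4f;
    explosion.Volume = 1.0f;

    // Both play simultaneously; no extra threads required.
    gunShot.Play();
    explosion.Play();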
I'm doing a Kinect application using the official Kinect SDK.
The Result I want
1) Be able to identify that the body has been waving for 5 seconds, and do something if it has.
2) Be able to identify leaning on one leg for 5 seconds, and do something if it happens.
Does anyone know how to do this? I'm doing it in a WPF application.
I would like to see some examples. I'm rather new to Kinect.
Thanks in advance for all your help!
The Kinect provides you with the skeletons it's tracking; you have to do the rest. Basically you need to create a definition for each gesture you want, and run that against the skeletons every time the SkeletonFrameReady event is fired. This isn't easy.
Defining Gestures
Defining the gestures can be surprisingly difficult. The simplest (easiest) gestures are ones that happen at a single point in time, and therefore don't rely on past locations of the limbs. For example, if you want to detect when the user has their hand raised above their head, this can be checked on every individual frame. More complicated gestures need to take a period of time into account. For your waving gesture, you won't be able to tell from a single frame whether a person is waving or just holding their hand up in front of them.
So now you need to be able to store relevant information from the past, but what information is relevant? Should you keep a store of the last 30 frames and run an algorithm against that? 30 frames only gets you a second's worth of information... perhaps 60 frames? Or, for your 5 seconds, 300 frames? Humans don't move that fast, so maybe you could use every fifth frame, which would bring your 5 seconds back down to 60 frames. A better idea would be to pick and choose the relevant information out of the frames. For a waving gesture, the hand's current velocity, how long it's been moving, how far it's moved, etc. could all be useful information.
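To make the "store relevant information" idea concrete, here is one possible shape for a bounded history of hand positions, assuming the Kinect SDK 1.x SkeletonPoint type; the window size and the velocity measure are arbitrary choices, not part of the SDK:

    using System.Collections.Generic;
    using Microsoft.Kinect;

    class HandHistory
    {
        private readonly Queue<SkeletonPoint> _samples = new Queue<SkeletonPoint>();
        private readonly int _maxSamples;

        public HandHistory(int maxSamples = 60) { _maxSamples = maxSamples; }

        // Add the hand position from the current SkeletonFrameReady event.
        public void Add(SkeletonPoint handPosition)
        {
            _samples.Enqueue(handPosition);
            if (_samples.Count > _maxSamples)
                _samples.Dequeue();                    // drop the oldest frame
        }

        // Rough average horizontal movement per frame (meters) over the stored window.
        public float AverageHorizontalVelocity()
        {
            if (_samples.Count < 2) return 0f;
            SkeletonPoint[] points = _samples.ToArray();
            return (points[points.Length - 1].X - points[0].X) / (points.Length - 1);
        }
    }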
After you've figured out how to get and store all the information pertaining to your gesture, how do you turn those numbers into a definition? Waving could require a certain minimum speed, or a direction (left/right instead of up/down), or a duration. However, this duration isn't the 5 second duration you're interested in. This duration is the absolute minimum required to assume that the user is waving. As mentioned above, you can't determine a wave from one frame. You shouldn't determine a wave from 2, or 3, or 5, because that's just not enough time. If my hand twitches for a fraction of a second, would you consider that a wave? There's probably a sweet spot where most people would agree that a left to right motion constitutes a wave, but I certainly don't know it well enough to define it in an algorithm.
There's another problem with requiring a user to do a certain gesture for a period of time. Chances are, not every frame in those five seconds will appear to be a wave, regardless of how well you write the definition. Whereas you can easily determine if someone held their hand over their head for five seconds (because it can be determined on a single-frame basis), it's much harder to do that for complicated gestures. And while waving isn't that complicated, it still shows this problem. As your hand changes direction at either side of a wave, it stops moving for a fraction of a second. Are you still waving then? If you answered yes, wave more slowly so you pause a little more at either side. Would that pause still be considered a wave? Chances are, at some point in that five-second gesture, the definition will fail to detect a wave. So now you need to take into account a leniency for the gesture duration... if the waving gesture occurred for 95% of the last five seconds, is that good enough? 90%? 80%?
The point I'm trying to make here is there's no easy way to do gesture recognition. You have to think through the gesture and determine some kind of definition that will turn a bunch of joint positions (the skeleton data) into a gesture. You'll need to keep track of relevant data from past frames, but realize that the gesture definition likely won't be perfect.
Consider the Users
So now that I've said why the five second wave would be difficult to detect, allow me to at least give my thoughts on how to do it: don't. You shouldn't force users to repeat a motion based gesture for a set period of time (the five second wave). It is surprisingly tiring and just not what people expect/want from computers. Point and click is instantaneous; as soon as we click, we expect a response. No one wants to have to hold a click down for five seconds before they can open Minesweeper. Repeating a gesture over a period of time is okay if it's continually executing some action, like using a gesture to cycle through a list - the user will understand that they must continue doing the gesture to move farther through the list. This even makes the gesture easier to detect, because instead of needing information for the last 5 seconds, you just need enough information to know if the user is doing the gesture right now.
If you want the user to hold a gesture for a set amount of time, make it a stationary gesture (holding your hand at some position for x seconds is a lot easier than waving). It's also a very good idea to give some visual feedback, to say that the timer has started. If a user screws up the gesture (wrong hand, wrong place, etc) and ends up standing there for 5 or 10 seconds waiting for something to happen, they won't be happy, but that's not really part of this question.
Starting with Kinect Gestures
Start small... really small. First, make sure you know your way around the SkeletonData class. There are 20 joints tracked on each skeleton, and each has a TrackingState. This tracking state will show whether the Kinect can actually see the joint (Tracked), whether it is inferring the joint's position based on the rest of the skeleton (Inferred), or whether it has entirely abandoned trying to find the joint (NotTracked). These states are important. You don't want to think the user is standing on one leg simply because the Kinect doesn't see the other leg and is reporting a bogus position for it. Each joint has a position, which is how you know where the user is standing... piece by piece. Become familiar with the coordinate system.
After you know the basics of how the skeleton data is reported, try some simple gestures. Print a message to the screen when the user raises a hand above their head. This only requires comparing each hand to the Head joint and seeing if either hand is higher than the head in the coordinate plane. After you get that working, move up to something more complicated. I'd suggest trying a swiping motion (hand in front of body, moving either right to left or left to right some minimum distance). This requires information from past frames, so you'll have to think through what information to store. If you can get that working, you could try stringing a series of swiping gestures together in a small amount of time and interpreting that as a wave.
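As a concrete starting point, a minimal version of that first "hand above head" check might look like the following (assuming the Kinect SDK 1.x skeleton types; adjust to whatever SDK version you're on):

    using Microsoft.Kinect;

    static bool IsHandRaised(Skeleton skeleton)
    {
        Joint head      = skeleton.Joints[JointType.Head];
        Joint leftHand  = skeleton.Joints[JointType.HandLeft];
        Joint rightHand = skeleton.Joints[JointType.HandRight];

        // Ignore joints the Kinect isn't actually tracking, as discussed above.
        if (head.TrackingState != JointTrackingState.Tracked) return false;

        bool leftRaised  = leftHand.TrackingState == JointTrackingState.Tracked &&
                           leftHand.Position.Y > head.Position.Y;
        bool rightRaised = rightHand.TrackingState == JointTrackingState.Tracked &&
                           rightHand.Position.Y > head.Position.Y;

        return leftRaised || rightRaised;
    }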
tl;dr: Gestures are hard. Start small, build your way up. Don't make users do repetitive motions for a single action, it's tiring and annoying. Include visual feedback for duration based gestures. Read the rest of this post.
The Kinect SDK helps you get the coordinates of different joints. A gesture is nothing but change in position of a set of joints over a period of time.
To recognize gestures, you have to store the coordinates for a period of time and iterate through them to see if they obey the rules for a particular gesture (such as: the right hand always moves upwards).
For more details, check out my blog post on the topic:
http://tinyurl.com/89o7sf5
I am currently using EmguCV (an OpenCV C# wrapper) successfully to detect faces in real time (webcam). I get around 7 FPS.
Now I'm looking to improve performance (and save CPU cycles), and I'm looking for options; here are my ideas:
Detect the face, pick up features of the face and try to find those features in the next frames (using SURF algorithm), so this becomes a "face detection + tracking". If not found, use face detection again.
Detect the face, in the next frame, try to detect the face in a ROI where the face previously was (i.e. look for the face in a smaller part of the image). If the face is not found, try looking for it in the whole image again.
Side idea: if no face is detected for 2-3 frames, and there is no movement in the image, don't try to detect any more faces until movement is detected.
Do you have any suggestions for me ?
Thanks.
All the solutions you introduced seem to be smart and reasonable. However, if you use Haar for face detection you might try to create a cascade with fewer stages. Although 20 stages are recommended for face detection, 10-15 might be enough. That would noticeably improve performance. Information on creating your own cascades can be found at Tutorial: OpenCV haartraining (Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features).
Again, using SURF is a good idea. You can also try P-N learning: Bootstrapping binary classifiers by structural constraints. There are interesting videos on YouTube presenting this method, try to find them.
As for the SURF algorithm, you could try it, but I am not sure that it provides relevant features on a face; maybe around the eyes, or if you are close and have skin irregularities, or again maybe in the hair if the resolution is high enough. Moreover, SURF is not really that fast, and I would avoid doing extra computation if you want to save CPU time.
The ROI is a good idea; you could choose it by running a CamShift algorithm. It won't save a lot of CPU, but you could try, as CamShift is a very lightweight algorithm. Again, I am not sure it will be really relevant, but you have the right idea in your second point: minimize the zone to search in...
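To illustrate idea #2 (searching a ROI around the previous detection first), here is a hedged sketch using EmguCV's CascadeClassifier; the exact DetectMultiScale overloads and ROI behavior vary a bit between EmguCV versions, so treat this as pseudocode that happens to fit a 3.x-style API:

    using System.Drawing;
    using Emgu.CV;
    using Emgu.CV.Structure;

    class RoiFaceTracker
    {
        private Rectangle? _lastFace;

        public Rectangle? Detect(CascadeClassifier detector, Image<Gray, byte> frame)
        {
            if (_lastFace.HasValue)
            {
                // Grow the previous face rectangle and search only there first.
                Rectangle roi = _lastFace.Value;
                roi.Inflate(roi.Width / 2, roi.Height / 2);
                roi.Intersect(new Rectangle(Point.Empty, frame.Size));

                frame.ROI = roi;
                Rectangle[] hits = detector.DetectMultiScale(frame, 1.1, 4);
                frame.ROI = Rectangle.Empty;             // reset to the full image

                if (hits.Length > 0)
                {
                    Rectangle found = hits[0];
                    found.Offset(roi.Location);          // back to full-frame coordinates
                    return _lastFace = found;
                }
            }

            // Fall back to a full-frame search if there was no previous face or we lost it.
            Rectangle[] all = detector.DetectMultiScale(frame, 1.1, 4);
            return _lastFace = (all.Length > 0 ? all[0] : (Rectangle?)null);
        }
    }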
The side idea seems quite good to me. You could try to detect motion (global motion, for instance), and if there isn't much, then don't try to detect again what you already detected... You could try doing that with motion templates, since you know the silhouette from mean shift or face detection...
A very simple, lightweight but not-very-robust template matching between frame n-1 and frame n could also give you a coefficient that measures a sort of similarity between these two frames. You can say that below a certain threshold you activate face detection... why not? It should take 5 minutes to implement if the C# wrapper has the matchTemplate() equivalent function...
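If the wrapper you're on exposes it, that frame-to-frame similarity coefficient could look roughly like this (assuming an EmguCV 3.x-style CvInvoke.MatchTemplate / MinMaxLoc; older versions expose MatchTemplate on the Image class instead):

    using System.Drawing;
    using Emgu.CV;
    using Emgu.CV.CvEnum;

    // Compare two same-sized grayscale frames; a score near 1.0 means "almost identical",
    // so below some threshold you would re-run face detection.
    static double FrameSimilarity(Mat previousGray, Mat currentGray)
    {
        using (var result = new Mat())
        {
            CvInvoke.MatchTemplate(currentGray, previousGray, result, TemplateMatchingType.CcoeffNormed);
            double min = 0, max = 0;
            Point minLoc = new Point(), maxLoc = new Point();
            CvInvoke.MinMaxLoc(result, ref min, ref max, ref minLoc, ref maxLoc);
            return max;   // the result is 1x1 because template and image have the same size
        }
    }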
I'll come back here if I have better (deeper) ideas, but for now, I've just come back from work and it's hard to think more...
Julien,
This is not a perfect answer, but just a suggestion.
In my digital image processing classes in my last semester of B.Tech in CS, I learned about bit plane slicing, and how an image with just its MSB plane information gives almost 70% of the useful image information. So you'll be working with almost the original image, but at just one-eighth the size of the original.
So although I haven't implemented it in my own project, I was wondering about it as a way to speed up face detection, because later on, eye detection and pupil and eye corner detection also take up a lot of computation time and make the whole program slow.
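For what it's worth, extracting the MSB plane from a grayscale buffer is only a few lines of plain C# (no library assumed), so it's cheap to try before the detection step:

    // Keep only the most significant bit of every pixel, producing a binary image
    // that still carries most of the coarse structure of the original.
    static byte[] MsbPlane(byte[] gray)
    {
        var plane = new byte[gray.Length];
        for (int i = 0; i < gray.Length; i++)
            plane[i] = (byte)((gray[i] & 0x80) != 0 ? 255 : 0);
        return plane;
    }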