I am currently using EmguCV (an OpenCV C# wrapper) successfully to detect faces in real time from a webcam. I get around 7 FPS.
Now I want to improve performance (and save CPU cycles), and I'm looking for options. Here are my ideas:
Detect the face, pick up features of the face and try to find those features in the next frames (using SURF algorithm), so this becomes a "face detection + tracking". If not found, use face detection again.
Detect the face, in the next frame, try to detect the face in a ROI where the face previously was (i.e. look for the face in a smaller part of the image). If the face is not found, try looking for it in the whole image again.
Side idea: if no face is detected for 2-3 frames and there is no movement in the image, don't try to detect any more faces until movement is detected.
Do you have any suggestions for me?
Thanks.
All the solutions you introduced seem smart and reasonable. However, if you use Haar for face detection, you might try to create a cascade with fewer stages. Although 20 stages are recommended for face detection, 10-15 might be enough. That would noticeably improve performance. Information on creating your own cascades can be found at Tutorial: OpenCV haartraining (Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features).
Again, using SURF is a good idea. You can also try P-N learning: Bootstrapping binary classifiers by structural constraints. There are interesting videos on YouTube presenting this method, try to find them.
You could try the SURF algorithm, but I am not sure that it provides relevant features on a face: maybe around the eyes, or on skin irregularities if you are close, or maybe in the hair if the resolution is high enough. Moreover, SURF is not really fast, and I would avoid doing more calculations if you want to save CPU time.
The ROI is a good idea. You could choose it with the camshift algorithm. It won't save a lot of CPU, but it is worth a try, as camshift is a very lightweight algorithm. Again, I am not sure it will be really relevant, but you had the right idea in your second point: minimize the zone to search.
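To make the ROI idea concrete, here is a minimal sketch in plain Python (no OpenCV), assuming rectangles are `(x, y, width, height)` tuples. The function names and the 50% margin are my own assumptions for illustration, not part of EmguCV: the detector would be run only on the enlarged rectangle around the last known face, and the result translated back to full-frame coordinates.

```python
def expand_roi(face, frame_size, margin=0.5):
    """Grow a face rectangle by `margin` (a fraction of its size) on every
    side and clamp it to the frame. Rectangles are (x, y, width, height)."""
    x, y, w, h = face
    frame_w, frame_h = frame_size
    dx, dy = int(w * margin), int(h * margin)
    left = max(0, x - dx)
    top = max(0, y - dy)
    right = min(frame_w, x + w + dx)
    bottom = min(frame_h, y + h + dy)
    return (left, top, right - left, bottom - top)


def to_frame_coords(face_in_roi, roi):
    """Translate a detection found inside the ROI back to full-frame coordinates."""
    fx, fy, fw, fh = face_in_roi
    return (roi[0] + fx, roi[1] + fy, fw, fh)
```

If the detector finds nothing inside the enlarged rectangle, fall back to the whole frame, as described in the question.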
The side idea seems quite good to me. You could try to detect motion (global motion, for instance), and if there is not much, don't try to detect again what you already detected. You could try doing that with motion templates, as you know the silhouette from meanshift or face detection.
A very simple, lightweight, but not very robust template matching between frame n-1 and frame n could also give you a coefficient that measures a sort of similarity between the two frames. When the similarity drops below a certain threshold, you activate face detection. Why not? It should take 5 minutes to implement if the C# wrapper has an equivalent of the matchTemplate() function.
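As a rough sketch of that gating idea, here is a stand-in that uses the mean absolute difference between two grayscale frames instead of matchTemplate(). That swap is my assumption to keep the example dependency-free. Because this measures difference rather than similarity, detection is triggered when the value rises above the threshold. Frames are assumed to be lists of rows of 0-255 integers, and the threshold of 8.0 is arbitrary.

```python
def mean_abs_diff(prev, curr):
    """Average absolute per-pixel difference between two equally sized
    grayscale frames (lists of rows of 0-255 integers)."""
    total = 0
    count = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0


def should_run_detection(prev, curr, threshold=8.0):
    """Run the (expensive) face detector only when the frame changed enough."""
    return mean_abs_diff(prev, curr) > threshold
```

The threshold would have to be tuned against camera noise; a noisy webcam produces a non-zero difference even for a static scene.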
I'll come back here if I have better (deeper) ideas, but for now I've just come back from work and it's hard to think more.
Julien,
This is not a perfect answer, but just a suggestion.
In my digital image processing classes in the last semester of my B.Tech in CS, I learned about bit-plane slicing, and how an image reduced to just its MSB plane retains almost 70% of the useful image information. So you would be working with almost the original image, but at just one-eighth of the original size.
Although I haven't implemented it in my own project, I was wondering whether it could speed up face detection, because the later steps (eye detection, pupil detection, and eye corner detection) also take a lot of computation time and make the whole program slow.
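For reference, extracting a bit plane is a one-liner. This is a minimal sketch assuming an 8-bit grayscale image stored as a list of rows; the function names are mine. Whether a face detector still works well on the resulting 1-bit image is untested here.

```python
def bit_plane(gray, bit):
    """Return the given bit plane (0 = LSB, 7 = MSB) of an 8-bit
    grayscale image as a binary image of 0s and 1s."""
    return [[(p >> bit) & 1 for p in row] for row in gray]


def msb_plane(gray):
    """The most significant bit plane: 1 where the pixel is >= 128."""
    return bit_plane(gray, 7)
```

Note that the MSB plane is the same as thresholding the image at 128, so it is very sensitive to overall brightness.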
I have been experimenting with depth sensors using IR but they have been sadly lacking in accuracy (at least for what I want) and range.
So, looking for alternatives.
The application is this: I am holding a rectangular block of plastic and sweeping it from left to right across a surface (a wall in this case), and I want to know if the gap between the plastic surface and the wall surface reaches a certain threshold.
I have a web camera attached to a Microsoft Surface and I am aiming this camera at this user motion.
If I had a better quality image, I am sure I could use basic geometry to work this out. I am looking around for better cameras as I type...
I was considering a radar sensor instead.
I spent many hours yesterday looking for something like a USB radar sensor with an SDK that is friendly to use from C#.
I have not found anything.
That is not to say I have given up. I am continuing to look.
I just thought I would post here as well to get anyone's ideas on this.
I will continue to update my question, hopefully with a solution, if no one else does.
Thanks
I have 4 shapes in an image.
I want to get the pixels of one shape as a list of points.
The shapes all have the same color.
List<point> GetAllPixelInShape(point x)
{
//imp
}
where x is a point inside this shape.
Long story short, you could begin with a connected components / region labeling algorithm.
http://en.wikipedia.org/wiki/Connected-component_labeling
In OpenCV you can call findContours() to identify contours, which are the borders of your connected regions.
http://dasl.mem.drexel.edu/~noahKuntz/openCVTut7.html
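A flood fill from the seed point is probably the shortest route to the list of pixels asked for. Below is a minimal sketch in plain Python, not OpenCV's findContours(): it assumes the image is a list of rows of color values, that the seed is an `(x, y)` point inside the shape, and that 4-connectivity is what you want. The function name is mine.

```python
from collections import deque


def pixels_in_shape(image, seed):
    """Return every (x, y) pixel connected to `seed` that has the same
    value as the seed pixel, using a 4-connected breadth-first flood fill."""
    rows, cols = len(image), len(image[0])
    sx, sy = seed
    target = image[sy][sx]
    seen = {(sx, sy)}
    queue = deque([(sx, sy)])
    points = []
    while queue:
        x, y = queue.popleft()
        points.append((x, y))
        # Visit the left, right, upper and lower neighbours.
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < cols and 0 <= ny < rows
                    and (nx, ny) not in seen
                    and image[ny][nx] == target):
                seen.add((nx, ny))
                queue.append((nx, ny))
    return points
```

Because the four shapes share one color, this only works if they do not touch each other; touching shapes would be returned as a single region.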
OCR is an extremely difficult task, especially for a script like Arabic. Creating an OCR algorithm from scratch takes a lot of work and numerous algorithms working together. OCR for machine printed text is hard enough. Implementing an algorithm to read handwriting is not something I'd suggest trying until you have a year or two of image processing experience. If you haven't read textbooks and academic papers on OCR, you're likely to spend a lot of time reproducing work that has already been done.
If you're not familiar with contour tracing and/or blob analysis, then working with OpenCV may not be a good first step. Since you have a specific goal in mind, you might first try different algorithms in a user-friendly GUI that will save you coding time.
Consider downloading ImageJ so that you can see how the algorithms work. There are plugins for a variety of common image processing algorithms.
http://rsbweb.nih.gov/ij/
Your proposed method signature doesn't provide enough information to solve this. Your method will need to know the bounds of your shape, how long and wide it is etc, ideally a set of points that indicate those bounds.
Once you have those, you could potentially apply the details of this article, in particular the algorithms specified in the answer to solve your problem.
I want to count the number of people crossing a line from either side. I have a camera mounted on the ceiling and pointing at the floor where the line is (so the camera sees just the tops of people's heads, which makes this more of an object detection problem than a people detection problem).
Is there any sample solution for this problem or similar problems like this? So I can learn from them?
Edit 1: More than one person is crossing the line at any moment.
If nothing but humans can cross the line, then you do not need to detect people; you only have to detect motion.
There are several approaches to motion detection.
The simplest one probably fits your goals. You simply calculate the difference between successive frames of the video stream, and in this way determine a "motion mask" and thus detect a line crossing event.
As an improvement of this "algorithm" you may consider "running average" method.
To determine a direction of motion you can use "motion templates".
To increase the accuracy of your detector you may try any background subtraction technique (which, in turn, is not a simple solution). This helps, for example, if there is a moving background that should be filtered out (e.g. using statistical learning).
All algorithms mentioned are included in OpenCV library.
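The two simplest ideas above, frame differencing and the running average, can be sketched in plain Python as follows. This is an illustration of the technique, not the OpenCV implementation. Frames are assumed to be lists of rows of grayscale values, and the threshold and `alpha` values are arbitrary assumptions.

```python
def motion_mask(prev, curr, threshold=25):
    """Binary mask: 1 where the pixel changed by more than `threshold`
    between two grayscale frames, 0 elsewhere."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]


def update_running_average(background, frame, alpha=0.05):
    """Blend the new frame into the background model. A small `alpha`
    makes the background adapt slowly, so moving people stay visible
    when the frame is compared against it."""
    return [[(1 - alpha) * b + alpha * f
             for b, f in zip(row_b, row_f)]
            for row_b, row_f in zip(background, frame)]
```

With the running average, the mask is computed against the background model instead of the previous frame, which also catches people who stop moving for a moment.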
UPD:
how to compute motion mask
Useful functions for determining motion direction: cvCalcMotionGradient, cvSegmentMotion, cvUpdateMotionHistory (search the docs). The OpenCV library contains example code for motion analysis; see motempl.c
advanced background subtraction from "Learning OpenCV" book
I'm not an expert in video-based CV, but if you can reduce the problem to a finite set of images (for instance, entering the frame, standing on the line, exiting the frame), then you can use one of many shape recognition algorithms. I know of Shape Context, which is good, but I doubt it is subtle enough for this application (it won't tell the difference between a head and most other round objects).
Basically, try to extract key images from the video, and then test them with shape recognition algorithms.
P.S. Finding the key images might be possible with good motion detection methods.
Continuing from this thread:
What are good algorithms for vehicle license plate detection?
I've developed my image manipulation techniques to emphasise the license plate as much as possible, and overall I'm happy with it, here are two samples.
Now comes the most difficult part, actually detecting the license plate. I know there are a few edge detection methods, but my maths is quite poor so I'm unable to translate some of the complex formulas into code.
My idea so far is to loop through every pixel in the image (a for loop based on the image width and height) and compare each pixel against a list of colours. An algorithm then checks whether the colours keep alternating between the white of the license plate and the black of the text. If they do, these pixels are built into a new bitmap in memory, and an OCR scan is performed once this pattern stops being detected.
I'd appreciate some input on this as it might be a flawed idea, too slow or intensive.
Thanks
Your method of "see if the colors keep differentiating between the license plate white, and the black of the text" is basically searching for areas where the pixel intensity changes from black to white and vice-versa many times. Edge detection can accomplish essentially the same thing. However, implementing your own methods is still a good idea because you will learn a lot in the process. Heck, why not do both and compare the output of your method with that of some ready-made edge detection algorithm?
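To illustrate the "many black/white changes" idea, here is a minimal sketch in plain Python that counts intensity transitions along each row of a grayscale image and keeps the rows with many of them. The function names, the fixed threshold of 128, and the minimum of 8 transitions are assumptions for illustration only.

```python
def count_transitions(row, threshold=128):
    """Count how often a row of grayscale pixels switches between
    dark (< threshold) and bright (>= threshold)."""
    bits = [p >= threshold for p in row]
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def candidate_rows(gray, min_transitions=8, threshold=128):
    """Indices of rows with enough dark/bright changes to possibly
    cross the characters of a license plate."""
    return [y for y, row in enumerate(gray)
            if count_transitions(row, threshold) >= min_transitions]
```

A band of consecutive candidate rows would then mark the vertical extent of a possible plate; other textured regions such as radiator grilles will also trigger this, so further filtering is needed.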
At some point you will want to have a binary image, say with black pixels corresponding to the "not-a-character" label, and white pixels corresponding to the "is-a-character" label. Perhaps the simplest way to do that is to use a thresholding function. But that will only work well if the characters have already been emphasized in some way.
As someone mentioned in your other thread, you can do that using the black hat operator, which results in something like this:
If you threshold the image above with, say, Otsu's method (which automatically determines a global threshold level), you get this:
There are several ways to clean that image. For instance, you can find the connected components and throw away those that are too small, too big, too wide or too tall to be a character:
Since the characters in your image are relatively large and fully connected this method works well.
Next, you could filter the remaining components based on the properties of the neighbors until you have the desired number of components (= number of characters). If you want to recognize the character, you could then calculate features for each character and input them to a classifier, which usually is built with supervised learning.
All the steps above are just one way to do it, of course.
By the way, I generated the images above using OpenCV + Python, which is a great combination for computer vision.
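For readers who want to see what the Otsu step does under the hood, here is a plain-Python sketch of the method (this is not the OpenCV call I used for the images). It assumes a flat list of 8-bit grayscale values and picks the threshold that maximizes the between-class variance; the function names are mine.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes the between-class
    variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break               # bright class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, t):
    """1 for pixels brighter than the threshold, 0 otherwise."""
    return [1 if p > t else 0 for p in pixels]
```

On a clearly bimodal image, such as the black hat output, the returned threshold falls between the two intensity clusters.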
Colour, as good as it looks, will present quite some challenges with shading and light conditions. It really depends on how robust you want to make it, but real-world cases have to deal with such issues.
I have done research on road footage (see my profile page and look here for a sample) and have found that real-world road footage is extremely noisy in terms of light conditions, and your colours can change from brown to white for a yellow rear number plate.
Most algorithms use line detection and try to find a box with an aspect ratio within an acceptable range.
I suggest you do a literature review on the subject but this was achieved back in 1993 (if I remember correctly) so there will be thousands of articles.
This is quite a scientific domain, so a single algorithm will not solve it, and you will need numerous pre- and post-processing steps.
In brief, my suggestion is to use Hough transform to find lines and then try to look for rectangles that could create acceptable aspect ratio.
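Once candidate rectangles are available (from Hough lines or any other method), the aspect ratio check itself is trivial. Here is a minimal sketch, assuming boxes are `(x, y, width, height)` tuples. The 2.0 to 6.0 range is a placeholder assumption, since plate proportions differ between countries and the box is distorted by perspective.

```python
def plausible_plates(boxes, min_ratio=2.0, max_ratio=6.0):
    """Keep only the boxes whose width/height ratio could belong to a
    license plate. The ratio limits are placeholders to be tuned."""
    kept = []
    for x, y, w, h in boxes:
        if h <= 0:
            continue  # degenerate box, skip it
        if min_ratio <= w / h <= max_ratio:
            kept.append((x, y, w, h))
    return kept
```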
Harris feature detection could provide important edges but if the car is light-coloured this will not work.
If you have a lot of samples, you could try to check face detection method developed by Paul Viola and Michael Jones. It's good for face detection, maybe it'll do fine with license plate detection (especially if combined with some other method)
I recently joined a project where I need to build a vehicle-based computer vision system. What sort of special functionality does a camera need to be able to capture images while traveling at varying speeds? For example, how high a frame rate is required, and what exposure duration and shutter speed? Do you think that webcams (even high-end ones) will be able to achieve it? The project requires the camera to be programmable in C#...
Thank you very much in advance!
Unless video is capable of producing high-quality, low-blur images, I would go with a camera with a really fast shutter speed and a very short exposure duration. As for frame rate, following Seth's math, 44 centimeters is a little more than a foot, which should be decent for calculations.
The reaction time for a human to respond to someone hitting the brakes in front of them is 1.5 seconds. If you can determine that their brake light came on within 1/30th of a second, and it takes you 1 second to calculate and apply the brakes, you already beat a human in reaction time.
How fast your shutter speed needs to be depends on how fast your vehicle is moving. A faster shutter speed reduces motion blur, giving a more accurate picture to analyze.
Try different speeds (if you can get a camera with this value configurable, might help).
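As a back-of-the-envelope aid for picking a shutter speed, the distance the vehicle travels while the shutter is open is simply speed times exposure time. The sketch below assumes that relationship only; how many pixels of blur that produces depends on the lens, the distance to the object, and the viewing direction, none of which are modelled here. The function name is mine.

```python
MPH_TO_MPS = 0.44704  # metres per second in one mile per hour


def travel_during_exposure_m(speed_mph, exposure_s):
    """Distance in metres the vehicle moves while the shutter is open."""
    return speed_mph * MPH_TO_MPS * exposure_s
```

For example, at 30 mph a 1/1000 s exposure corresponds to roughly 1.3 cm of travel, while a 1/30 s exposure corresponds to roughly 45 cm.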
I'm not sure that's an answerable question. It sounds like the sort of thing that the Darpa Grand Challenge hopes to determine :)
With regard to frame rate: if your vehicle is going 30 miles per hour, a 30 FPS webcam will capture one frame for every 44 centimeters the vehicle travels. Whether or not that's "enough" depends on what you're planning to do with the image.
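The arithmetic behind that figure, as a small sketch you can rerun for other speeds and frame rates (the function name is mine):

```python
def distance_per_frame_cm(speed_mph, fps):
    """Centimetres travelled by the vehicle between two consecutive frames."""
    metres_per_second = speed_mph * 0.44704  # 1 mph = 0.44704 m/s
    return metres_per_second / fps * 100
```

At 30 mph and 30 FPS this gives about 44.7 cm per frame; at 60 mph the same camera covers about 89 cm per frame.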
I'm not sure about out-of-the-box C# programmability, but a specific webcam-style camera to consider would be the PS3 Eye.
It was specially engineered for motion capture and (as I understand it) is capable of higher-quality images at high framerates than the majority of the competition. Windows drivers are available for it, and that opens the door to creating a C# wrapper.
Here is the product page, note the 120fps upper-end spec (not sure that the Windows drivers run at this rate, but obviously the hardware is capable of it).
One note on shutter speed: images taken at a high framerate in low light will likely be underexposed and unusable. If you need this to work in varying light conditions, then the framerate will likely either need to be fixed at the low end of your acceptable range or need to self-adjust based on the available light.
These guys: Mobileye - develop such commercial systems for lane departure warnings and vehicle and pedestrian detection.
If you go to the "Manufacturer Products->Development and Evaluation Platforms->Cameras"
You can see what they use as cameras and also for their processing platforms.
30 fps should be sufficient for the applications mentioned above.
If money isn't an issue, take a look at cameras from companies like Opeton and others. You can control every aspect of every image capture, including capture time, image size, and more.
My iPhone can take pictures out the side of a car that are fairly blur free... past 10-20 feet. Inside of that, things are simply moving too fast; the shutter speed would need to be higher to not blur that.
Start with a middle-of-the-road webcamera, and move up as necessary? A laptop and a ride in your car while capturing still images would probably give you an idea of how well it works.