Continuing from this thread:
What are good algorithms for vehicle license plate detection?
I've developed my image manipulation techniques to emphasise the license plate as much as possible, and overall I'm happy with the results; here are two samples.
Now comes the most difficult part, actually detecting the license plate. I know there are a few edge detection methods, but my maths is quite poor so I'm unable to translate some of the complex formulas into code.
My idea so far is to loop through every pixel in the image (nested for loops over the image width and height) and compare each pixel against a list of colours, checking whether the colours keep alternating between the white of the license plate and the black of the text. If that turns out to be true, those pixels are copied into a new bitmap in memory, and an OCR scan is performed once the pattern stops being detected.
I'd appreciate some input on this as it might be a flawed idea, too slow or intensive.
Thanks
Your method of "see if the colors keep differentiating between the license plate white, and the black of the text" is basically searching for areas where the pixel intensity changes from black to white and vice-versa many times. Edge detection can accomplish essentially the same thing. However, implementing your own methods is still a good idea because you will learn a lot in the process. Heck, why not do both and compare the output of your method with that of some ready-made edge detection algorithm?
At some point you will want to have a binary image, say with black pixels corresponding to the "not-a-character" label, and white pixels corresponding to the "is-a-character" label. Perhaps the simplest way to do that is to use a thresholding function. But that will only work well if the characters have already been emphasized in some way.
As someone mentioned in your other thread, you can do that using the black hat operator, which results in something like this:
If you threshold the image above with, say, Otsu's method (which automatically determines a global threshold level), you get this:
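For reference, a minimal OpenCV + Python sketch of that black hat + Otsu step might look like the following; the filename is just a placeholder and the kernel size would need tuning for your images:

import cv2

# Load the emphasised plate image as grayscale (filename is an example)
gray = cv2.imread("plate.png", cv2.IMREAD_GRAYSCALE)

# Black hat highlights dark details (the characters) against a brighter background
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)

# Otsu's method picks the global threshold level automatically
_, binary = cv2.threshold(blackhat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("binary.png", binary)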
There are several ways to clean that image. For instance, you can find the connected components and throw away those that are too small, too big, too wide or too tall to be a character:
Since the characters in your image are relatively large and fully connected, this method works well.
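If you want to try that cleaning step in code, here is a rough sketch using OpenCV's connectedComponentsWithStats; the size limits are placeholders you would tune to your image resolution:

import cv2
import numpy as np

# binary: the black-and-white image from the thresholding step above
binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)

num, labels, stats, _ = cv2.connectedComponentsWithStats(binary)

cleaned = np.zeros_like(binary)
for i in range(1, num):                      # label 0 is the background
    w = stats[i, cv2.CC_STAT_WIDTH]
    h = stats[i, cv2.CC_STAT_HEIGHT]
    area = stats[i, cv2.CC_STAT_AREA]
    # Keep only components whose size and shape are plausible for a character
    if 50 < area < 5000 and 5 < w < 100 and 10 < h < 150:
        cleaned[labels == i] = 255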
Next, you could filter the remaining components based on the properties of the neighbors until you have the desired number of components (= number of characters). If you want to recognize the character, you could then calculate features for each character and input them to a classifier, which usually is built with supervised learning.
All the steps above are just one way to do it, of course.
By the way, I generated the images above using OpenCV + Python, which is a great combination for computer vision.
Colour, as appealing as it looks, will present quite a few challenges with shading and lighting conditions. It really depends on how robust you want to make it, but real-world cases have to deal with such issues.
I have done research on road footage (see my profile page and look here for a sample) and have found that real-world road footage is extremely noisy in terms of lighting conditions; the colours of a yellow rear number plate can shift anywhere from brown to white.
Most algorithms use line detection and try to find a box with an aspect ratio within an acceptable range.
I suggest you do a literature review on the subject, but this was achieved back in 1993 (if I remember correctly), so there will be thousands of articles.
This is quite a scientific domain, so a single algorithm will not solve it; you will need numerous pre/post-processing steps.
In brief, my suggestion is to use the Hough transform to find lines and then look for rectangles with an acceptable aspect ratio.
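A rough OpenCV + Python sketch of the line-detection half of that idea follows; the grouping of lines into candidate rectangles and the aspect-ratio check are the part you would still have to write, and the thresholds here are only guesses:

import cv2
import numpy as np

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # example filename
edges = cv2.Canny(img, 50, 150)

# Probabilistic Hough transform returns line segments as (x1, y1, x2, y2)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=30, maxLineGap=5)

# Next step: group roughly horizontal and vertical segments into candidate
# rectangles and keep those whose width/height ratio looks like a plate
# (very roughly 2:1 to 5:1, depending on the country).
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        print(x1, y1, x2, y2)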
Harris feature detection could provide important edges but if the car is light-coloured this will not work.
If you have a lot of samples, you could try the face detection method developed by Paul Viola and Michael Jones. It's good for face detection, and it may do fine with license plate detection (especially if combined with some other method).
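For what it's worth, recent OpenCV releases ship a Haar cascade trained on (Russian) number plates, so you can get a feel for the approach before training a cascade on your own samples; a minimal sketch:

import cv2

# Cascade shipped with OpenCV (trained on Russian plates); with your own
# samples you could train a cascade for your country's plates instead.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_russian_plate_number.xml")

img = cv2.imread("car.png")                 # example filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

plates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
for (x, y, w, h) in plates:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)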
Related
I am interested in analyzing a scanned document, a form, and I want to be able to detect if someone has checked or filled in a check box in various places in the form (similar to perhaps a scantron), and maybe capture the image of a signature and such.
Since these check boxes will be at known locations, it seems I could sample a few pixels at (x,y) and average them; if the result is darker than some threshold N, the box is checked. However, I imagine that scanning could introduce a large shift in the actual position, relative to the edge of the image.
As is probably clear, I am a newbie in this area. Does a framework exist (open source or commercial), or are there any patterns or examples anyone could point me to, to start down this path? (Or might this be impossible to do in .NET, so that I should start looking into an unmanaged application?)
This is referred to as ICR (Intelligent Character Recognition).
It is an established field, and ICR performs edge detection because a skewed scan is common.
You can try and do it yourself but there is a lot to it.
Leadtools is not free and I don't work for them, but this is a good example of ICR as a tool (SDK):
LEADTOOLS ICR SDK
If you have the documents in paper another option is to take them to a commercial scan vendor.
They will have software designed for ICR.
They also have high end scanners meant to work with the ICR.
I'm not familiar with .NET image processing, but I know image processing in general. So I'll give you the theory, and references to OpenCV.
To accommodate for skewing of the image, look into Fourier transforms, Hough transforms and Hough lines. What you'd basically want to do is run the Fourier transform, then turn the result into a black-and-white image. Find the strongest lines with HoughLines, and then keep the longest of them. This line will be one of the axis lines; in my experimentation it was usually the vertical axis. Find the angle of deviation from a straight vertical line, and then (depending on the particular rotation algorithm) rotate the image by the negative of this amount.
If the rotation algorithm fills in with 0's (or with a white that's too far off the color of the image) you can crop the image, using the angle found earlier to calculate the deviation (this is where all that trig you learned in school comes in handy).
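As an illustration only, here is a simplified OpenCV + Python sketch of the deskew idea that skips the Fourier step and runs Hough lines directly on the edge image; the parameters are guesses, and the rotation sign may need flipping depending on your conventions:

import cv2
import numpy as np

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)    # example filename
edges = cv2.Canny(img, 50, 150)

lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                        minLineLength=200, maxLineGap=10)

# Take the longest detected line (assumes at least one long line was found)
longest = max(lines[:, 0], key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
x1, y1, x2, y2 = longest
angle = np.degrees(np.arctan2(x2 - x1, y2 - y1))       # 0 degrees == vertical
if angle > 90:                                         # normalise to (-90, 90]
    angle -= 180
elif angle <= -90:
    angle += 180

# Rotate by the negative of the deviation; flip the sign if it goes the wrong way
h, w = img.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), -angle, 1.0)
deskewed = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_LINEAR,
                          borderValue=255)             # fill with white, not black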
Then find the bounding box that encloses the text on the page and crop down to that. When checking whether a box is checked or not, you'll want to look in areas probably about 5-10 pixels larger than the size of the checkbox, depending on resolution, to get the checkbox ROI.
With this, you might want to see whether x% of the ROI is written in, to verify whether the box was checked or not.
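A minimal sketch of that last check, assuming the deskewed grayscale image from the previous sketch and a hypothetical known checkbox position; the fill threshold is a guess you would calibrate on real scans:

import cv2
import numpy as np

# deskewed: straightened grayscale scan; (x, y, w, h): known checkbox location
x, y, w, h = 120, 340, 24, 24        # placeholder coordinates
pad = 8                              # look a few pixels beyond the box itself
roi = deskewed[y - pad:y + h + pad, x - pad:x + w + pad]

# Fraction of dark ("written in") pixels inside the ROI
_, roi_bin = cv2.threshold(roi, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
fill = np.count_nonzero(roi_bin) / roi_bin.size

checked = fill > 0.15                # calibrate this threshold on real scans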
I have 4 shapes in an image.
I want to get the pixels of one shape as a list of points.
The shapes all have the same color.
List<Point> GetAllPixelInShape(Point x)
{
    // implementation goes here
}
where x is a point inside the shape.
Long story short, you could begin with a connected components / region labeling algorithm.
http://en.wikipedia.org/wiki/Connected-component_labeling
In OpenCV you can call findContours() to identify contours, which are the borders of your connected regions.
http://dasl.mem.drexel.edu/~noahKuntz/openCVTut7.html
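For example, here is a minimal OpenCV + Python sketch (Python only because the earlier answers used it; equivalent calls exist in the C# wrappers) that labels the connected components and returns the pixels of the shape containing a given seed point. The helper name simply mirrors your signature and the filename is a placeholder:

import cv2
import numpy as np

img = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)   # example filename
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

num, labels = cv2.connectedComponents(binary)

def get_all_pixels_in_shape(seed):
    """Return (x, y) coordinates of every pixel in the shape containing seed."""
    # Assumes the seed lies on a shape, not on the background (label 0)
    label = labels[seed[1], seed[0]]        # labels array is indexed [row, col]
    ys, xs = np.where(labels == label)
    return list(zip(xs.tolist(), ys.tolist()))

points = get_all_pixels_in_shape((40, 60))  # (x, y) of a point inside one shape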
OCR is an extremely difficult task, especially for a script like Arabic. Creating an OCR algorithm from scratch takes a lot of work and numerous algorithms working together. OCR for machine printed text is hard enough. Implementing an algorithm to read handwriting is not something I'd suggest trying until you have a year or two of image processing experience. If you haven't read textbooks and academic papers on OCR, you're likely to spend a lot of time reproducing work that has already been done.
If you're not familiar with contour tracing and/or blob analysis, then working with OpenCV may not be a good first step. Since you have a specific goal in mind, you might first try different algorithms in a user-friendly GUI that will save you coding time.
Consider downloading ImageJ so that you can see how the algorithms work. There are plugins for a variety of common image processing algorithms.
http://rsbweb.nih.gov/ij/
Your proposed method signature doesn't provide enough information to solve this. Your method will need to know the bounds of your shape, how long and wide it is, etc.; ideally a set of points that indicates those bounds.
Once you have those, you could potentially apply the details of this article, in particular the algorithms specified in the answer to solve your problem.
I am currently using EmguCV (OpenCV C# wrapper) successfully to detect faces in real-time (webcam). I get around 7 FPS.
Now I'm looking to improve the performances (and save CPU cycles), and I'm looking for options, here are my ideas:
Detect the face, pick up features of the face and try to find those features in the next frames (using SURF algorithm), so this becomes a "face detection + tracking". If not found, use face detection again.
Detect the face, in the next frame, try to detect the face in a ROI where the face previously was (i.e. look for the face in a smaller part of the image). If the face is not found, try looking for it in the whole image again.
Side idea: if no face is detected for 2-3 frames, and there is no movement in the image, don't try to detect any more faces until movement is detected.
Do you have any suggestions for me?
Thanks.
All the solutions you introduced seem to be smart and reasonable. However, if you use Haar for face detection you might try to create a cascade with fewer stages. Although 20 stages are recommended for face detection, 10-15 might be enough, and that would noticeably improve performance. Information on creating your own cascades can be found at Tutorial: OpenCV haartraining (Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features).
Again, using SURF is a good idea. You can also try P-N learning: Bootstrapping binary classifiers by structural constraints. There are interesting videos on YouTube presenting this method, try to find them.
For the SURF algorithm, you could try it, but I am not sure that it provides relevant features on a face; maybe around the eyes, or if you are close and have skin irregularities, or perhaps in the hair if the resolution is high enough. Moreover, SURF is not really fast, and I would avoid doing more computation if you want to save CPU time.
The ROI is a good idea; you could choose it using the CamShift algorithm. It won't save a lot of CPU, but you could try it, as CamShift is a very lightweight algorithm. Again, I am not sure it will be really relevant, but you have the right idea in your second idea: minimize the zone to search in...
The side idea seems quite good to me: you could try to detect motion (global motion, for instance), and if there isn't much, don't try to detect again what you already detected... You could do that with motion templates, since you know the silhouette from mean shift or face detection...
A very simple, lightweight but not very robust template match between frame n-1 and frame n could also give you a coefficient that measures a sort of similarity between the two frames; you could say that below a certain threshold you activate face detection... why not? It should take five minutes to implement if the C# wrapper has an equivalent of the matchTemplate() function...
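Something like this minimal sketch, assuming you already have consecutive grayscale frames (the filenames and the 0.95 threshold are placeholders):

import cv2

# prev_frame, curr_frame: consecutive grayscale frames from your capture code
prev_frame = cv2.imread("frame_n_minus_1.png", cv2.IMREAD_GRAYSCALE)
curr_frame = cv2.imread("frame_n.png", cv2.IMREAD_GRAYSCALE)

# With two images of the same size, matchTemplate reduces to a single
# normalised correlation coefficient between the frames
score = cv2.matchTemplate(curr_frame, prev_frame, cv2.TM_CCOEFF_NORMED)[0][0]

run_face_detection = score < 0.95      # frames differ enough: detect again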
I'll come back here if I have better (deeper) ideas, but for now I've just come back from work and it's hard to think more...
Julien,
This is not a perfect answer, but just a suggestion.
In my digital image processing classes in my last semester of B.Tech in CS, I learned about bit plane slicing, and how an image reduced to just its MSB plane retains almost 70% of the useful image information. So you'd be working with almost the original image, but at just one-eighth the size of the original.
So although I haven't implemented it in my own project, I have been wondering about it as a way to speed up face detection, because later on eye detection, pupil and eye corner detection also take a lot of computation time and make the whole program slow.
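Extracting the most significant bit plane is only a couple of lines in OpenCV + Python, in case anyone wants to experiment with the idea (how much detection accuracy survives is something you would have to test yourself):

import cv2
import numpy as np

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # example filename

# Keep only the most significant bit of each pixel (bit plane 7)
msb = np.where(gray & 0x80, 255, 0).astype(np.uint8)

cv2.imwrite("msb_plane.png", msb)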
I want to count the number of people crossing a line, from either side. I have a camera placed on the ceiling pointing at the floor where the line is (so the camera sees just the tops of people's heads, which makes this more of an object detection problem than people detection).
Is there any sample solution for this problem, or for similar problems, that I can learn from?
Edit 1: More than one person may be crossing the line at the same moment.
If nothing but humans is likely to cross the line, then you do not need to detect people; you only have to detect motion.
There are several approaches to motion detection.
Probably the simplest one fits your goals: you simply calculate the difference between successive frames of the video stream, determine a "motion mask" from it, and thus detect the line-crossing event.
As an improvement of this "algorithm" you may consider the "running average" method.
To determine a direction of motion you can use "motion templates".
In order to increase the accuracy of your detector you may try a background subtraction technique (which in turn is not a simple task), for example when there is a moving background that should be filtered out (e.g. using statistical learning).
All algorithms mentioned are included in OpenCV library.
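A minimal frame-differencing sketch of the simplest approach above, in OpenCV + Python (the video source and the 25/255 threshold are placeholders):

import cv2

cap = cv2.VideoCapture("ceiling_camera.mp4")     # example video source

ok, prev = cap.read()                            # assumes the first read succeeds
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Absolute difference between successive frames gives the "motion mask"
    diff = cv2.absdiff(gray, prev)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    # A crossing event could be flagged when enough mask pixels fall inside
    # a thin strip around the line's position in the image
    prev = gray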
UPD:
how to compute motion mask
Useful functions for determining motion direction: cvCalcMotionGradient, cvSegmentMotion, cvUpdateMotionHistory (search the docs). The OpenCV library contains example code for motion analysis; see motempl.c.
advanced background subtraction from "Learning OpenCV" book
I'm not an expert in video-based CV, but if you can reduce the problem to a finite set of images (for instance: entering frame, standing on the line, exiting frame), then you can use one of many shape recognition algorithms. I know of Shape Context, which is good, but I doubt it is subtle enough for this application (it won't tell the difference between a head and most other round objects).
Basically, try to extract key images from the video, and then test them with shape recognition algorithms.
P.S. Finding the key images might be possible with good motion detection methods.
We have a for-fun project which requires us to compare two black-and-white bitmaps of two signatures and say whether they are the same person's signature. As this is just two loaded bitmaps, rather than data captured from a tablet, the approach is going to be a little different from normal signature recognition.
I am thinking it would require the following steps (a rough sketch of the cropping and resizing steps is shown after the list):
Crop the bitmaps to just the signature
Try to work out some kind of rotation to align them
Resize to make the cropped / rotated bitmaps the same
Analyse the signature inside (maybe by breaking down into a grid)
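A minimal OpenCV + Python sketch of the cropping and resizing steps (rotation, step 2, is left out; the filenames and the target size are placeholders, and it assumes dark ink on a light background):

import cv2

def crop_and_normalise(path, size=(300, 120)):
    """Crop a signature bitmap to its ink bounding box and resize it."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Invert-threshold so the ink pixels become non-zero
    _, ink = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    x, y, w, h = cv2.boundingRect(cv2.findNonZero(ink))
    return cv2.resize(img[y:y + h, x:x + w], size)

a = crop_and_normalise("signature_a.png")
b = crop_and_normalise("signature_b.png")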
Does anyone have any thoughts on this project? How to best do the rotation, comparison etc? Seen anything similar?
You may want to look at SOMs for interesting pics (:D) as well as an example of how to compare image similarities.
There are two main types of neural networks: supervised and unsupervised. SOMs are unsupervised. Depending on your situation, you might want to take a look at supervised neural networks; NNs are common, and quite straightforward to implement for the most part.