I was having trouble coming up with a way to describe the problem area I want to understand better, so I set up the following scenario to help illustrate it.
Given the following image, how would I go about programming something that could find all of the happy faces that match the image in position 1 (call it the template image) and disregard sad-face images like those in positions 2 and 5?
...
I'm not looking for anyone to solve it for me, I just need an insightful first step to get me started as it's uncharted territory for me.
What would this be called? What should I be querying google and stack overflow for in order to find helpful information? Does anyone have a library or code snippet that can help get me started?
Also, I'm a .NET / C# programmer by trade so anything that happens to be in my native language is especially appreciated but not a deal-breaker.
Thanks in advance...
Mike
The right technique depends on the actual scenario. This problem goes by several names, such as content-based retrieval, template matching, and image description.
My suggestions:
If your scenario is like the faces, rotated at known angles with known sizes, look for simpler techniques, such as the correlation of two images. Do it for each angle and you've got it (see the sketch after these suggestions).
If you know that the only variation between images is rotation, meaning you have only the happy and sad faces rotated, without other distortions, you can look for rotation-invariant matching methods. Fourier theory may help you there, as may mappings to polar coordinates combined with correlation.
In the worst case, where you have several variations, you will need to look into image descriptors and pattern-matching techniques. These also depend on the image type, and there are several of them. If you end up there, you'll have a scheme with some libraries/code to extract features from the images and a classifier to tell you which images match and which do not, with some kind of confidence (such as a distance measure between the feature vectors).
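A minimal EmguCV sketch of the first suggestion - rotate the template through each known angle and keep the best normalized correlation score (the method name and all parameter values here are illustrative, not from the question):

```csharp
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;

static void BestMatch(Mat scene, Mat template, double[] angles,
                      out double bestScore, out double bestAngle, out Point bestLoc)
{
    bestScore = -1; bestAngle = 0; bestLoc = Point.Empty;
    foreach (double angle in angles)
    {
        // Rotate the template around its centre.
        var center = new PointF(template.Width / 2f, template.Height / 2f);
        var rotation = new Mat();
        CvInvoke.GetRotationMatrix2D(center, angle, 1.0, rotation);
        var rotated = new Mat();
        CvInvoke.WarpAffine(template, rotated, rotation, template.Size);

        // Normalized cross-correlation of the rotated template against the scene.
        var result = new Mat();
        CvInvoke.MatchTemplate(scene, rotated, result, TemplateMatchingType.CcoeffNormed);
        double minVal = 0, maxVal = 0;
        Point minLoc = Point.Empty, maxLoc = Point.Empty;
        CvInvoke.MinMaxLoc(result, ref minVal, ref maxVal, ref minLoc, ref maxLoc);

        if (maxVal > bestScore) { bestScore = maxVal; bestAngle = angle; bestLoc = maxLoc; }
    }
}
```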
The simplest technique would probably be template matching. The difference in your example images is pretty small, though, so it might be hard to differentiate, for example, images 1 and 5 in your example.
A possible algorithm is:
Compute gradient of the image
For each gradient vector, compute the gradient direction
Compute the orientation histogram (angle vs frequency) of the gradient vectors
This orientation histogram will be distinct for the "happy" vs the "sad" smiley.
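A minimal EmguCV sketch of that algorithm (the bin count and the weak-gradient cutoff are arbitrary choices, not part of the answer):

```csharp
using System;
using Emgu.CV;
using Emgu.CV.Structure;

static int[] OrientationHistogram(Image<Gray, byte> img, int bins = 36)
{
    // Steps 1-2: gradient and gradient direction via Sobel derivatives.
    Image<Gray, float> gx = img.Sobel(1, 0, 3); // d/dx
    Image<Gray, float> gy = img.Sobel(0, 1, 3); // d/dy

    // Step 3: histogram of gradient directions (angle vs. frequency).
    var hist = new int[bins];
    for (int y = 0; y < img.Height; y++)
        for (int x = 0; x < img.Width; x++)
        {
            float dx = gx.Data[y, x, 0], dy = gy.Data[y, x, 0];
            if (dx * dx + dy * dy < 50 * 50) continue;  // skip weak gradients
            double angle = Math.Atan2(dy, dx);          // range -pi..pi
            int bin = (int)((angle + Math.PI) / (2 * Math.PI) * bins) % bins;
            hist[bin]++;
        }
    return hist; // compare this histogram against the template's to classify
}
```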
Have fun.
A simple poor person's algorithm, just to get the job done in this case, could be:
Determine the bounding box of the image and assume its centre is the centre of the circle.
Within the circle, search for the two eyes as blobs, i.e. objects of roughly 20 or so pixels in total that fit within a small defined rectangle.
Once you have the locations of the two eyes, you can determine the slope of the line between them and hence the orientation of the face.
The distance from the point midway between the two eyes, straight down through the centre of the circle to the mouth, returns one of two possible distances, i.e. sad or happy.
Quick and dirty, and hardcoded to this particular image, but it would do the job quickly.
The AForge option is probably a better generalised approach.
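For reference, steps 2-3 with AForge's BlobCounter might look roughly like this (the blob size limits are guesses, and the input is assumed to be an already binarized bitmap of a single face):

```csharp
using System;
using System.Drawing;
using AForge.Imaging;

// Returns the face orientation in degrees from a binarized face image.
static double FaceAngle(Bitmap binaryFace)
{
    var counter = new BlobCounter
    {
        FilterBlobs = true,
        MinWidth = 3, MinHeight = 3,    // eye-sized blobs only
        MaxWidth = 12, MaxHeight = 12
    };
    counter.ProcessImage(binaryFace);
    Blob[] blobs = counter.GetObjectsInformation();
    if (blobs.Length < 2)
        throw new InvalidOperationException("Could not find two eyes.");

    // Step 3: the slope of the line joining the eye centres gives the orientation.
    AForge.Point eye1 = blobs[0].CenterOfGravity;
    AForge.Point eye2 = blobs[1].CenterOfGravity;
    return Math.Atan2(eye2.Y - eye1.Y, eye2.X - eye1.X) * 180.0 / Math.PI;
}
```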
I have two images (left and right).
I want to measure the real distance on the image.
When I click on the image, I'd like to get the real distance from the clicked point to the camera.
Left Image:
Right Image:
I have calibrated the two images. I want to use EmguCV to get the distance from the images.
Is this possible?
While I do not know the specifics of EmguCV, I can tell you the concept behind how stereo depth perception works, and hopefully you can then implement some sort of fix.
Essentially, the first step is to segment and match parts of the image. What you are trying to accomplish here is to identify the parts of the image that are the "same" in each. For instance, you want to be able to identify the center of the lamp in each image. The feature set you use to do this is up to you, but one basic approach that may help is using an edge detector (like the Canny method) and trying to match contours with similar shapes. Another common technique is breaking the image up into smaller blocks and matching features in those blocks. The method you use is up to you.
Next, you calculate the distance of the matched objects from the center of your camera in both images. You will need to do this for both the x and y directions. We will call these your x and y disparities.
Now, you need to know the distance between the centers of the cameras that took the pictures. Once you have this, there is some simple trig that you can do to solve for distance. There is a rather simple explanation of this here.
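For a rectified pair (parallel cameras, like your left/right images), that trig reduces to similar triangles; a minimal sketch:

```csharp
// depth = focalLength * baseline / disparity
// focal length in pixels, baseline (distance between camera centers) in metres,
// disparity (x shift of the matched object between the two images) in pixels.
static double DepthFromDisparity(double focalLengthPx, double baselineM, double disparityPx)
{
    if (disparityPx <= 0)
        return double.PositiveInfinity; // zero disparity: the point is at infinity
    return focalLengthPx * baselineM / disparityPx;
}
```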
Again, this is all conceptual, but it is important to know how the algorithms you are applying work. The first step to understanding the solution to a problem is to understand the problem itself. Once you have a full understanding of the problem, and the procedure for solving it, implementing that procedure with any library should become much easier. Good luck!
I am currently working on a project in which we have a set of photos of trucks going by a camera. I need to detect what type of truck it is (how many wheels it has), so I am using EMGU to try to detect this.
The problem I have is that I cannot seem to detect the wheels using EMGU's HoughCircle detection; it doesn't detect all the wheels, and it will also detect random circles in the foliage.
So I don't know what I should try next. I tried implementing the SURF algorithm to match wheels between themselves, but this does not seem to work either, since they aren't exactly the same. Is there a way I could implement a "loose" SURF algorithm?
This is what I start with.
This is what I get after the Hough Circle detection: many erroneous detections, as some are not even close to being circles, and the back wheels are detected as a single one for some reason.
Would it be possible to confirm that the detected circles are actually wheels by using SURF and matching them between themselves? I am a bit lost on what I should do next; any help would be greatly appreciated.
(sorry for the bad English)
UPDATE
Here is what I did.
I used blob tracking to find the blob in my set of photos. With this I can effectively locate the moving truck. Then I split the rectangle of the blob in two and take the lower half; I know that zone should contain the wheels, which greatly improves the detection. I then run a loose light-intensity check on the wheels I get. Since they are in general more black, I should get a decently low value for those and can discard anything that is too white (180/255 and up). I also know that my circles' radius cannot be greater than half the height of the detection zone.
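A rough EmguCV sketch of that zone-plus-intensity idea (the Hough parameters are placeholders to be tuned):

```csharp
using System;
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;

// gray: the grayscale frame; blobRect: the truck rectangle from the blob tracker.
static CircleF[] DetectWheels(Mat gray, Rectangle blobRect)
{
    // The lower half of the tracked blob is the zone that should contain the wheels.
    var zone = new Rectangle(blobRect.X, blobRect.Y + blobRect.Height / 2,
                             blobRect.Width, blobRect.Height / 2);
    using (var roi = new Mat(gray, zone))
    {
        // Radius cannot exceed half the zone height.
        CircleF[] circles = CvInvoke.HoughCircles(roi, HoughModes.Gradient,
            2, 20, 100, 50, 5, zone.Height / 2);

        // Wheels are mostly dark: reject candidates whose mean intensity is too white.
        return Array.FindAll(circles, c =>
        {
            var box = new Rectangle((int)(c.Center.X - c.Radius),
                                    (int)(c.Center.Y - c.Radius),
                                    (int)(2 * c.Radius), (int)(2 * c.Radius));
            box.Intersect(new Rectangle(Point.Empty, roi.Size));
            if (box.Width == 0 || box.Height == 0) return false;
            using (var patch = new Mat(roi, box))
                return CvInvoke.Mean(patch).V0 < 180; // 180/255 and up is "too white"
        });
    }
}
```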
In this answer I describe an approach that was tested successfully with the following images:
The image processing pipeline begins by either downsampling the input image or performing a color reduction operation to decrease the amount of data (colors) in the image. This creates smaller groups of pixels to work with. I chose to downsample:
The 2nd stage of the pipeline performs a Gaussian blur in order to smooth/blur the images:
Next, the images are ready to be thresholded, i.e. binarized:
The 4th stage requires executing Hough Circles on the binarized image to locate the wheels:
The final stage of the pipeline would be to draw the circles that were found over the original image:
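A minimal EmguCV translation of that pipeline (the file name, kernel size, and all thresholds are illustrative):

```csharp
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;

Mat image = CvInvoke.Imread("truck.jpg", ImreadModes.Grayscale);

// Stage 1: downsample to reduce the amount of data.
var small = new Mat();
CvInvoke.PyrDown(image, small);

// Stage 2: Gaussian blur to smooth the image.
var blurred = new Mat();
CvInvoke.GaussianBlur(small, blurred, new Size(9, 9), 2);

// Stage 3: threshold (binarize).
var binary = new Mat();
CvInvoke.Threshold(blurred, binary, 90, 255, ThresholdType.BinaryInv);

// Stage 4: Hough Circles on the binarized image.
CircleF[] circles = CvInvoke.HoughCircles(binary, HoughModes.Gradient,
    2, 40, 100, 40, 10, 80);

// Stage 5: draw the detections over the (downsampled) original.
foreach (CircleF c in circles)
    CvInvoke.Circle(small, Point.Round(c.Center), (int)c.Radius,
        new MCvScalar(255), 2);
```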
This approach is not a robust solution. It's meant only to inspire you to continue your search for answers.
I don't do C#, sorry. Good luck!
First, the wheel projections are ellipses, not circles. Second, some background gradient can easily produce circle-like objects, so there should be no surprise here. The problem with ellipses, of course, is that they have 5 DOF rather than the 3 DOF of circles, and a five-dimensional Hough space becomes impractical. Some generalized Hough transforms can probably solve the ellipse problem, at the expense of a lot of additional false-alarm (FA) circles. To counter the FAs you have to verify that the detections really are wheels that belong to a truck and nothing else.
You probably need to start by specifying your problem in terms of objects and backgrounds rather than wheel detection. This is important since objects create the visual context in which to detect wheels, and background analysis will show how easy it would be to segment a truck (object) in the first place. If the camera is static, one can use motion to detect the background. If the background is relatively uniform, a Gaussian mixture model of its colors may help to eliminate much of it.
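If the camera is static, a background-subtraction sketch using EmguCV's Gaussian mixture model (the video path and parameter values are placeholders):

```csharp
using Emgu.CV;

var capture = new VideoCapture("trucks.avi");                 // hypothetical input
var subtractor = new BackgroundSubtractorMOG2(200, 16, true); // history, variance threshold, shadows
var frame = new Mat();
var fgMask = new Mat();
while (capture.Read(frame) && !frame.IsEmpty)
{
    // fgMask becomes ~255 where something moves (the truck), 0 for background.
    subtractor.Apply(frame, fgMask);
    // Restrict the wheel search to the white regions of fgMask.
}
```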
I strongly suggest using:
http://cvlabwww.epfl.ch/~lepetit/papers/hinterstoisser_pami11.pdf
and the C# implementation:
https://github.com/dajuric/accord-net-extensions
(take a look at samples)
This algorithm can achieve real-time performance (20-30 fps) even with more than 2000 templates, so you can cover both the ellipse (projection) and circle shape cases.
You can modify the hand tracking sample (FastTemplateMatchingDemo) by putting in your own binary templates (make them in Paint :-)).
P.S.:
To suppress false positives, some kind of tracking is also incorporated. The library linked above also contains tracking algorithms such as the discrete Kalman filter and the particle filter, all with samples!
This library is still under development, so there is a possibility that something will not work.
Please do not hesitate sending me a message.
What I want to do is take a source image, which will contain a black-and-white chequered board of a known physical size and a known number of squares, and identify the boundaries of said board, as well as the angle from which it is being observed (assuming it's perfectly flat) and from what distance.
If I can reliably identify the 4 corners of the board then I know how to calculate the angle and distance, so the task is more about identifying the chess board.
What I've tried so far is greyscaling the image and increasing the contrast so I end up with a stark black-and-white image (which to the eye contains blackness with just the white squares). While I can identify the boundaries of the board fine from a top-down perspective by measuring the frequency of changes from black->white->black, I'm not sure how to go about doing this from an arbitrary angle.
Nominally I'm doing this in C#, but as far as actual answers go I'm happy with any code examples in a C-like syntax - I'm more interested in the math and methodology for this one, though.
Finding a general 2D object inside a 3D world is often done with SIFT or SURF.
There are two steps:
find a manageable number of local features in the image (such as strong corners)
find a correlation between those points in your image and your search pattern
OpenCV has an implementation for that:
Features2D + Homography to find a known object
The Wikipedia article on SURF also points to another C# implementation.
Also see this Stack Overflow answer:
Now this is a pretty general method, and I do not know how well it will work with your checkerboard.
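For completeness, those two steps might look like this in EmguCV (assuming the Emgu.CV.XFeatures2D package; the Hessian threshold and matcher settings are guesses):

```csharp
using Emgu.CV;
using Emgu.CV.Features2D;
using Emgu.CV.Util;
using Emgu.CV.XFeatures2D;

static VectorOfVectorOfDMatch MatchFeatures(Mat scene, Mat pattern)
{
    var surf = new SURF(400); // Hessian threshold controls how many keypoints survive
    var sceneKp = new VectorOfKeyPoint();
    var patternKp = new VectorOfKeyPoint();
    var sceneDesc = new Mat();
    var patternDesc = new Mat();

    // Step 1: local features in both images.
    surf.DetectAndCompute(scene, null, sceneKp, sceneDesc, false);
    surf.DetectAndCompute(pattern, null, patternKp, patternDesc, false);

    // Step 2: correlate them (2 nearest neighbours, to allow a ratio test later).
    var matcher = new BFMatcher(DistanceType.L2);
    matcher.Add(patternDesc);
    var matches = new VectorOfVectorOfDMatch();
    matcher.KnnMatch(sceneDesc, matches, 2, null);
    return matches;
}
```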
But there are specific approaches for chessboard patterns, such as the OpenCV function cvFindChessboardCorners (tutorial).
I never used it, but I found this description of the algorithm (the source is in the file cvcalibinit.cpp):
Image binarization by thresholding to segment black and white squares
Find corners of the black squares:
Find contours of the boundaries of the black regions
Select contours of suitable shape
Approximate these contours with 4-vertex polygons
Among these select the quadrangles resembling calibration pattern squares
Extract corners of the selected quads, having at least one corner in the vicinity
Group the corners of the selected quadrangles in lines according to calibration object size
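Since you're in C#, EmguCV wraps this as CvInvoke.FindChessboardCorners; a minimal sketch (the file name and pattern size are placeholders):

```csharp
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Util;

Mat gray = CvInvoke.Imread("board.png", ImreadModes.Grayscale);
var patternSize = new Size(7, 7); // inner corners, e.g. 7x7 for an 8x8 board
var corners = new VectorOfPointF();
bool found = CvInvoke.FindChessboardCorners(gray, patternSize, corners,
    CalibCbType.AdaptiveThresh | CalibCbType.NormalizeImage);
if (found)
{
    // The outermost entries of `corners` give you the four board corners
    // needed for the angle/distance computation.
    CvInvoke.DrawChessboardCorners(gray, patternSize, corners, found);
}
```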
I want to write an app in C#, using some graphics library, that will show the difference between two pictures as a map of vectors showing the movement of points.
Something like this is implemented for MATLAB and is called PIVlab; as an example:
(example images: two input frames and the resulting vector map, from the PIVlab documentation)
I would be very happy to write something similar in .NET using, for instance, AForge. Can you help?
What you want is to find the optical flow. Look here for a C# + EmguCV implementation of an optical flow tracker.
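If you want a dense vector map like PIVlab's, EmguCV also wraps OpenCV's Farnebäck dense optical flow; a minimal sketch (parameter values are the common OpenCV defaults):

```csharp
using Emgu.CV;
using Emgu.CV.CvEnum;

// prev and next are two consecutive grayscale frames.
static Mat DenseFlow(Mat prev, Mat next)
{
    var flow = new Mat();
    CvInvoke.CalcOpticalFlowFarneback(prev, next, flow,
        0.5, 3, 15, 3, 5, 1.2, OpticalflowFarnebackFlag.Default);
    return flow; // 2-channel float Mat: per-pixel (dx, dy) motion vectors
}
```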
An interesting question. Are the dots always in view?
Or is the image unstable, for example from camera noise?
Is the motion slow or fast? I.e., is a pixel within reach between frames?
Do the pixels move as a single large group, for example when tracking stars?
Or do the pixels move more like groups of clouds or birds in the air?
Or do the pixels move like fleas, each in various directions?
Might there be some general expected movement (like the camera driving)?
Could you update your original question with such info?
Because that differs a lot. In most cases it's about tracking a near neighbour, so one might write an out-spiralling pixel check for each pixel (or, for easier programming, check from a small rectangle out to a large rectangle).
That might not be fast, though it is best for the fleas example.
I'm making a game in C# and XNA, and I was trying to come up with a method to render massive terrains without using a tremendous amount of memory or exceeding the poly limit hard-coded into XNA.
My solution so far is to create a massive heightmap that is loaded into memory at the beginning of the game, in the initialization phase. Then, terrain is only generated nearest to the camera. This is accomplished by projecting a triangle whose apex is the character and whose other two vertices extend to the sides of the character's viewing area. Then, all the pixels inside that triangle on the heightmap are rendered and drawn into the game, thus only rendering what is seen.
The problem is, I've successfully found (I think; I can't test until I get terrain rendering) the three vertices of the triangle. Now I need a list of the coordinates of every single pixel inside that triangle - whole numbers only, because I just need a list of pixels to render.
I know it sounds a little confusing, so here's the gist of it:
I have an image, and I project a triangle onto that image. The only thing I know about that triangle are the three vertices. I need a list of the pixels inside that triangle.
I've been Googling around for maybe 20 minutes now, and I figured I might as well go ahead and post something here, given that what I'm trying to do isn't all that common. If I find an answer, I'll be sure to post it here.
But until then, can anyone tell me how to accomplish this?
Edit: A formula, please. If you can provide a formula or algorithm, and an explanation, that would be just perfect.
Edit: I've posted a new question, as I've ditched this method of rendering large terrains. The question is here.
Start here:
http://mathworld.wolfram.com/TriangleInterior.html
One of the non-trivial problems, not mentioned there, that you have to deal with is the pixelization along the boundary.
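A C# sketch of that interior test, swept over the triangle's bounding box (pixels where an edge test is exactly zero, i.e. the boundary, are included here - adjust to taste):

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Enumerate every integer pixel inside the triangle (a, b, c) using the
// sign-of-cross-product test from the MathWorld page above.
static IEnumerable<Point> PixelsInTriangle(PointF a, PointF b, PointF c)
{
    // Cross product of (p - o) and (q - o): its sign says which side of edge o->p point q is on.
    float Cross(PointF o, PointF p, PointF q) =>
        (p.X - o.X) * (q.Y - o.Y) - (p.Y - o.Y) * (q.X - o.X);

    int minX = (int)Math.Floor(Math.Min(a.X, Math.Min(b.X, c.X)));
    int maxX = (int)Math.Ceiling(Math.Max(a.X, Math.Max(b.X, c.X)));
    int minY = (int)Math.Floor(Math.Min(a.Y, Math.Min(b.Y, c.Y)));
    int maxY = (int)Math.Ceiling(Math.Max(a.Y, Math.Max(b.Y, c.Y)));

    for (int y = minY; y <= maxY; y++)
        for (int x = minX; x <= maxX; x++)
        {
            var p = new PointF(x, y);
            float d1 = Cross(a, b, p), d2 = Cross(b, c, p), d3 = Cross(c, a, p);
            bool hasNeg = d1 < 0 || d2 < 0 || d3 < 0;
            bool hasPos = d1 > 0 || d2 > 0 || d3 > 0;
            if (!(hasNeg && hasPos))   // same sign on all three edges => inside
                yield return new Point(x, y);
        }
}
```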