I want to count the number of people crossing a line, from either side. I have a camera mounted on the ceiling, pointing straight down at the floor where the line is (so the camera only sees the tops of people's heads, which makes this more of an object-detection problem than a people-detection one).
Is there any sample solution for this problem, or for similar problems, that I can learn from?
Edit 1: More than one person may be crossing the line at any given moment.
If nothing but humans will ever cross the line, then you don't need to detect people; you only need to detect motion.
There are several approaches to motion detection.
Probably the simplest one fits your goals: compute the difference between successive frames of the video stream to obtain a "motion mask", and use that mask to detect the line-crossing event.
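For illustration, a minimal Python/OpenCV sketch of frame differencing; the file name, threshold value and the idea of testing a fixed image row as the "line" are just assumptions:

```python
import cv2

cap = cv2.VideoCapture("overhead.avi")  # placeholder input
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

LINE_Y = 240  # assumed row of the virtual counting line

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)                        # frame difference
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)  # "motion mask"
    # a crossing candidate: any motion pixels on the counting line
    if cv2.countNonZero(mask[LINE_Y:LINE_Y + 2, :]) > 0:
        print("motion on the line")
    prev_gray = gray
```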
As an improvement on this algorithm you may consider the "running average" method, which maintains a slowly-updated background estimate instead of comparing only adjacent frames.
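A rough sketch of that variant, using cv2.accumulateWeighted with an arbitrarily chosen learning rate:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("overhead.avi")  # placeholder input
ok, frame = cap.read()
avg = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cv2.accumulateWeighted(gray, avg, 0.05)   # slowly-updated background estimate
    background = cv2.convertScaleAbs(avg)
    diff = cv2.absdiff(gray, background)      # motion = deviation from the average
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
```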
To determine the direction of motion you can use "motion templates".
To increase the accuracy of your detector you may try a background subtraction technique (which, in turn, is not a simple solution), for example if there is a moving background that should be filtered out (e.g. using statistical learning).
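For example, the Gaussian-mixture background subtractor available in OpenCV could be used roughly like this (the parameters are only a starting point):

```python
import cv2

cap = cv2.VideoCapture("overhead.avi")  # placeholder input
# per-pixel Gaussian-mixture model of the background, learned statistically
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)
    # remove speckle noise before looking for line crossings
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
```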
All algorithms mentioned are included in the OpenCV library.
UPD:
How to compute a motion mask
Useful functions for determining motion direction: cvCalcMotionGradient, cvSegmentMotion, cvUpdateMotionHistory (search the docs); a sketch using their Python counterparts follows this list. The OpenCV library contains example code for motion analysis; see motempl.c.
Advanced background subtraction is covered in the "Learning OpenCV" book.
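Regarding the motion-direction functions above: in the newer Python bindings they live in the opencv-contrib motempl module (so this assumes you have opencv-contrib installed); a rough sketch, with the frame size and gradient deltas as assumptions:

```python
import time
import cv2
import numpy as np

MHI_DURATION = 1.0                 # seconds of motion history to keep
h, w = 480, 640                    # assumed frame size
mhi = np.zeros((h, w), np.float32)

def motion_direction(motion_mask):
    """motion_mask: binary silhouette of moving pixels (uint8, same size as mhi)."""
    timestamp = time.time()
    cv2.motempl.updateMotionHistory(motion_mask, mhi, timestamp, MHI_DURATION)
    mask, orientation = cv2.motempl.calcMotionGradient(mhi, 0.25, 0.05)
    # global direction of motion in degrees (0..360), e.g. "towards" vs "away from" the line
    return cv2.motempl.calcGlobalOrientation(orientation, mask, mhi,
                                             timestamp, MHI_DURATION)
```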
I'm not an expert in video-based CV, but if you can reduce the problem to a finite set of images (for instance: entering the frame, standing on the line, exiting the frame), then you can use one of many shape recognition algorithms. I know of Shape Context, which is good, but I doubt it is subtle enough for this application (it won't tell the difference between a head and most other round objects).
Basically, try to extract key images from the video, and then test them with shape recognition algorithms.
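As a simpler stand-in for Shape Context, you could compare key images with Hu-moment matching (cv2.matchShapes); a sketch assuming OpenCV 4.x and that the shape of interest is the largest blob in each key image:

```python
import cv2

def shape_distance(path_a, path_b):
    """Compare the dominant shape in two key images; 0 means identical, larger means less similar."""
    shapes = []
    for path in (path_a, path_b):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        shapes.append(max(contours, key=cv2.contourArea))   # largest blob
    return cv2.matchShapes(shapes[0], shapes[1], cv2.CONTOURS_MATCH_I1, 0.0)
```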
P.S. Finding the key images might be possible with good motion detection methods.
I'm new to Unity and I'm making a car racing game. Now I'm stuck at a certain point. I was looking for a solution to my problem, but couldn't find one.
My problem is:
When I run my game on my phone it lags badly whenever there are several buildings in front of the car camera, such as one building behind another. The reason is that there are so many vertices and edges at that moment that the car camera can't render all that geometry at the same time.
How do I preload the 2nd Scene while loading 1st Scene?
I am using Unity free version.
In graphics programming there is a common technique of simply not drawing objects that aren't in the field of view (culling). I'm sure Unity can handle this. Check the link: Unity's description of this topic.
I'm not hugely knowledgeable about Unity, but as a 3D modeller there's a bunch of things you can do to improve performance:
Create a simplified version of your buildings with fewer polygons for use when buildings are a long way away. A skyscraper, for example, can be as simple as a textured box.
If you've done that already, reduce the distance at which the simpler imposters are substituted for the complex versions.
Reduce the number of polygons by other means. A good example: if you've got a window ledge sticking out of the side of a building, don't try to model it as an extension of the building's body. Instead, make it a separate box, delete the face that won't be seen, and move it to intersect the rest of the building.
Another good trick is to use bump maps or normal maps to approximate smaller features, rather than trying to model everything.
Opaqueness. Try not to have transparent windows in your buildings. It's computationally cheaper to make them just reflect the skybox or a suitably blurred reflection imposter. Also make sure that the material's shader is in Opaque mode, if it supports this.
You might also benefit a little from checking the 'Static' box on the game object, assuming that buildings aren't able to be moved (i.e. by smashing through them in a bulldozer).
Collision detection can also be a big drain. Be sure to use the simplest possible detection mesh you can - either a box, cylinder, sphere or a combination.
I need to be able to generate a 3D perspective from a bunch of 2D images of a pipe.
Basically... we have written software that interprets combined data from laser and sonar units to give us an image slice of a section of pipe. These units travel through the pipe and scan its interior every 100mm.
All of this is working great. My client now wants to take all these 2D image slices and generate a 3D view so they can "travel" through the pipe looking at defects etc.. that are picked up by the scans. We can see the defects in the 2D images but there can be hundreds of images in a single inspection - hence the requirement to be able to look through the pipe.
I am doing this in VS2010 on the .NET 4 platform in C#.
I am honestly clueless as to where to start here. I am not a graphics developer so this is all new territory to me. I see it as a great challenge but need some help kicking off - and a bit of direction.
Any help appreciated :)
Mike
Well, every 10cm isn't very detailed. However, you need to scan the pixels of each slice, creating a list of closed polygons (the cross-section outlines), then use a triangle strip to connect one outline to the next, all the way down the pipe.
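A rough Python sketch of that idea (your project is C#, so treat this only as pseudocode for the vertex/index layout; the sample counts and radii below are made up):

```python
import math

def ring_vertices(radii, z):
    """One cross-section outline: radii[i] is the measured radius at sample angle i."""
    n = len(radii)
    return [(radii[i] * math.cos(2 * math.pi * i / n),
             radii[i] * math.sin(2 * math.pi * i / n),
             z) for i in range(n)]

def strip_indices(ring_a_start, ring_b_start, n):
    """Index order for a triangle strip joining two rings of n vertices each."""
    order = []
    for i in range(n + 1):          # +1 wraps the strip back to the start
        k = i % n
        order.append(ring_a_start + k)
        order.append(ring_b_start + k)
    return order

# Example: two slices 100 mm apart, 64 samples around the circumference
slice0 = ring_vertices([150.0] * 64, z=0.0)
slice1 = ring_vertices([150.0] * 64, z=100.0)
vertices = slice0 + slice1
indices = strip_indices(0, len(slice0), 64)  # feed to any triangle-strip renderer
```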
Try starting with very basic 2D instead of full-blown 3D rendering; it may be good enough. A pipe, when you look at it from the inside, can be represented as several trapezoids. Assuming your images are small cylindrical portions of the pipe, map each stripe to trapezoids (4 would be a good start, and easy to position) and draw them in a circular pattern. You may draw several stripes this way at the same time. To move backward/forward, just reassign the images to the trapezoids.
If you need full 3D, consider whether WPF would work; if not, XNA or an OpenGL library will give you full 3D.
You don't specify the context: 100mm sample intervals may be sparse (for a 1m pipe) or detailed (for a 10km pipe). Nor do you specify how many sample points there are (the number of cross-sections and the size of each cross-section image).
A simple way to show the data is to use voxels where each pixel on a cross section is treated as a cube and adjacent samples form adjacent cubes (think Minecraft). The result will look blocky but as it's an engineering / scientific application this is probably preferable. Interpolating the model to produce a smooth surface may hide defects or make areas appear to be defective. Also, rendering a cross section through a voxel is a bit easier than a polygon surface.
I have 4 shapes in an image.
I want to get the pixels of one shape as a list of points.
The shapes all have the same color.
List<Point> GetAllPixelsInShape(Point x)
{
    // implementation needed: collect every pixel of the shape containing x
}
where x is a point inside the shape
Long story short, you could begin with a connected components / region labeling algorithm.
http://en.wikipedia.org/wiki/Connected-component_labeling
In OpenCV you can call findContours() to identify contours, which are the borders of your connected regions.
http://dasl.mem.drexel.edu/~noahKuntz/openCVTut7.html
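A minimal Python example of both, assuming a binary input image called shapes.png:

```python
import cv2

img = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)   # placeholder input
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# label connected regions; every pixel gets the id of the blob it belongs to
num_labels, labels = cv2.connectedComponents(binary)

# or get just the region borders instead (OpenCV 4.x return signature)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(num_labels - 1, "labelled regions,", len(contours), "contours")
```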
OCR is an extremely difficult task, especially for a script like Arabic. Creating an OCR algorithm from scratch takes a lot of work and numerous algorithms working together. OCR for machine printed text is hard enough. Implementing an algorithm to read handwriting is not something I'd suggest trying until you have a year or two of image processing experience. If you haven't read textbooks and academic papers on OCR, you're likely to spend a lot of time reproducing work that has already been done.
If you're not familiar with contour tracing and/or blob analysis, then working with OpenCV may not be a good first step. Since you have a specific goal in mind, you might first try different algorithms in a user-friendly GUI that will save you coding time.
Consider downloading ImageJ so that you can see how the algorithms work. There are plugins for a variety of common image processing algorithms.
http://rsbweb.nih.gov/ij/
Your proposed method signature doesn't provide enough information to solve this. Your method will need to know the bounds of your shape (how long and wide it is, etc.), ideally as a set of points that defines those bounds.
Once you have those, you could apply the details of this article, in particular the algorithms specified in the answer, to solve your problem.
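Alternatively, since the question already gives a seed point inside the shape, a flood fill is one straightforward way to collect its pixels. A Python sketch (your signature is C#, so this only shows the idea, and the helper name is made up):

```python
import cv2
import numpy as np

def get_all_pixels_in_shape(binary_img, seed):
    """Return the (x, y) coordinates of every pixel in the shape containing `seed`."""
    h, w = binary_img.shape[:2]
    mask = np.zeros((h + 2, w + 2), np.uint8)          # floodFill needs a 2-pixel border
    flags = 4 | (255 << 8) | cv2.FLOODFILL_MASK_ONLY   # 4-connected, write 255 into mask
    cv2.floodFill(binary_img.copy(), mask, seed, 0, flags=flags)
    ys, xs = np.nonzero(mask[1:-1, 1:-1])
    return list(zip(xs.tolist(), ys.tolist()))
```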
I am currently using EmguCV (an OpenCV C# wrapper) successfully to detect faces in real time (webcam). I get around 7 FPS.
Now I'm looking to improve the performances (and save CPU cycles), and I'm looking for options, here are my ideas:
Detect the face, pick out features of the face and try to find those features in the next frames (using the SURF algorithm), so it becomes "face detection + tracking". If they are not found, run face detection again.
Detect the face, then in the next frame try to detect the face in an ROI where the face previously was (i.e. look for the face in a smaller part of the image). If the face is not found there, look for it in the whole image again (see the sketch after these ideas).
Side idea: if no face detected for 2-3 frames, and no movement in the image, don't try to detect anymore faces until movement is detected.
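To make idea 2 concrete, this is roughly what I have in mind (sketched with the Python bindings rather than EmguCV; the cascade file, padding and detection parameters are placeholders):

```python
import cv2

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)
roi = None          # (x, y, w, h) of the last known face, padded
PAD = 40            # assumed padding around the previous detection

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if roi is not None:
        x, y, w, h = roi
        faces = cascade.detectMultiScale(gray[y:y + h, x:x + w], 1.1, 4)
        faces = [(x + fx, y + fy, fw, fh) for fx, fy, fw, fh in faces]
        if len(faces) == 0:
            roi = None                      # lost it: fall back to the full frame
    if roi is None:
        faces = cascade.detectMultiScale(gray, 1.1, 4)
    if len(faces) > 0:
        fx, fy, fw, fh = faces[0]
        roi = (max(fx - PAD, 0), max(fy - PAD, 0), fw + 2 * PAD, fh + 2 * PAD)
```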
Do you have any suggestions for me?
Thanks.
All the solutions you introduced seem smart and reasonable. However, if you use Haar for face detection you might try to create a cascade with fewer stages. Although 20 stages are recommended for face detection, 10-15 might be enough. That would noticeably improve performance. Information on creating your own cascades can be found at Tutorial: OpenCV haartraining (Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features).
Again, using SURF is a good idea. You can also try P-N learning: Bootstrapping binary classifiers by structural constraints. There are interesting videos on YouTube presenting this method, try to find them.
You could try the SURF algorithm, but I am not sure it provides relevant features on a face; maybe around the eyes, or if you are close and have skin irregularities, or perhaps in the hair if the resolution is high enough. Moreover, SURF is not really fast, and I would avoid doing more computation than necessary if you want to save CPU time.
The ROI is a good idea. You could choose it by running the CamShift algorithm; it won't save a lot of CPU, but you could try it, as CamShift is a very lightweight algorithm. Again, I am not sure it will be really relevant, but you have the right idea in your second point: minimize the zone in which to search...
The side idea seems quite good to me. You could try to detect motion (global motion, for instance); if there isn't much, then don't try to re-detect what you already detected... You could try doing that with motion templates, since you know the silhouette from mean-shift or face detection...
A very simple, lightweight but non-robust template match between frame n-1 and frame n could also give you a coefficient measuring a sort of similarity between the two frames; you could decide that below a certain threshold you re-activate face detection... why not? It should take 5 minutes to implement if the C# wrapper has an equivalent of the matchTemplate() function...
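If it helps, that gate can be very small (Python shown; the threshold value is an assumption, and I believe EmguCV exposes a MatchTemplate equivalent, but check your version):

```python
import cv2

def frames_similar(prev_gray, cur_gray, threshold=0.98):
    """Cheap similarity gate: skip face detection when the scene barely changed."""
    # comparing two full frames of the same size degenerates into a single correlation score
    result = cv2.matchTemplate(cur_gray, prev_gray, cv2.TM_CCOEFF_NORMED)
    return result.max() >= threshold
```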
I'll come back here if I have better (deeper) ideas, but for now I've just come back from work and it's hard to think more...
Julien,
This is not a perfect answer, but just a suggestion.
In my digital image processing classes in the last semester of my B.Tech in CS, I learned about bit-plane slicing, and how an image reconstructed from just its MSB plane retains almost 70% of the useful image information. So you would be working with almost the original image, but at just one-eighth of the original size.
So although I haven't implemented it in my own project, I was wondering whether it could be used to speed up face detection, because later on eye detection, and pupil and eye-corner detection, also take a lot of computation time and make the whole program slow.
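For what it's worth, extracting the MSB plane is only a couple of lines with NumPy (file names are placeholders):

```python
import cv2
import numpy as np

img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder input

# keep only the most significant bit plane of each 8-bit pixel
msb_plane = np.where(img & 0x80, 255, 0).astype(np.uint8)

# the MSB plane is effectively a 1-bit image, i.e. one-eighth of the original data
cv2.imwrite("face_msb.png", msb_plane)
```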
Continuing from this thread:
What are good algorithms for vehicle license plate detection?
I've developed my image manipulation techniques to emphasise the license plate as much as possible, and overall I'm happy with the results; here are two samples.
Now comes the most difficult part, actually detecting the license plate. I know there are a few edge detection methods, but my maths is quite poor so I'm unable to translate some of the complex formulas into code.
My idea so far is to loop through every pixel in the image (a for loop based on image width & height). For each pixel, compare it against a list of colours and check whether the colours keep alternating between the white of the license plate and the black of the text. If that turns out to be true, those pixels are copied into a new bitmap in memory, and an OCR scan is performed once the pattern stops being detected.
I'd appreciate some input on this as it might be a flawed idea, too slow or intensive.
Thanks
Your method of "see if the colors keep differentiating between the license plate white, and the black of the text" is basically searching for areas where the pixel intensity changes from black to white and vice-versa many times. Edge detection can accomplish essentially the same thing. However, implementing your own methods is still a good idea because you will learn a lot in the process. Heck, why not do both and compare the output of your method with that of some ready-made edge detection algorithm?
At some point you will want to have a binary image, say with black pixels corresponding to the "not-a-character" label, and white pixels corresponding to the "is-a-character" label. Perhaps the simplest way to do that is to use a thresholding function. But that will only work well if the characters have already been emphasized in some way.
As someone mentioned in your other thread, you can do that using the black hat operator, which results in something like this:
If you threshold the image above with, say, Otsu's method (which automatically determines a global threshold level), you get this:
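In Python/OpenCV those two steps look roughly like this (the structuring-element size is a guess you would tune to your plate size):

```python
import cv2

img = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder input

# black-hat: emphasises dark characters on the lighter plate background
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 5))
blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)

# Otsu picks the global threshold level automatically
_, binary = cv2.threshold(blackhat, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
```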
There are several ways to clean that image. For instance, you can find the connected components and throw away those that are too small, too big, too wide or too tall to be a character:
Since the characters in your image are relatively large and fully connected this method works well.
Next, you could filter the remaining components based on the properties of the neighbors until you have the desired number of components (= number of characters). If you want to recognize the character, you could then calculate features for each character and input them to a classifier, which usually is built with supervised learning.
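A sketch of the component-filtering step with connectedComponentsWithStats; the size and aspect-ratio limits below are assumptions you would tune to your resolution:

```python
import cv2
import numpy as np

def filter_character_components(binary):
    """Keep only connected components whose size and aspect plausibly match a character."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    chars = np.zeros_like(binary)
    for i in range(1, num):                      # label 0 is the background
        x, y, w, h, area = stats[i]
        # limits are assumptions: discard blobs too small, too big, too wide or too tall
        if 100 < area < 5000 and 0.2 < w / float(h) < 1.2:
            chars[labels == i] = 255
    return chars
```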
All the steps above are just one way to do it, of course.
By the way, I generated the images above using OpenCV + Python, which is a great combination for computer vision.
Colour, as good as it looks, will present quite a few challenges with shading and lighting conditions. It really depends how robust you want to make it, but real-world cases have to deal with such issues.
I have done research on road footage (see my profile page and look here for a sample) and have found that real-world road footage is extremely noisy in terms of lighting conditions; your colours can change from brown to white for a yellow rear number plate.
Most algorithms use line detection and try to find a box with an aspect ratio within an acceptable range.
I suggest you do a literature review on the subject, but this was already being achieved back in 1993 (if I remember correctly), so there are thousands of articles.
This is quite a scientific domain, so a single algorithm will not solve it, and you will need numerous pre- and post-processing steps.
In brief, my suggestion is to use the Hough transform to find lines and then look for rectangles with an acceptable aspect ratio.
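A rough starting point in Python/OpenCV; grouping the detected segments into rectangle candidates is only sketched in the comments, because that logic depends heavily on your footage:

```python
import cv2
import numpy as np

img = cv2.imread("car.jpg", cv2.IMREAD_GRAYSCALE)     # placeholder input
edges = cv2.Canny(img, 100, 200)

# probabilistic Hough transform: returns line segments as (x1, y1, x2, y2)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=60, maxLineGap=5)
print(0 if lines is None else len(lines), "candidate segments")

# group near-horizontal and near-vertical segments into rectangle candidates and
# keep those whose width/height ratio looks plate-like (roughly 3-6; the exact
# range is an assumption that depends on the plate format in your region)
```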
Harris feature detection could provide important edges but if the car is light-coloured this will not work.
If you have a lot of samples, you could try the face detection method developed by Paul Viola and Michael Jones. It's good for face detection, and maybe it will do fine with license plate detection too (especially if combined with some other method).
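For example, recent OpenCV builds ship a pre-trained plate cascade you could try before training your own; the file name below assumes it is present in your cv2.data folder, otherwise you would train a cascade from your samples the same way as for faces:

```python
import cv2

# pre-trained Haar cascade for (Russian-format) number plates, if your build includes it
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_russian_plate_number.xml")

img = cv2.imread("car.jpg")                            # placeholder input
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
for (x, y, w, h) in plates:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```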