I am working on removing the background and leaving only the bodies with Kinect V2 and C#/WPF in real time.
Removing the background works fine, but the edges of the bodies are very rough, with jaggies all along them.
I need to smooth the edges in real-time (30 frames per second). I would appreciate any advice on that.
I am able to select the edges (similar to Photoshop's magic wand).
I tried to use something like Gaussian blur, but it seems to be too slow for a real-time application. Probably I am missing something because it seems to be a standard problem for many applications like games etc. Thank you!
You probably need to look into implementations of depth image enhancement or smoothing that fill holes around the edges of the silhouettes. For starters, you could look into Kinect Depth Smoothing. This should work in real time since it's just based on calculating modes. For a more accurate implementation, there are research papers that address the same issue, such as the ones below:
[a] Chen, L., Lin, H., Li, S., “Depth image enhancement for Kinect using region growing and bilateral filter,” Pattern Recognition (ICPR), 2012 21st International Conference on, 3070–3073 (2012).
[b] Le, A. V., Jung, S.-W., Won, C. S., “Directional joint bilateral filter for depth images,” Sensors 14(7), 11362–11378, Multidisciplinary Digital Publishing Institute (2014).
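To give an idea of what "calculating modes" can look like, here is a minimal sketch in plain Python. It is not the Kinect Depth Smoothing code itself, only a simplified version of the idea: every pixel without a depth reading takes the most frequent depth among its 3x3 neighbours. It assumes unknown depth is stored as 0, and the name `fill_holes_with_mode` and the `min_valid_neighbors` parameter are made up for illustration.

```python
from collections import Counter

def fill_holes_with_mode(depth, min_valid_neighbors=3):
    """Fill zero (unknown) depth pixels with the mode of their valid 3x3 neighbours."""
    height = len(depth)
    width = len(depth[0]) if height else 0
    out = [row[:] for row in depth]  # copy, so the input frame is left untouched
    for y in range(height):
        for x in range(width):
            if depth[y][x] != 0:
                continue  # this pixel already has a depth reading
            neighbours = [
                depth[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x) and depth[ny][nx] != 0
            ]
            # Assumption: only fill when enough neighbours agree there is something there.
            if len(neighbours) >= min_valid_neighbors:
                out[y][x] = Counter(neighbours).most_common(1)[0][0]
    return out

frame = [
    [800, 800, 800],
    [800,   0, 810],
    [800, 810,   0],
]
smoothed = fill_holes_with_mode(frame)
```

In C# the same loops would run over the raw depth array; whether this keeps up with 30 frames per second depends on the frame size and would need measuring.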
I have been experimenting with depth sensors using IR but they have been sadly lacking in accuracy (at least for what I want) and range.
So I am looking for alternatives.
The application: I am holding a rectangular block of plastic and sweeping it from left to right across a surface (a wall in this case), and I want to know if the gap between the plastic surface and the wall surface reaches a certain threshold.
I have a web camera attached to a Microsoft Surface and I am aiming this camera at this user motion.
If I had a better quality image, I am sure I could use basic geometry to work this out. I am looking around for better cameras as I type...
I was considering a radar sensor instead.
I spent many hours yesterday looking for something like a USB radar sensor with an SDK that is friendly for use with C#.
I have not found anything.
That is not to say I have given up. I am continuing to look.
I just thought I would post here as well to get anyone's ideas on this.
I will continue to update my question with, hopefully, a solution if no one else does.
Thanks
I need to be able to generate a 3D perspective from a bunch of 2D images of a pipe.
Basically... We have written software that interprets combined data from laser and sonar units to give us an image slice from a section of pipe. These units travel through the pipe and scan the inside of the pipe every 100mm.
All of this is working great. My client now wants to take all these 2D image slices and generate a 3D view so they can "travel" through the pipe looking at defects etc. that are picked up by the scans. We can see the defects in the 2D images, but there can be hundreds of images in a single inspection, hence the requirement to be able to look through the pipe.
I am doing this in VS2010 on the .NET 4 platform in C#.
I am honestly clueless as to where to start here. I am not a graphics developer so this is all new territory to me. I see it as a great challenge but need some help kicking off - and a bit of direction.
Any help appreciated :)
Mike
Well, every 10cm isn't very detailed. However, you need to scan the pixels of the pipe, creating a list of closed polygons, then just use a triangle strip to connect one set to the next, all the way down the pipe.
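Connecting one cross-section to the next could be sketched like this in plain Python. It assumes each slice has already been reduced to a closed ring with the same number of points in the same order (real scans would need resampling first), and it emits the triangles as an explicit list rather than a GPU-style strip. `connect_rings` and `circle_ring` are made-up names, and the 150mm radius is just a placeholder.

```python
import math

def connect_rings(ring_a, ring_b):
    """Return triangles (vertex triples) joining two closed rings with equal point counts."""
    if len(ring_a) != len(ring_b):
        raise ValueError("rings must have the same number of points")
    n = len(ring_a)
    triangles = []
    for i in range(n):
        j = (i + 1) % n  # wrap around so the last point joins the first
        # Two triangles cover the quad between neighbouring points on both rings.
        triangles.append((ring_a[i], ring_b[i], ring_a[j]))
        triangles.append((ring_a[j], ring_b[i], ring_b[j]))
    return triangles

def circle_ring(radius, z, points=8):
    """Stand-in for a scanned cross-section: a circle of (x, y, z) points."""
    return [
        (radius * math.cos(2 * math.pi * k / points),
         radius * math.sin(2 * math.pi * k / points),
         z)
        for k in range(points)
    ]

ring0 = circle_ring(150.0, 0.0)    # slice at the start of the pipe
ring1 = circle_ring(150.0, 100.0)  # next slice, 100mm further along
mesh = connect_rings(ring0, ring1)
```

Repeating this for every pair of neighbouring slices gives the whole pipe wall as one triangle list.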
Try starting with very basic 2D instead of full-blown 3D rendering; it may be good enough. A pipe, when you look at it from inside, can be represented as several trapezoids. Assuming your images are small cylindrical portions of a pipe, map each strip to trapezoids (4 would be a good start, as they are easy to position) and draw them in a circular pattern. You may draw several strips this way at the same time. To move back or forward, just reassign images to trapezoids.
If you need full 3D, consider whether WPF would work; if not, XNA or some OpenGL library will give you full 3D.
You don't specify the context: 100mm sample intervals may be sparse (a 1m pipe) or detailed (a 10km pipe). Nor do you specify how many sample points there are (the number of cross sections and the size of each cross-section image).
A simple way to show the data is to use voxels where each pixel on a cross section is treated as a cube and adjacent samples form adjacent cubes (think Minecraft). The result will look blocky but as it's an engineering / scientific application this is probably preferable. Interpolating the model to produce a smooth surface may hide defects or make areas appear to be defective. Also, rendering a cross section through a voxel is a bit easier than a polygon surface.
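As a rough sketch of the voxel idea in plain Python: each filled pixel of each cross-section becomes one cube centre, with the slice index giving the position along the pipe. It assumes the slices are already binary images (1 = material, 0 = empty). The 100mm spacing comes from the question, while `pixel_mm` and the function name are made up.

```python
def slices_to_voxels(slices, spacing_mm=100.0, pixel_mm=1.0):
    """Turn a stack of binary cross-sections into voxel centres (x, y, z) in mm."""
    voxels = []
    for k, image in enumerate(slices):          # k = position of the slice along the pipe
        for y, row in enumerate(image):
            for x, filled in enumerate(row):
                if filled:
                    voxels.append((x * pixel_mm, y * pixel_mm, k * spacing_mm))
    return voxels

slices = [
    [[1, 0],
     [0, 1]],
    [[0, 0],
     [1, 1]],
]
voxels = slices_to_voxels(slices)
```

A renderer would then draw one cube per entry; nothing is interpolated between slices, which is the point made above about not hiding or inventing defects.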
I am currently using EmguCV (an OpenCV C# wrapper) successfully to detect faces in real time (webcam). I get around 7 FPS.
Now I'm looking to improve performance (and save CPU cycles), and I'm looking for options. Here are my ideas:
Detect the face, pick up features of the face and try to find those features in the next frames (using SURF algorithm), so this becomes a "face detection + tracking". If not found, use face detection again.
Detect the face, in the next frame, try to detect the face in a ROI where the face previously was (i.e. look for the face in a smaller part of the image). If the face is not found, try looking for it in the whole image again.
Side idea: if no face is detected for 2-3 frames and there is no movement in the image, don't try to detect any more faces until movement is detected.
Do you have any suggestions for me?
Thanks.
All the solutions you introduced seem smart and reasonable. However, if you use Haar for face detection, you might try to create a cascade with fewer stages. Although 20 stages are recommended for face detection, 10-15 might be enough. That would noticeably improve performance. Information on creating your own cascades can be found at Tutorial: OpenCV haartraining (Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features).
Again, using SURF is a good idea. You can also try P-N learning: Bootstrapping binary classifiers by structural constraints. There are interesting videos on YouTube presenting this method, try to find them.
For the SURF algorithm, you could try it, but I am not sure that it provides relevant features on a face; maybe around the eyes, or on skin irregularities if you are close, or maybe in the hair if the resolution is high enough. Moreover, SURF is not really fast, and I would avoid doing more computation if you want to save CPU time.
The ROI is a good idea. You could choose it with the camshift algorithm; it won't save a lot of CPU, but you could try it, as camshift is a very lightweight algorithm. Again, I am not sure it will be really relevant, but you had the right idea in your second point: minimize the zone in which to search...
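Whatever is used to pick the ROI, the "search near the last face first, then fall back to the whole image" logic could be sketched like this in plain Python. The `detect` argument is a hypothetical callable (for example a wrapper around the cascade detector) that takes a region `(x, y, w, h)` and returns a face rectangle in frame coordinates or `None`; the 50% margin is an arbitrary starting value.

```python
def search_roi(prev_face, frame_size, margin=0.5):
    """Grow the previous face box by `margin` on each side and clamp it to the frame."""
    x, y, w, h = prev_face
    frame_w, frame_h = frame_size
    dx, dy = int(w * margin), int(h * margin)
    left = max(0, x - dx)
    top = max(0, y - dy)
    right = min(frame_w, x + w + dx)
    bottom = min(frame_h, y + h + dy)
    return left, top, right - left, bottom - top

def detect_with_roi(detect, frame_size, prev_face):
    """Try the ROI around the previous face first; fall back to the whole frame."""
    full_frame = (0, 0, frame_size[0], frame_size[1])
    if prev_face is not None:
        face = detect(search_roi(prev_face, frame_size))
        if face is not None:
            return face
    return detect(full_frame)

roi = search_roi((100, 100, 80, 80), (640, 480))
```

The saving comes from the detector scanning a much smaller window on most frames; the full-frame pass only runs when the face is lost.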
The side idea seems quite good to me. You could try to detect motion (global motion, for instance); if there's not much, then don't try to detect again what you already detected... You could try doing that with motion templates, as you know the silhouette from meanshift or face detection...
A very simple, lightweight but non-robust template matching between frame n-1 and frame n could also give you a coefficient that measures a sort of similarity between these two frames; you can say that below a certain threshold you activate face detection... why not? It should take 5 minutes to implement if the C# wrapper has an equivalent of the matchTemplate() function...
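To make the similarity coefficient concrete, here is a small sketch in plain Python that computes a normalised correlation coefficient between two equal-sized grayscale frames, which is roughly what a normalised template match gives when the "template" is the whole previous frame. The frames are flat lists of pixel values, the 0.95 threshold is an arbitrary guess to tune, and both function names are made up.

```python
import math

def frame_similarity(prev, curr):
    """Normalised correlation coefficient between two equal-sized grayscale frames."""
    n = len(prev)
    mean_p = sum(prev) / n
    mean_c = sum(curr) / n
    num = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    den = math.sqrt(
        sum((p - mean_p) ** 2 for p in prev) * sum((c - mean_c) ** 2 for c in curr)
    )
    # The coefficient is undefined for a completely flat frame;
    # treating that case as "unchanged" is an assumption of this sketch.
    return num / den if den else 1.0

def should_run_detection(prev, curr, threshold=0.95):
    """Run the expensive face detector only when the frames differ enough."""
    return frame_similarity(prev, curr) < threshold

still = [10, 20, 30, 40]
moved = [40, 30, 20, 10]
```

Here `should_run_detection(still, still)` is False and `should_run_detection(still, moved)` is True. On real frames, computing this on a downscaled copy keeps the cost low.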
I'll come back here if I have better (deeper) ideas, but for now, I've just come back from work and it's hard to think more...
Julien,
This is not a perfect answer, but just a suggestion.
In my digital image processing classes in my last semester of B.Tech in CS, I learned about bit-plane slicing, and how the image with just its MSB plane information gives almost 70% of the useful image information. So you'll be working with almost the original image, but at just one-eighth the size of the original.
Although I haven't implemented it in my own project, I was wondering about using it to speed up face detection, because later on, eye detection and pupil and eye corner detection also take up a lot of computation time and make the whole program slow.
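For reference, extracting a bit plane is a one-liner per pixel. The sketch below is plain Python on an 8-bit grayscale image stored as nested lists; `bit_plane` is a made-up name. Note that the MSB plane is the same thing as thresholding the image at 128.

```python
def bit_plane(image, bit=7):
    """Extract one bit plane (0 = LSB, 7 = MSB) from an 8-bit grayscale image."""
    return [[(pixel >> bit) & 1 for pixel in row] for row in image]

image = [
    [12, 200, 130],
    [255, 64, 127],
]
msb = bit_plane(image)  # [[0, 1, 1], [1, 0, 0]]
```

Whether a face detector still works well on such a 1-bit image is untested here, so treat this only as a way to try the idea out.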
Continuing from this thread:
What are good algorithms for vehicle license plate detection?
I've developed my image manipulation techniques to emphasise the license plate as much as possible, and overall I'm happy with it, here are two samples.
Now comes the most difficult part, actually detecting the license plate. I know there are a few edge detection methods, but my maths is quite poor so I'm unable to translate some of the complex formulas into code.
My idea so far is to loop through every pixel within the image (a for loop based on the image width and height) and compare each pixel against a list of colours. An algorithm then checks whether the colours keep alternating between the white of the license plate and the black of the text. If they do, these pixels are built into a new bitmap in memory, and an OCR scan is performed once this pattern stops being detected.
I'd appreciate some input on this as it might be a flawed idea, too slow or intensive.
Thanks
Your method of "see if the colors keep differentiating between the license plate white, and the black of the text" is basically searching for areas where the pixel intensity changes from black to white and vice-versa many times. Edge detection can accomplish essentially the same thing. However, implementing your own methods is still a good idea because you will learn a lot in the process. Heck, why not do both and compare the output of your method with that of some ready-made edge detection algorithm?
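A minimal version of the "count the black/white changes" idea could look like this in plain Python. It scans one row of grayscale pixels at a time and counts how often the intensity crosses a threshold; rows with many crossings are candidates for containing plate text. The threshold of 128, the minimum of 6 transitions and the function names are all placeholders to tune.

```python
def count_transitions(row, threshold=128):
    """Count dark-to-light and light-to-dark changes along one row of grayscale pixels."""
    bits = [pixel >= threshold for pixel in row]
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)

def candidate_rows(image, min_transitions=6):
    """Indices of rows whose intensity flips often enough to look like text."""
    return [y for y, row in enumerate(image) if count_transitions(row) >= min_transitions]

image = [
    [200, 200, 200, 200, 200, 200, 200, 200],  # plain bodywork: no flips
    [255, 0, 255, 0, 255, 0, 255, 0],          # text-like row: 7 flips
]
rows = candidate_rows(image)  # [1]
```

Consecutive candidate rows could then be grouped into a band to crop before OCR.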
At some point you will want to have a binary image, say with black pixels corresponding to the "not-a-character" label, and white pixels corresponding to the "is-a-character" label. Perhaps the simplest way to do that is to use a thresholding function. But that will only work well if the characters have already been emphasized in some way.
As someone mentioned in your other thread, you can do that using the black hat operator, which results in something like this:
If you threshold the image above with, say, Otsu's method (which automatically determines a global threshold level), you get this:
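For reference, Otsu's method itself is short enough to sketch in plain Python. It builds a histogram and picks the level that maximises the between-class variance. This is a from-scratch illustration on a flat list of 8-bit pixel values, not the OpenCV implementation, and `otsu_threshold` is a made-up name; pixels above the returned level are treated as foreground.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximises between-class variance."""
    histogram = [0] * 256
    for p in pixels:
        histogram[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += histogram[t]        # pixels with value <= t
        if weight_bg == 0:
            continue                     # nothing in the background class yet
        weight_fg = total - weight_bg    # pixels with value > t
        if weight_fg == 0:
            break                        # everything is background from here on
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, t
    return best_threshold

pixels = [10] * 50 + [200] * 50
level = otsu_threshold(pixels)
binary = [1 if p > level else 0 for p in pixels]
```

With a library at hand there is no reason to hand-roll this; it is only here to show what "automatically determines a global threshold level" means.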
There are several ways to clean that image. For instance, you can find the connected components and throw away those that are too small, too big, too wide or too tall to be a character:
Since the characters in your image are relatively large and fully connected this method works well.
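The component filtering can be sketched in plain Python as a flood fill that records a bounding box per white region, followed by a size check. It assumes a non-empty binary image as nested lists (1 = white) and 4-connectivity; the size limits and the function names are placeholders, and a library labelling routine would normally replace the first function.

```python
from collections import deque

def connected_components(binary):
    """Label 4-connected white regions; return a bounding box (x, y, w, h) and area for each."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y0 in range(height):
        for x0 in range(width):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xs, ys = [], []
            while queue:  # breadth-first flood fill of one region
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            box = (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
            components.append({"box": box, "area": len(xs)})
    return components

def plausible_characters(components, min_w=1, max_w=30, min_h=3, max_h=40):
    """Drop components that are too small, too big, too wide or too tall."""
    return [
        c for c in components
        if min_w <= c["box"][2] <= max_w and min_h <= c["box"][3] <= max_h
    ]

binary = [
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
components = connected_components(binary)
characters = plausible_characters(components)
```

In this toy image the vertical bar survives and the single-pixel speck is discarded.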
Next, you could filter the remaining components based on the properties of their neighbors until you have the desired number of components (= number of characters). If you want to recognize the characters, you could then calculate features for each character and input them to a classifier, which usually is built with supervised learning.
All the steps above are just one way to do it, of course.
By the way, I generated the images above using OpenCV + Python, which is a great combination for computer vision.
Colour, as much as it looks good, will present quite some challenges with shading and lighting conditions. It really depends on how robust you want to make it, but real-world cases have to deal with such issues.
I have done research on road footage (see my profile page and look here for sample) and have found that real-world road footage is extremely noisy in terms of lighting conditions, and your colours can change from brown to white for a yellow rear number plate.
Most algorithms use line detection and try to find a box with an aspect ratio within an acceptable range.
I suggest you do a literature review on the subject but this was achieved back in 1993 (if I remember correctly) so there will be thousands of articles.
This is quite a scientific domain, so a single algorithm will not solve it, and you will need numerous pre- and post-processing steps.
In brief, my suggestion is to use the Hough transform to find lines and then look for rectangles with an acceptable aspect ratio.
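The aspect-ratio check at the end of that pipeline is simple to sketch in plain Python. It assumes the line detection step has already produced candidate rectangles as `(x, y, w, h)` tuples; the accepted range of 2.0 to 6.0 is only a placeholder, since plate proportions differ from country to country, and `plate_candidates` is a made-up name.

```python
def plate_candidates(rectangles, min_ratio=2.0, max_ratio=6.0):
    """Keep rectangles (x, y, w, h) whose width/height ratio is plausible for a plate."""
    keep = []
    for x, y, w, h in rectangles:
        if h > 0 and min_ratio <= w / h <= max_ratio:
            keep.append((x, y, w, h))
    return keep

rectangles = [
    (40, 300, 520, 111),  # wide and short: plate-like
    (10, 10, 100, 100),   # square: probably a window or a sign
    (0, 200, 900, 50),    # far too elongated: probably a bumper line
]
plates = plate_candidates(rectangles)
```

Only the first rectangle passes here. In practice perspective distortion widens the acceptable range, so the limits need tuning on real footage.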
Harris feature detection could provide important edges but if the car is light-coloured this will not work.
If you have a lot of samples, you could try the face detection method developed by Paul Viola and Michael Jones. It's good for face detection; maybe it'll do fine with license plate detection (especially if combined with some other method).
I recently joined a project where I need to build a vehicle-based computer vision system. What sort of special functionality does a camera need to be able to capture images while traveling at varying speeds? For example, how high a frame rate is required, and what exposure duration and shutter speed? Do you think that webcams (even high-end ones) will be able to achieve it? The project requires the camera to be programmable in C#...
Thank you very much in advance!
Unless video is capable of producing high-quality, low-blur images, I would go with a camera with a really fast shutter speed and a very short exposure duration. As for frame rate, following Seth's math, 44 centimeters is a little more than a foot, which should be decent for calculations.
Reaction time for a human to respond to someone hitting the brakes in front of them is about 1.5 seconds. If you can determine that they hit their brake light within 1/30th of a second, and it takes you 1 second to calculate and apply the brakes, you already beat a human in reaction time.
How fast your shutter speed needs to be depends on how fast your vehicle is moving. A faster shutter speed reduces motion blur, giving a more accurate picture to analyze.
Try different speeds (a camera where this value is configurable might help).
I'm not sure that's an answerable question. It sounds like the sort of thing that the Darpa Grand Challenge hopes to determine :)
With regard to frame rate: if your vehicle is going 30 miles per hour, a 30 FPS web cam will capture one frame for every 44 centimeters the vehicle travels. Whether or not that's "enough" depends on what you're planning to do with the image.
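The arithmetic behind that figure, as a small Python sketch so other speeds and frame rates can be plugged in. The second function applies the same calculation to exposure time: it gives how far the vehicle moves while the shutter is open. The function names and the 1/1000 s example exposure are illustrative only.

```python
MPH_TO_MPS = 1609.344 / 3600.0  # one mile per hour in metres per second

def metres_per_frame(speed_mph, fps):
    """Distance the vehicle travels between two consecutive frames."""
    return speed_mph * MPH_TO_MPS / fps

def metres_during_exposure(speed_mph, exposure_s):
    """Distance the vehicle travels while the shutter is open."""
    return speed_mph * MPH_TO_MPS * exposure_s

per_frame = metres_per_frame(30, 30)                   # about 0.447 m
during_shutter = metres_during_exposure(30, 1 / 1000)  # about 0.0134 m
```

So at 30 mph and 30 FPS the vehicle covers roughly 44.7 cm per frame, and about 1.3 cm during a 1/1000 s exposure.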
Not sure about the out-of-the-box C# programmability, but a specific webcam-style camera to consider would be the PS3 Eye.
It was specially engineered for motion capture and (as I understand it) is capable of higher-quality images at high framerates than the majority of the competition. Windows drivers are available for it, and that opens the door for creating a C# wrapper.
Here is the product page, note the 120fps upper-end spec (not sure that the Windows drivers run at this rate, but obviously the hardware is capable of it).
One note on shutter speed: images taken at a high framerate in low light will likely be underexposed and unusable. If you need this to work in varying light conditions, then the framerate will likely either need to be fixed at the low end of your acceptable range or need to self-adjust based on available light.
These guys: Mobileye - develop such commercial systems for lane departure warnings and vehicle and pedestrian detection.
If you go to "Manufacturer Products->Development and Evaluation Platforms->Cameras", you can see what they use as cameras and also for their processing platforms.
30 fps should be sufficient for the applications mentioned above.
If money isn't an issue, take a look at cameras from companies like Opeton and others. You can control every aspect of every image capture, including capture time, image size, and more.
My iPhone can take pictures out the side of a car that are fairly blur free... past 10-20 feet. Inside of that, things are simply moving too fast; the shutter speed would need to be higher to not blur that.
Start with a middle-of-the-road webcamera, and move up as necessary? A laptop and a ride in your car while capturing still images would probably give you an idea of how well it works.