The newest Linux kernel supports the Kinect through a driver. I want to access the RGB and D (depth) streams and put them into a 2D array; either 64-bit ints or two separate arrays will work. C# is preferred; C++ is acceptable.
So my question is: where can I find more information about this, e.g., articles and documentation? What would a simple example program look like, e.g., printing the color and depth at position 100x100?
I'll up-vote any good links, and accept the first working code sample.
Thanks,
Frankie
P.S. I'm aware of the OpenKinect, NITE, Microsoft SDK, etc. projects. I want this to be easy to install on other computers and Linux distros, which is why the common kernel driver is preferred. My main use will be a webcam that replaces pixels farther away than depth X and saves to disk.
Update
Since asking, I haven't gotten much further. I found this article and checked out the Git repo, which doesn't seem to have been updated since April; I don't see any connection to the Linux kernel or any sign of it being incorporated. There's no mention of the Kinect in any later blog posts there, other than this unrelated one.
Update 2
I can't seem to find out who applied the Kinect driver to the kernel. There is a mirror of the kernel on GitHub; I tried searching it with Google, but this query and variations of it didn't turn up anything. Then I tried searching GitHub directly, with no hits. Does anyone have any information?
Unfortunately, the driver doesn't support the depth stream, only an unprocessed image from the monochrome sensor, so this isn't possible using only the kernel driver. See also a blog post I wrote on this subject. If you remove the built-in kernel modules, you can do it with libfreenect, though.
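To make the libfreenect route concrete, here is a minimal sketch that answers the original "100x100" question by P/Invoking libfreenect's synchronous wrapper from C#. The library name, format constants, and buffer layout are assumptions based on libfreenect's C API, so verify them against your installed version; also note the raw 11-bit depth value is disparity, not millimetres.

using System;
using System.Runtime.InteropServices;

class KinectPeek
{
    // Assumed constants from libfreenect: both default streams are 640x480;
    // FREENECT_VIDEO_RGB packs 3 bytes per pixel, FREENECT_DEPTH_11BIT packs a ushort per pixel.
    const int Width = 640;
    const int FREENECT_VIDEO_RGB = 0;
    const int FREENECT_DEPTH_11BIT = 0;

    [DllImport("freenect_sync")]
    static extern int freenect_sync_get_video(out IntPtr video, out uint timestamp, int index, int fmt);

    [DllImport("freenect_sync")]
    static extern int freenect_sync_get_depth(out IntPtr depth, out uint timestamp, int index, int fmt);

    [DllImport("freenect_sync")]
    static extern void freenect_sync_stop();

    static void Main()
    {
        int x = 100, y = 100;
        IntPtr rgb, depth;
        uint ts;

        // Grab one RGB frame and one depth frame from device 0.
        if (freenect_sync_get_video(out rgb, out ts, 0, FREENECT_VIDEO_RGB) != 0 ||
            freenect_sync_get_depth(out depth, out ts, 0, FREENECT_DEPTH_11BIT) != 0)
        {
            Console.Error.WriteLine("No Kinect found (is the gspca_kinect kernel module still claiming it?)");
            return;
        }

        int pixel = y * Width + x;
        byte r = Marshal.ReadByte(rgb, pixel * 3 + 0);
        byte g = Marshal.ReadByte(rgb, pixel * 3 + 1);
        byte b = Marshal.ReadByte(rgb, pixel * 3 + 2);
        ushort d = (ushort)Marshal.ReadInt16(depth, pixel * 2);  // raw 11-bit disparity value

        Console.WriteLine("RGB at (100,100) = ({0},{1},{2}), raw depth = {3}", r, g, b, d);
        freenect_sync_stop();
    }
}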
You can find the driver file here on GitHub: kinect.c.
The driver doesn't support a depth stream, according to the link you posted:
[media] gspca - kinect: New subdriver for Microsoft Kinect
The Kinect sensor is a device used by Microsoft for its Kinect
project, which is a system for controller-less Human-Computer
interaction targeted for Xbox 360.
In the Kinect device, RGBD data is captured from two distinct sensors:
a regular RGB sensor and a monochrome sensor which, with the aid of a
IR structured light, captures what is finally exposed as a depth map;
so what we have is basically a Structured-light 3D scanner.
The Kinect gspca subdriver just supports the video stream for now,
exposing the output from the RGB sensor or the unprocessed output from
the monochrome sensor; it does not deal with the processed depth
stream yet, but it allows using the sensor as a Webcam or as an IR
camera (an external source of IR light might be needed for this use).
The low level implementation is based on code from the OpenKinect
project (http://openkinect.org).
From the source of the driver, it appears the author is Antonio Ospite, reachable at ospite#studenti.unina.it
As already suggested in the comments, the author should be able to answer the questions you asked, since what you want really depends on what exactly the driver exposes (which might even be version dependent).
Related
We have a C# application that performs processing on video streams. This is a low-level application that receives each frame in Bitmap format, so basically we need 25 images each second. This application is already working for some of our media sources, but we now need to add a webcam as an input device.
So we basically need to capture bitmap images from a webcam continuously so that we can pass all these frames as a "stream" to our application.
What is the best and simplest way to access the webcam and read the actual frames directly from the webcam as individual images? I am still in the starting blocks.
There is a multitude of libraries out there that allow one to access the webcam, preview its content on a Windows panel, and then use screen capturing to grab the image again; unfortunately, that approach will not give us the necessary performance when capturing 25 frames per second. The alternatives I have looked at so far:
IVMRWindowlessControl9::GetCurrentImage has been mentioned, but it seems aimed at infrequent snapshots rather than a constant stream of images.
DirectShow.Net is mentioned by many as a good candidate, but it is unclear how to simply grab the images from the webcam, and many sources raise concerns about Microsoft no longer supporting DirectShow. The implementations I've seen also require ImageGrabber, which is apparently no longer supported either.
The newer alternative from Microsoft seems to be Media Foundation, but my research hasn't turned up any working examples of how it can be implemented (and I'm not sure it will run on older versions of Windows such as XP).
DirectX.Capture is an awesome library (see a nice implementation) but seems to lack the filters and methods to get the video images directly.
I have also started looking at Filters and Filter Graphs, but this seems awfully complex and feels a bit like "reinventing the wheel".
Overall, all the solutions briefly mentioned above seem to be rather old. Can someone please point me in the direction of a step-by-step guide for getting a webcam working in C# and grabbing several images per second from it? (We will also have to do audio at some point, so a solution that does not exclude video would be most helpful.)
I use AForge.Video (find it here: code.google.com/p/aforge/) because it's a very fast C# implementation. I am very pleased with the performance; it effortlessly captures from two HD webcams at 30 fps on an 8-year-old PC. The data is supplied as a native IntPtr, so it's ideal for further processing using native code or OpenCV.
The OpenCV wrappers Emgu and OpenCvSharp both implement rudimentary video capture functionality, which might be sufficient for your purposes. Clearly, if you are going to perform image processing / computer vision, you might want to use those anyway.
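For reference, a capture loop with one of those wrappers is only a few lines. This is a rough sketch assuming OpenCvSharp (with the OpenCvSharp.Extensions package for the Bitmap conversion); exact class names may differ between versions, and Emgu's VideoCapture is similar.

using OpenCvSharp;
using OpenCvSharp.Extensions;  // provides the Mat -> Bitmap conversion

class WebcamLoop
{
    static void Main()
    {
        using (var capture = new VideoCapture(0))   // open the first webcam
        using (var frame = new Mat())
        {
            while (capture.Read(frame) && !frame.Empty())
            {
                // Convert to System.Drawing.Bitmap for an existing Bitmap-based pipeline.
                using (var bmp = frame.ToBitmap())
                {
                    // ... hand bmp to your frame-processing code here ...
                }
            }
        }
    }
}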
As dr.mo suggests, AForge was the answer.
I used the tutorial from here: http://en.code-bude.net/2013/01/02/how-to-easily-record-from-a-webcam-in-c/
In the tutorial, an event handler fires each time a frame is received from the webcam, and the resulting bitmap is used to write the image to a PictureBox. I have simply modified it to save the bitmap image to a file rather than to a PictureBox. So I have replaced the following code:
pictureBoxVideo.BackgroundImage = (Bitmap)eventArgs.Frame.Clone();
with the following code:
Bitmap myImage = (Bitmap)eventArgs.Frame.Clone();
string strGrabFileName = String.Format("C:\\My_folder\\Snapshot_{0:yyyyMMdd_hhmmss.fff}.bmp", DateTime.Now);
myImage.Save(strGrabFileName, System.Drawing.Imaging.ImageFormat.Bmp);
and it works like a charm!
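For anyone piecing it together, here is a rough sketch of the surrounding AForge.Video.DirectShow setup the tutorial uses (device selection and handler wiring follow the usual AForge pattern; the folder path is just an example):

using System;
using System.Drawing;
using System.Drawing.Imaging;
using AForge.Video;
using AForge.Video.DirectShow;

class WebcamRecorder
{
    static void Main()
    {
        // Enumerate video capture devices and open the first one.
        var devices = new FilterInfoCollection(FilterCategory.VideoInputDevice);
        var camera = new VideoCaptureDevice(devices[0].MonikerString);

        // NewFrame fires once per captured frame; clone the bitmap before the event returns.
        camera.NewFrame += delegate(object sender, NewFrameEventArgs eventArgs)
        {
            using (Bitmap frame = (Bitmap)eventArgs.Frame.Clone())
            {
                string file = String.Format(
                    "C:\\My_folder\\Snapshot_{0:yyyyMMdd_HHmmss.fff}.bmp", DateTime.Now);
                frame.Save(file, ImageFormat.Bmp);
            }
        };

        camera.Start();
        Console.ReadLine();       // keep capturing until Enter is pressed
        camera.SignalToStop();
        camera.WaitForStop();
    }
}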
I'm working on a small WPF desktop app to track a robot. I have a Kinect for Windows on my desk, and I was able to do the basic features and run the depth camera stream and the RGB camera stream.
What I need is to track a robot on the floor, but I have no idea where to start. I found out that I should use Emgu (an OpenCV wrapper).
What I want to do is track a robot and find its location using the depth camera. Basically, it's localization of the robot using stereo triangulation. Then, using TCP and Wi-Fi, I'll send the robot commands to move it from one place to another using both the RGB and depth cameras. The RGB camera will also be used to map the objects in the area so that the robot can take the best path and avoid them.
The problem is that I have never worked with computer vision before; this is actually my first such project. I'm not tied to a deadline, and I'm more than willing to learn all the related material to finish this project.
I'm looking for details, explanations, hints, links or tutorials to achieve this.
Thanks.
Robot localization is a very tricky problem, and I myself have been struggling with it for months now. I can tell you what I have achieved, but you have a number of options:
Optical flow based odometry (also known as visual odometry):
Extract keypoints (features) from one image (I used Shi-Tomasi, i.e. cvGoodFeaturesToTrack)
Do the same for a consecutive image
Match these features (I used Lucas-Kanade)
Extract depth information from Kinect
Calculate transformation between two 3D point clouds.
What the above algorithm does is estimate the camera motion between two frames, which tells you the position of the robot.
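To illustrate the depth step above: each tracked pixel (u, v) has to be back-projected into a 3D point using its Kinect depth value and the depth camera intrinsics before you can estimate the transform between two point clouds. A small sketch, using commonly quoted approximate intrinsics for the Kinect depth camera (assumed values; calibrate your own sensor for real use):

struct Point3
{
    public double X, Y, Z;
}

static class KinectBackProjection
{
    // Approximate Kinect depth-camera intrinsics (assumed, not calibrated).
    const double Fx = 594.2, Fy = 591.0;  // focal lengths in pixels
    const double Cx = 339.5, Cy = 242.7;  // principal point

    // Back-project pixel (u, v) with depth z (in metres) into a 3D point in the camera frame.
    public static Point3 BackProject(double u, double v, double zMetres)
    {
        return new Point3
        {
            X = (u - Cx) * zMetres / Fx,
            Y = (v - Cy) * zMetres / Fy,
            Z = zMetres
        };
    }
}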
Monte Carlo Localization: This is rather simpler, but you should also use wheel odometry with it.
Check this paper out for a C#-based approach.
The method above uses probabilistic models to determine the robot's location.
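To make the idea concrete, here is a stripped-down sketch of the Monte Carlo Localization loop: predict each particle with noisy wheel odometry, weight it with a sensor likelihood you supply (e.g. comparing expected against measured depth), then resample. It is a pedagogical outline under those assumptions, not a drop-in implementation; the noise levels are arbitrary.

using System;
using System.Linq;

class Particle
{
    public double X, Y, Theta, Weight;
}

class MonteCarloLocalizer
{
    readonly Particle[] particles;
    readonly Random rng = new Random();

    public MonteCarloLocalizer(int count, double mapWidth, double mapHeight)
    {
        // Start with particles spread uniformly over the map.
        particles = Enumerable.Range(0, count).Select(_ => new Particle
        {
            X = rng.NextDouble() * mapWidth,
            Y = rng.NextDouble() * mapHeight,
            Theta = rng.NextDouble() * 2 * Math.PI,
            Weight = 1.0 / count
        }).ToArray();
    }

    // One filter step: apply odometry with noise, weight by the sensor model, resample.
    public void Update(double forward, double turn, Func<Particle, double> sensorLikelihood)
    {
        foreach (var p in particles)
        {
            p.Theta += turn + Gaussian(0.02);
            p.X += (forward + Gaussian(0.05)) * Math.Cos(p.Theta);
            p.Y += (forward + Gaussian(0.05)) * Math.Sin(p.Theta);
            p.Weight = sensorLikelihood(p);   // e.g. compare expected vs. measured depth readings
        }
        Resample();
    }

    void Resample()
    {
        // Simple roulette-wheel resampling proportional to weight.
        double sum = 0;
        var cumulative = new double[particles.Length];
        for (int i = 0; i < particles.Length; i++) { sum += particles[i].Weight; cumulative[i] = sum; }

        var next = new Particle[particles.Length];
        for (int i = 0; i < particles.Length; i++)
        {
            double r = rng.NextDouble() * sum;
            int j = Array.FindIndex(cumulative, c => c >= r);
            var src = particles[j < 0 ? particles.Length - 1 : j];
            next[i] = new Particle { X = src.X, Y = src.Y, Theta = src.Theta, Weight = 1.0 / particles.Length };
        }
        Array.Copy(next, particles, particles.Length);
    }

    double Gaussian(double sigma)
    {
        // Box-Muller transform for zero-mean Gaussian noise.
        double u1 = 1.0 - rng.NextDouble(), u2 = rng.NextDouble();
        return sigma * Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Sin(2.0 * Math.PI * u2);
    }
}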
The sad part is that even though libraries exist in C++ to do what you need very easily, wrapping them for C# is a herculean task. If you can code a wrapper, however, then 90% of your work is done; the key libraries to use are PCL and MRPT.
The last option (which is by far the easiest, but the most inaccurate) is to use KinectFusion, built into the Kinect SDK 1.7. But my experience with it for robot localization has been very bad.
You must read SLAM for Dummies; it will make Monte Carlo Localization very clear.
The hard reality is that this is very tricky, and you will most probably end up doing it yourself. I hope you dive into this vast topic and learn some awesome stuff.
For further information, or for wrappers that I have written, just comment below... :-)
Best
Not sure if it would help you or not... but I put together a Python module that might.
http://letsmakerobots.com/node/38883#comments
I need to make a program that would allow me to capture a camera stream in my two other programs simultaneously. Basically, I need the functionality that ManyCam (http://www.manycam.com/) offers.
How can I do this? I'm interested in a free C++ library, or some C#/C++ .NET solution.
Well, one easy approach would be to "share" rather than "duplicate" the camera stream: your application can capture the camera stream and then provide an API to share it between multiple applications. OpenCV is worth a look.
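One way to implement the "share" approach in C# is to have the single process that owns the camera publish each frame into a named memory-mapped file that the other programs open read-only. A minimal sketch, where the map name and fixed frame size are assumptions, and a real implementation would need a mutex or sequence number so readers never see a half-written frame:

using System;
using System.IO.MemoryMappedFiles;

class FramePublisher
{
    const string MapName = "SharedWebcamFrame";   // hypothetical name agreed on by all processes
    const int FrameBytes = 640 * 480 * 3;         // assumed fixed frame size (RGB24)

    readonly MemoryMappedFile map = MemoryMappedFile.CreateOrOpen(MapName, FrameBytes);

    // Called by the single process that owns the camera, once per captured frame.
    public void Publish(byte[] frame)
    {
        using (var accessor = map.CreateViewAccessor(0, FrameBytes))
        {
            accessor.WriteArray(0, frame, 0, Math.Min(frame.Length, FrameBytes));
        }
    }
}

class FrameConsumer
{
    // Called by any other process that wants the latest frame.
    public static byte[] ReadLatest()
    {
        using (var map = MemoryMappedFile.OpenExisting("SharedWebcamFrame"))
        using (var accessor = map.CreateViewAccessor(0, 640 * 480 * 3))
        {
            var buffer = new byte[640 * 480 * 3];
            accessor.ReadArray(0, buffer, 0, buffer.Length);
            return buffer;
        }
    }
}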
I'm not sure, but you probably need to have a good understanding of how the hardware works and know how to develop a driver for it. What you should do is get the video from the USB device's driver and use what you get from your camera as the input source for your own virtual camera driver.
I'm trying to use a 2D camera to recognize the device/object a user is pointing at, so I was looking for skeleton tracking software that works with a 2D camera. Is there any open source project that deals with skeleton tracking using 2D cameras?
(I've gone through tons of links on Google and it seems like most of what's there is just research papers but no actual open source projects)
Thanks!
Skamleton could be an option. It's an open-source project in its early stages, but it implements a background subtractor, a skin color classifier, blob tracking and face classification. There is a demo on YouTube.
Note that Skamleton uses simple cameras, not RGB-D (depth) cameras like the Kinect system (the Kinect uses a structured-light device from PrimeSense).
It seems there's kind of a pre-release of an SDK for the Kinect from Microsoft. Perhaps this might be helpful for you:
http://nuigroup.com/forums/viewthread/11249/
(Although I think this won't be open source. But since you are using C#, a Microsoft SDK might be OK for you.)
This seems like an old post, but in case anyone is still looking: Extreme Reality uses a regular webcam and does skeleton tracking. It's not open source, but I've played around with it a bit, and it does seem to be fairly robust.
http://www.xtr3d.com/developers/resources/
I'm writing a hobby project to deal with files on cameras.
Previously I found issues with the camera and the FolderBrowserDialog.
What I believe is happening is that the camera is using MTP or PTP (Picture Transfer Protocol, not peer-to-peer).
In order to make interfacing with the camera more seamless I'd like to use PTP or MTP to access the camera. Are there any MTP / PTP Wrappers for .Net people can recommend? I'm keen to avoid writing my own or dabbling in unmanaged code if possible.
I have found this blog post by dimeby8, which has been a great starting point with a lot of useful information about how the protocol works; however, it leaves a lot to be desired in the way of managed implementations:
http://blogs.msdn.com/dimeby8/archive/tags/C_2300_/default.aspx
I have also found a crude C++/CLI MTP wrapper; it has next to no functionality but is a good demonstration of mixed managed/unmanaged code:
http://ko.sourceforge.jp/projects/sfnet_mtpsharp/
And there's a CodePlex project, but it doesn't demonstrate transfers or (what I'm interested in) editing camera metadata (specifically the camera date):
http://www.codeplex.com/portabledevicelib/
Have you had any success with this project?