I am working on a Unity project for Android that needs accurate and efficient gesture recognition (it will run on a mobile device) as input to further processing.
Currently, I am using HSV to detect the hand region, which is not very accurate even when a touch input is taken from the user.
Please point me to a solution that detects the hand region efficiently. It should be robust, as the user will be opening or closing any number of fingers or rotating the hand.
Note:
Since I'm working in Unity, I need help in C# only.
Related
I am trying to develop an AR application in Unity using the new AR Foundation.
This application would need to use two features:
It needs to use a large number of tracking images
It needs to properly identify the tracked image (marker) (Only one image will be visible at the same moment)
What I need is to dynamically generate the fiducial markers, preferably with the tracking part identical for all of them and only a specific part carrying the ID of the marker. Preferably the AR code would be similar to the ARToolkit one from this image:
Do these markers work well with AR Foundation (an abstraction over ARCore and ARKit)?
Let's say I'll add 100 of these generated codes into the XRImage... Is it possible that AR Foundation image targets get "confused" and mix up tracked images? Could I, in theory, use QR codes as markers and simply encode the ID information in the QR code?
In a project I searched for a good way to implement a lot of different markers to identify a vast number of different real-world objects. At first I tried QR codes and added them to the image database in ARFoundation.
It worked, but sometimes markers got mixed up, and this already happened with only 4 QR codes containing words ("left", "right", "up", "down"). The problem is that ARFoundation relies on ARCore, ARKit, etc., depending on the platform you want to build for.
Excerpt from the ARCore guide:
Avoid images with that contain a large number of geometric features, or very few features (e.g. barcodes, QR codes, logos and other line art) as this will result in poor detection and tracking performance.
The next thing I tried was to combine OpenCV with ARFoundation and use ArUco markers for detection. The detection works much better and faster than the image recognition. This was done by accessing the camera image and running OpenCV's marker detection on it. In ARFoundation you can access the camera image by using public bool TryAcquireLatestCpuImage(out XRCpuImage cpuImage).
The problem with this method:
This is a resource-intensive process that impacts performance...
On an iPad Pro 13" 2020, the performance in my application dropped from a constant 60 FPS to around 25 FPS. For me, this was too severe a performance drop.
A solution could be to create a collection of images with large variations and perfect score, but I am unsure how images with all these aspects in mind could be generated. (Probably also limited to 1000 images per reference database, see ARCore guide)
If you want to check whether these markers work well in ARCore, go to this link and download the arcoreimg tool.
The tool will give you a score that tells you whether the image is trackable or not. Though the site recommends a score of 75, I have tested this with a score as low as 15. Here is a quick demo if you are interested. The router image in the demo has a score of 15.
I'm working on a small WPF desktop app to track a robot. I have a Kinect for Windows on my desk, and I was able to get the basic features working and run the depth camera stream and the RGB camera stream.
What I need is to track a robot on the floor, but I have no idea where to start. I found out that I should use EMGU (an OpenCV wrapper).
What I want to do is track a robot and find its location using the depth camera. Basically, it's for localization of the robot using stereo triangulation. I would then use TCP over Wi-Fi to send the robot commands to move it from one place to another, using both the RGB and depth cameras. The RGB camera will also be used to map the objects in the area so that the robot can take the best path and avoid them.
The problem is that I have never worked with computer vision before; this is actually my first project in the field. I'm not tied to a deadline, and I'm more than willing to learn all the related material to finish this project.
I'm looking for details, explanations, hints, links, or tutorials that would help me achieve this.
Thanks.
Robot localization is a very tricky problem, and I myself have been struggling with it for months now. I can tell you what I have achieved, but you have a number of options:
Optical Flow Based Odometry (also known as visual odometry):
Extract keypoints or features from one image (I used Shi-Tomasi, i.e. cvGoodFeaturesToTrack)
Do the same for a consecutive image
Match these features (I used Lucas-Kanade)
Extract depth information from Kinect
Calculate transformation between two 3D point clouds.
The algorithm above estimates the camera motion between two frames, which in turn tells you the position of the robot.
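The last step can be illustrated with a deliberately simplified sketch. It assumes the robot moves on a flat floor, so the matched 3D points have already been projected onto the ground plane and only a 2D rotation plus translation is estimated; a full 3D solution would typically use an SVD-based fit from a library such as PCL instead. The function name estimate_rigid_2d and the sample points are made up for illustration, and Python is used only to keep the example short; the arithmetic ports directly to C#.

```python
import math


def estimate_rigid_2d(src, dst):
    """Least-squares rotation + translation that maps src points onto dst points.

    src and dst are equally long lists of matched (x, y) pairs.
    Returns (theta, tx, ty) with theta in radians.
    """
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matched point pairs")

    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n

    # Accumulate dot and cross products of the centered pairs.
    dot = 0.0
    cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay = ax - sx, ay - sy
        bx, by = bx - dx, by - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    # The optimal rotation angle follows directly from the two sums.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # Translation moves the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty


# Hypothetical matched points: the second set is the first one rotated
# by 90 degrees and shifted by (2, 3).
previous_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
current_points = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
theta, tx, ty = estimate_rigid_2d(previous_points, current_points)
print(math.degrees(theta), tx, ty)
```

Note that the returned transform maps points seen in the previous frame onto the current frame; the motion of the camera (and so of the robot) is the inverse of it, which is an easy place to get a sign wrong.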
Monte Carlo Localization: This is rather simpler, but you should also use wheel odometry with it.
Check this paper out for a C#-based approach.
The method above uses probabilistic models to determine the robot's location.
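A minimal sketch of the predict / weigh / resample cycle behind Monte Carlo Localization, assuming a toy robot on a 1D line whose sensor reports its position with Gaussian noise. The noise values, the particle count, and all function names are illustrative assumptions and are not taken from the paper mentioned above; Python is used only for brevity.

```python
import math
import random


def predict(particles, odometry, motion_noise, rng):
    """Move every particle by the odometry reading plus random motion noise."""
    return [p + odometry + rng.gauss(0.0, motion_noise) for p in particles]


def weigh(particles, measurement, sensor_noise):
    """Weight each particle by how well it explains the measurement."""
    weights = [
        math.exp(-0.5 * ((measurement - p) / sensor_noise) ** 2)
        for p in particles
    ]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed: fall back to a uniform distribution.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]


def resample(particles, weights, rng):
    """Draw a new particle set, favouring particles with high weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


def estimate(particles):
    """Use the particle mean as the position estimate."""
    return sum(particles) / len(particles)


def run_demo(seed=0, steps=10):
    """Simulate a robot that starts at 2.0 and moves 0.5 per step."""
    rng = random.Random(seed)
    # The filter starts with no idea where the robot is on [0, 10].
    particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
    true_pos = 2.0
    for _ in range(steps):
        true_pos += 0.5
        particles = predict(particles, 0.5, 0.05, rng)
        measurement = true_pos + rng.gauss(0.0, 0.2)
        weights = weigh(particles, measurement, 0.2)
        particles = resample(particles, weights, rng)
    return estimate(particles), true_pos


estimated, actual = run_demo()
print(estimated, actual)
```

In a real setup the odometry argument would come from the wheel encoders and the weighting step would compare the expected sensor reading for each particle against the actual Kinect data.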
The sad part is that even though C++ libraries exist that do what you need very easily, wrapping them for C# is a herculean task. If you can code a wrapper, however, then 90% of your work is done. The key libraries to use are PCL and MRPT.
The last option (by far the easiest, but also the most inaccurate) is to use KinectFusion, built into the Kinect SDK 1.7. My experiences with it for robot localization have been very bad, though.
You must read SLAM for Dummies; it will make things about Monte Carlo Localization very clear.
The hard reality is that this is very tricky and you will most probably end up doing it yourself. I hope you dive into this vast topic and learn some awesome stuff.
For further information, or for the wrappers that I have written, just comment below... :-)
Best
Not sure if this will help you or not... but I put together a Python module that might be useful.
http://letsmakerobots.com/node/38883#comments
How can I detect a change on the screen? Its position is not required, but if it is possible to get the position, that would be helpful. I searched the internet but did not find a suitable answer. I am writing a program in C# and I have to detect changes on the screen. I tried capturing four screenshots per second and comparing them. This method works, but it badly affects the performance of the PC.
I think it is easy to do in C or Assembly language (x86) because in assembly we can get access to video memory directly.
Is it possible to do in C#?
A code sample would be appreciated.
Project: Detect any change on full Screen camera monitoring software.
Are you really looking just for simple difference of what you see on your monitor? I doubt that would do the job.
For motion detection from cam input you can take a look at Motion Detection Algorithms article on CodeProject.
Aside from taking screen captures and comparing them at some time interval (which would cause performance issues), the only solution I can think of is hooking into system events, the "redraw" kind of events.
You will need to choose which events to hook your program into.
This CodeProject tutorial might help:
http://www.codeproject.com/KB/system/WilsonSystemGlobalHooks.aspx
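If polling screenshots remains the fallback, the comparison can at least report where the change happened, and skipping identical rows keeps it reasonably cheap. A minimal sketch, assuming each frame is already available as a list of rows of pixel values (how the frame is captured is out of scope here); changed_region is a hypothetical helper name, and Python is used only for brevity.

```python
def changed_region(prev, curr):
    """Return the bounding box (left, top, right, bottom) of all pixels that
    differ between two equally sized frames, or None if nothing changed.

    Each frame is a sequence of rows; each row is a sequence of pixel values.
    """
    if len(prev) != len(curr) or (prev and len(prev[0]) != len(curr[0])):
        raise ValueError("frames must have the same size")

    left = top = right = bottom = None
    for y, (row_a, row_b) in enumerate(zip(prev, curr)):
        # Whole-row comparison first: unchanged rows are skipped quickly.
        if row_a == row_b:
            continue
        if top is None:
            top = y
        bottom = y
        # Only changed rows are scanned pixel by pixel.
        xs = [x for x, (a, b) in enumerate(zip(row_a, row_b)) if a != b]
        left = xs[0] if left is None else min(left, xs[0])
        right = xs[-1] if right is None else max(right, xs[-1])

    if top is None:
        return None
    return (left, top, right, bottom)


# Two tiny 5x4 "screens" that differ in two pixels.
before = [[0] * 5 for _ in range(4)]
after = [row[:] for row in before]
after[1][1] = 255
after[2][3] = 255
print(changed_region(before, after))
```

The same idea carries over to C# by comparing the raw bytes of two locked bitmaps row by row instead of calling a per-pixel accessor.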
I am making an object tracking application. I have used Emgucv 2.1.0.0 to load a video file to a picturebox. I have also taken the video stream from a web camera.
Now, I want to draw an unfilled square on the video stream using a mouse and then track the object enclosed by the unfilled square as the video continues to stream.
This is what people have suggested so far:-
(1) .NET Video overlay drawing (DirectX) - but this is for C++ users, the suggester said that there are .NET wrappers, but I had a hard time finding any.
(2) DxLogo sample
DxLogo – A sample application showing how to superimpose a logo on a data stream.
It uses a capture device for the video source, and outputs the result to a file.
Sadly, this does not use a mouse.
(3) GDI+ and mouse handling - I do not have a clue about this area.
As for tracking the object in the square, I would appreciate it if someone could give me links to some research papers to read.
Any help with using the mouse to draw on a video is greatly appreciated.
Thank you for taking the time to read this.
Many Thanks
It sounds like you want to do image detection and / or tracking.
The EmguCV ( http://www.emgu.com/wiki/index.php/Main_Page ) library provides a good foundation for this sort of thing in .Net.
e.g. http://www.emgu.com/wiki/index.php/Tutorial#Examples
It's a pretty meaty subject, with quite a few years and different branches of research behind it, so I'm not sure anyone can give the definitive guide to such things. Reading up on neural networks and related topics would give you a pretty good grounding in the way EmguCV and related libraries manage it.
It should be noted that systems such as EmguCV are designed to recognise predefined items within a scene (such as a licence plate number) rather than an arbitrary feature within a scene.
For arbitrary tracking of a given feature, a search for research papers on edge detection and the like (in combination with a library such as EmguCV) is probably a good start.
(You also may want to sneak a peek at an existing application such as http://www.pfhoe.com/ to see if it fits your needs)
Problem Overview
I am working on a game application and need to be able to implement scrollable maps in Silverlight similar to those found in Google Maps. However, I am unsure as to how to implement this effectively. The following paragraphs provide additional detail. Any ideas or guidance are greatly appreciated!
Problem Detail
I have been working on a new MMOG (massively multi-player online game). The game will implement a coordinate (x,y) based map. Only a very small fraction (less than 0.1%) of the map will be displayed on the screen at any given time. The player should have the ability to click on the map and drag the mouse to scroll and view map areas which are not presently visible. (This is somewhat similar to Google Maps.)
The map background is made up of a series of stitched (repeating) images. These images are woven together to give the basic appearance of the game's "world". A standard set of additional graphics is then superimposed, as appropriate, on each of the coordinate locations. For example, point (0,0) might be a lake, (0,1) might be a city, and (0,2) might be a forest. The respective images for a lake, a city, and a forest would be superimposed on the background.
It is important to mention that the entire map is NOT stored on the local client machine. Rather, as a player scrolls to or opens a specific location, the appropriate map information is retrieved from the remote game server. It is infeasible for us to build the entire game world map ahead of time due to its size and the fact that portions of the map are constantly changing.
I have toyed with the idea of building a bitmap on-the-fly of the new map each time a player moves. However, I think there may be a much better way to add to the map as the player scrolls.
When scrolling, movement of the map should not, if possible, result in a "flickered" refresh of the screen. I believe recreating a bitmap each and every time a player moves even one or two pixels would almost certainly result in flicker.
I am open to 3rd party tools and solutions. However, to the degree possible, I would prefer to use standard Microsoft libraries or open source tools rather than commercial tools.
What are some ideas as to the best way to implement this functionality so that it performs well, is reliable, and transitions to new areas of the map appear seamless to the player?
Thank you in advance for all your help!
Update
Here are a few pieces of additional information that may prove helpful.
Since my initial post, I have been introduced to the concept of a "tile engine". (Many thanks to Michael and Paul for pointing me towards Bing and BruTile.)
My understanding is that a tile engine basically breaks larger images into sections and renders them side by side. As a user scrolls, additional tiles are rendered as others are removed from view. This is very much what I am looking for.
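The core bookkeeping of such a tile engine can be sketched in a few lines. This is only an illustration under assumed conventions: square tiles of a fixed pixel size, a viewport given by its top-left map pixel plus width and height, and a set of tile coordinates whose map data has already been fetched. None of the names below come from Bing or BruTile, and Python is used only for brevity.

```python
def visible_tiles(view_x, view_y, view_w, view_h, tile_size):
    """List the (col, row) tiles that overlap the viewport rectangle."""
    first_col = view_x // tile_size
    first_row = view_y // tile_size
    # The last pixel inside the viewport decides the last tile needed.
    last_col = (view_x + view_w - 1) // tile_size
    last_row = (view_y + view_h - 1) // tile_size
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def tiles_to_load(visible, cached):
    """Tiles that are visible but whose map data is not cached yet."""
    return [tile for tile in visible if tile not in cached]


def tile_screen_position(tile, view_x, view_y, tile_size):
    """Where the tile's top-left corner lands on screen, in pixels."""
    col, row = tile
    return (col * tile_size - view_x, row * tile_size - view_y)


# A 512x256 viewport scrolled to map pixel (300, 100) with 256-pixel tiles.
needed = visible_tiles(300, 100, 512, 256, 256)
print(needed)
print(tiles_to_load(needed, {(1, 0), (2, 0)}))
print(tile_screen_position((1, 0), 300, 100, 256))
```

Because only the tiles entering the viewport are requested and drawn, dragging the map by a few pixels just changes the screen offsets of tiles that are already on screen, which is what avoids rebuilding one large bitmap on every move.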
However, there may be a couple of wrinkles that affect my use of a standard tile engine. All of the graphics for the game, including the backgrounds which would be displayed on any tile, will already be downloaded on the client. It is important that the tile engine not retrieve the graphics from a server as this would consume significant unnecessary bandwidth.
Other graphics (e.g. a lake, forest, hill), which represent objects from the gameworld, must be superimposed when the tiles are rendered on the screen. Tile engines such as Bing appear to provide the ability to superimpose custom images. Whatever tile engine is used must not only support this feature but allow exact placement of these superimposed images.
Finally, there is a requirement to support popup descriptions when the user mouses over one of the superimposed graphics. Unlike the graphics which are already stored on the client, the descriptions contain information which must be downloaded from the game server. BruTile, while excellent in many ways, does not appear to yet support these popup descriptions.
We are making great progress. Thanks for all your help so far!
For an open source solution you could look at BruTile. It too has all the features you describe. It can also be used on the Microsoft Surface and on Windows Phone (for your marketplace version).
Use the Bing Maps control or the MultiScaleImage (Deep Zoom) which it uses.
To see an example, go here. You can use the Deep Zoom Composer to create maps or topologies using your own photos and images.
Here is the SDK for the control.