I am trying to use OpenCV for hand gesture recognition in my Unity ARCore game. However, with the deprecation of TextureReaderAPI, the only way to capture the image from the camera is by using Frame.CameraImage.AcquireCameraImageBytes(). The problem with that is not only that the image is in 640x480 resolution (this cannot be changed AFAIK), but it is also in YUV_420_888 format.
As if that were not enough, OpenCV does not have free C#/Unity packages, so if I do not want to cash out 20$ for a paid package, I need to use available C++ or python versions. How do I move the YUV image to OpenCV, convert it to an RGB (or HSV) color space, and then either do some processing on it or return it back to Unity?
In this example, I will use C++ OpenCV libraries and Visual Studio 2017 and I will try to capture ARCore camera image, move it to OpenCV (as efficiently as possible), convert it to RGB color space, then move it back to Unity C# code and save it in the phone's memory.
Firstly, we have to create a C++ dynamic library project to use with OpenCV. For this, I highly recommend to follow both Pierre Baret's and Ninjaman494's answers on this question: OpenCV + Android + Unity. The process is rather straightforward, and if you will not deviate from their answers too much (i.e. you can safely download a newer than 3.3.1 version of OpenCV, but be careful when compiling for ARM64 instead of ARM, etc.), you should be able to call a C++ function from C#.
In my experience, I had to solve two problems - firstly, if you made the project part of your C# solution instead of creating a new solution, Visual Studio will keep messing with your configuration, like trying to compile a x86 version instead of an ARM version. To save yourself the hassle, create a completely separate solution. The other problem is that some functions failed to link for me, thus throwing a undefined reference linker error (undefined reference to 'cv::error(int, std::string const&, char const*, char const*, int, to be exact). If this happens and the problem is with a function that you do not really need, just recreate the function in your code - for instance if you have problems with cv::error, add this code in the end of your .cpp file:
namespace cv {
__noreturn void error(int a, const String & b, const char * c, const char * d, int e) {
throw std::string(b);
}
}
Sure, this is ugly and dirty way to do things, so if you know how to fix the linker error, please do so and let me know.
Now, you should have a working C++ code that compiles and can be run from a Unity Android application. However, what we want is for OpenCV to not return a number, but to convert an image. So change your code to this:
.h file
extern "C" {
namespace YOUR_OWN_NAMESPACE
{
int ConvertYUV2RGBA(unsigned char *, unsigned char *, int, int);
}
}
.cpp file
extern "C" {
int YOUR_OWN_NAMESPACE::ConvertYUV2RGBA(unsigned char * inputPtr, unsigned char * outputPtr, int width, int height) {
// Create Mat objects for the YUV and RGB images. For YUV, we need a
// height*1.5 x width image, that has one 8-bit channel. We can also tell
// OpenCV to have this Mat object "encapsulate" an existing array,
// which is inputPtr.
// For RGB image, we need a height x width image, that has three 8-bit
// channels. Again, we tell OpenCV to encapsulate the outputPtr array.
// Thanks to specifying existing arrays as data sources, no copying
// or memory allocation has to be done, and the process is highly
// effective.
cv::Mat input_image(height + height / 2, width, CV_8UC1, inputPtr);
cv::Mat output_image(height, width, CV_8UC3, outputPtr);
// If any of the images has not loaded, return 1 to signal an error.
if (input_image.empty() || output_image.empty()) {
return 1;
}
// Convert the image. Now you might have seen people telling you to use
// NV21 or 420sp instead of NV12, and BGR instead of RGB. I do not
// understand why, but this was the correct conversion for me.
// If you have any problems with the color in the output image,
// they are probably caused by incorrect conversion. In that case,
// I can only recommend you the trial and error method.
cv::cvtColor(input_image, output_image, cv::COLOR_YUV2RGB_NV12);
// Now that the result is safely saved in outputPtr, we can return 0.
return 0;
}
}
Now, rebuild the solution (Ctrl + Shift + B) and copy the libProjectName.so file to Unity's Plugins/Android folder, as in the linked answer.
The next thing is to save the image from ARCore, move it to C++ code, and get it back. Let us add this inside the class in our C# script:
[DllImport("YOUR_OWN_NAMESPACE")]
public static extern int ConvertYUV2RGBA(IntPtr input, IntPtr output, int width, int height);
You will be prompted by Visual Studio to add System.Runtime.InteropServices using clause - do so.
This allows us to use the C++ function in our C# code. Now, let's add this function to our C# component:
public Texture2D CameraToTexture()
{
// Create the object for the result - this has to be done before the
// using {} clause.
Texture2D result;
// Use using to make sure that C# disposes of the CameraImageBytes afterwards
using (CameraImageBytes camBytes = Frame.CameraImage.AcquireCameraImageBytes())
{
// If acquiring failed, return null
if (!camBytes.IsAvailable)
{
Debug.LogWarning("camBytes not available");
return null;
}
// To save a YUV_420_888 image, you need 1.5*pixelCount bytes.
// I will explain later, why.
byte[] YUVimage = new byte[(int)(camBytes.Width * camBytes.Height * 1.5f)];
// As CameraImageBytes keep the Y, U and V data in three separate
// arrays, we need to put them in a single array. This is done using
// native pointers, which are considered unsafe in C#.
unsafe
{
for (int i = 0; i < camBytes.Width * camBytes.Height; i++)
{
YUVimage[i] = *((byte*)camBytes.Y.ToPointer() + (i * sizeof(byte)));
}
for (int i = 0; i < camBytes.Width * camBytes.Height / 4; i++)
{
YUVimage[(camBytes.Width * camBytes.Height) + 2 * i] = *((byte*)camBytes.U.ToPointer() + (i * camBytes.UVPixelStride * sizeof(byte)));
YUVimage[(camBytes.Width * camBytes.Height) + 2 * i + 1] = *((byte*)camBytes.V.ToPointer() + (i * camBytes.UVPixelStride * sizeof(byte)));
}
}
// Create the output byte array. RGB is three channels, therefore
// we need 3 times the pixel count
byte[] RGBimage = new byte[camBytes.Width * camBytes.Height * 3];
// GCHandles help us "pin" the arrays in the memory, so that we can
// pass them to the C++ code.
GCHandle YUVhandle = GCHandle.Alloc(YUVimage, GCHandleType.Pinned);
GCHandle RGBhandle = GCHandle.Alloc(RGBimage, GCHandleType.Pinned);
// Call the C++ function that we created.
int k = ConvertYUV2RGBA(YUVhandle.AddrOfPinnedObject(), RGBhandle.AddrOfPinnedObject(), camBytes.Width, camBytes.Height);
// If OpenCV conversion failed, return null
if (k != 0)
{
Debug.LogWarning("Color conversion - k != 0");
return null;
}
// Create a new texture object
result = new Texture2D(camBytes.Width, camBytes.Height, TextureFormat.RGB24, false);
// Load the RGB array to the texture, send it to GPU
result.LoadRawTextureData(RGBimage);
result.Apply();
// Save the texture as an PNG file. End the using {} clause to
// dispose of the CameraImageBytes.
File.WriteAllBytes(Application.persistentDataPath + "/tex.png", result.EncodeToPNG());
}
// Return the texture.
return result;
}
To be able to run unsafe code, you also need to allow it in Unity. Go to Player Settings (Edit > Project Settings > Player Settings and check the Allow unsafe code checkbox.)
Now, you can call the CameraToTexture() function, let's say, every 5 seconds from Update(), and the camera image should be saved as /Android/data/YOUR_APPLICATION_PACKAGE/files/tex.png. The image will probably be landscape oriented, even if you held the phone in portrait mode, but this is not that hard to fix anymore. Also, you might notice a freeze everytime the image is saved - I recommend calling this function in a separate thread because of this. Also, the most demanding operation here is saving the image as an PNG file, so if you need it for any other reason, you should be fine (still use the separate thread, though).
If you want to undestand the YUV_420_888 format, why you need a 1.5*pixelCount array, and why we modified the arrays the way we did, read https://wiki.videolan.org/YUV/#NV12. Other websites seem to have incorrect information about how this format works.
Also, feel free to comment me with any issues you might have, and I will try to help with them, as well as any feedback for both the code and answer.
APPENDIX 1: According to https://docs.unity3d.com/ScriptReference/Texture2D.LoadRawTextureData.html, you should use GetRawTextureData instead of LoadRawTextureData, to prevent copying. To do this, just pin the array returned by GetRawTextureData instead of the RGBimage array (which you can remove). Also, do not forget to call result.Apply(); afterwards.
APPENDIX 2: Do not forget to call Free() on both GCHandles when you are done using them.
I figured out how to get the full resolution CPU image in Arcore 1.8.
I can now get the full camera resolution with cameraimagebytes.
put this in your class variables:
private ARCoreSession.OnChooseCameraConfigurationDelegate m_OnChoseCameraConfiguration = null;
put this in Start()
m_OnChoseCameraConfiguration = _ChooseCameraConfiguration; ARSessionManager.RegisterChooseCameraConfigurationCallback(m_OnChoseCameraConfiguration); ARSessionManager.enabled = false; ARSessionManager.enabled = true;
Add this callback to the class:
private int _ChooseCameraConfiguration(List<CameraConfig> supportedConfigurations) { return supportedConfigurations.Count - 1; }
Once you add those, you should have cameraimagebytes returning the full resolution of the camera.
For everyone who want to try this with OpencvForUnity:
public Mat getCameraImage()
{
// Use using to make sure that C# disposes of the CameraImageBytes afterwards
using (CameraImageBytes camBytes = Frame.CameraImage.AcquireCameraImageBytes())
{
// If acquiring failed, return null
if (!camBytes.IsAvailable)
{
Debug.LogWarning("camBytes not available");
return null;
}
// To save a YUV_420_888 image, you need 1.5*pixelCount bytes.
// I will explain later, why.
byte[] YUVimage = new byte[(int)(camBytes.Width * camBytes.Height * 1.5f)];
// As CameraImageBytes keep the Y, U and V data in three separate
// arrays, we need to put them in a single array. This is done using
// native pointers, which are considered unsafe in C#.
unsafe
{
for (int i = 0; i < camBytes.Width * camBytes.Height; i++)
{
YUVimage[i] = *((byte*)camBytes.Y.ToPointer() + (i * sizeof(byte)));
}
for (int i = 0; i < camBytes.Width * camBytes.Height / 4; i++)
{
YUVimage[(camBytes.Width * camBytes.Height) + 2 * i] = *((byte*)camBytes.U.ToPointer() + (i * camBytes.UVPixelStride * sizeof(byte)));
YUVimage[(camBytes.Width * camBytes.Height) + 2 * i + 1] = *((byte*)camBytes.V.ToPointer() + (i * camBytes.UVPixelStride * sizeof(byte)));
}
}
// Create the output byte array. RGB is three channels, therefore
// we need 3 times the pixel count
byte[] RGBimage = new byte[camBytes.Width * camBytes.Height * 3];
// GCHandles help us "pin" the arrays in the memory, so that we can
// pass them to the C++ code.
GCHandle pinnedArray = GCHandle.Alloc(YUVimage, GCHandleType.Pinned);
IntPtr pointer = pinnedArray.AddrOfPinnedObject();
Mat input = new Mat(camBytes.Height + camBytes.Height / 2, camBytes.Width, CvType.CV_8UC1);
Mat output = new Mat(camBytes.Height, camBytes.Width, CvType.CV_8UC3);
Utils.copyToMat(pointer, input);
Imgproc.cvtColor(input, output, Imgproc.COLOR_YUV2RGB_NV12);
pinnedArray.Free();
return output;
}
}
Here is an implementation of this which just uses the free plugin OpenCV Plus Unity. Very simple to set up and great documentation if you are familiar with OpenCV.
This implementation rotates the image properly using OpenCV, stores them into memory and upon exiting the app, saves them to file. I have tried to strip all Unity aspects from the code so that the function GetCameraImage() can be run on a separate thread.
I can confirm it works on Andoird (GS7), I presume it will work pretty universally.
using System;
using System.Collections.Generic;
using GoogleARCore;
using UnityEngine;
using OpenCvSharp;
using System.Runtime.InteropServices;
public class CamImage : MonoBehaviour
{
public static List<Mat> AllData = new List<Mat>();
public static void GetCameraImage()
{
// Use using to make sure that C# disposes of the CameraImageBytes afterwards
using (CameraImageBytes camBytes = Frame.CameraImage.AcquireCameraImageBytes())
{
// If acquiring failed, return null
if (!camBytes.IsAvailable)
{
return;
}
// To save a YUV_420_888 image, you need 1.5*pixelCount bytes.
// I will explain later, why.
byte[] YUVimage = new byte[(int)(camBytes.Width * camBytes.Height * 1.5f)];
// As CameraImageBytes keep the Y, U and V data in three separate
// arrays, we need to put them in a single array. This is done using
// native pointers, which are considered unsafe in C#.
unsafe
{
for (int i = 0; i < camBytes.Width * camBytes.Height; i++)
{
YUVimage[i] = *((byte*)camBytes.Y.ToPointer() + (i * sizeof(byte)));
}
for (int i = 0; i < camBytes.Width * camBytes.Height / 4; i++)
{
YUVimage[(camBytes.Width * camBytes.Height) + 2 * i] = *((byte*)camBytes.U.ToPointer() + (i * camBytes.UVPixelStride * sizeof(byte)));
YUVimage[(camBytes.Width * camBytes.Height) + 2 * i + 1] = *((byte*)camBytes.V.ToPointer() + (i * camBytes.UVPixelStride * sizeof(byte)));
}
}
// GCHandles help us "pin" the arrays in the memory, so that we can
// pass them to the C++ code.
GCHandle pinnedArray = GCHandle.Alloc(YUVimage, GCHandleType.Pinned);
IntPtr pointerYUV = pinnedArray.AddrOfPinnedObject();
Mat input = new Mat(camBytes.Height + camBytes.Height / 2, camBytes.Width, MatType.CV_8UC1, pointerYUV);
Mat output = new Mat(camBytes.Height, camBytes.Width, MatType.CV_8UC3);
Cv2.CvtColor(input, output, ColorConversionCodes.YUV2BGR_NV12);// YUV2RGB_NV12);
// FLIP AND TRANPOSE TO VERTICAL
Cv2.Transpose(output, output);
Cv2.Flip(output, output, FlipMode.Y);
AllData.Add(output);
pinnedArray.Free();
}
}
}
I then call ExportImages() when exiting the program to save to file.
private void ExportImages()
{
/// Write Camera intrinsics to text file
var path = Application.persistentDataPath;
StreamWriter sr = new StreamWriter(path + #"/intrinsics.txt");
sr.WriteLine(CameraIntrinsicsOutput.text);
Debug.Log(CameraIntrinsicsOutput.text);
sr.Close();
// Loop through Mat List, Add to Texture and Save.
for (var i = 0; i < CamImage.AllData.Count; i++)
{
Mat imOut = CamImage.AllData[i];
Texture2D result = Unity.MatToTexture(imOut);
result.Apply();
byte[] im = result.EncodeToJPG(100);
string fileName = "/IMG" + i + ".jpg";
File.WriteAllBytes(path + fileName, im);
string messge = "Succesfully Saved Image To " + path + "\n";
Debug.Log(messge);
Destroy(result);
}
}
Seems you already fix this.
But for anyone who want to combine AR with hand gesture recognition and tracking, try Manomotion: https://www.manomotion.com/
free SDk and work prefect in 12/2020.
Use the SDK Community Edition and Download ARFoundation version
I have been playing around with SharpDX.XAudio2 for a few days now, and while things have been largely positive (the odd software quirk here and there) the following problem has me completely stuck:
I am working in C# .NET using VS2015.
I am trying to play multiple sounds simultaneously.
To do this, I have made:
- Test.cs: Contains main method
- cSoundEngine.cs: Holds XAudio2, MasteringVoice, and sound management methods.
- VoiceChannel.cs: Holds a SourceVoice, and in future any sfx/ related data.
cSoundEngine:
List<VoiceChannel> sourceVoices;
XAudio2 engine;
MasteringVoice master;
public cSoundEngine()
{
engine = new XAudio2();
master = new MasteringVoice(engine);
sourceVoices = new List<VoiceChannel>();
}
public VoiceChannel AddAndPlaySFX(string filepath, double vol, float pan)
{
/**
* Set up and start SourceVoice
*/
NativeFileStream fileStream = new NativeFileStream(filepath, NativeFileMode.Open, NativeFileAccess.Read);
SoundStream soundStream = new SoundStream(fileStream);
SourceVoice source = new SourceVoice(engine, soundStream.Format);
AudioBuffer audioBuffer = new AudioBuffer()
{
Stream = soundStream.ToDataStream(),
AudioBytes = (int)soundStream.Length,
Flags = SharpDX.XAudio2.BufferFlags.EndOfStream
};
//Make voice wrapper
VoiceChannel voice = new VoiceChannel(source);
sourceVoices.Add(voice);
//Volume
source.SetVolume((float)vol);
//Play sound
source.SubmitSourceBuffer(audioBuffer, soundStream.DecodedPacketsInfo);
source.Start();
return voice;
}
Test.cs:
cSoundEngine engine = new cSoundEngine();
total = 6;
for (int i = 0; i < total; i++)
{
string filepath = System.IO.Directory.GetParent(System.IO.Directory.GetCurrentDirectory()).Parent.FullName + #"\Assets\Planet.wav";
VoiceChannel sfx = engine.AddAndPlaySFX(filepath, 0.1, 0);
}
Console.Read(); //Input anything to end play.
There is currently nothing worth showing in VoiceChannel.cs - it holds 'SourceVoice source' which is the one parameter sent in the constructor!
Everything is fine and well running with up to 5 sounds (total = 5). All you hear is the blissful drone of Planet.wav. Any higher than 5 however causes the console to freeze for ~5 seconds, then close (likely a c++ error which debugger can't handle). Sadly no error message for us to look at or anything.
From testing:
- Will not crash as long as you do not have more than 5 running sourcevoices.
- Changing sample rate does not seem to help.
- Setting inputChannels for master object to a different number makes no difference.
- MasteringVoice seems to say the max number of inputvoices is 64.
- Making each sfx play from a different wav file makes no difference.
- Setting the volume for sourcevoices and/or master makes no difference.
From the XAudio2 API Documentation I found this quote: 'XAudio2 removes the 6-channel limit on multichannel sounds, and supports multichannel audio on any multichannel-capable audio card. The card does not need to be hardware-accelerated.'. This is the closest I have come to finding something that mentions this problem.
I am not well experienced with programming sfx and a lot of this is very new to me, so feel free to call me an idiot where appropriate but please try and explain things in layman terms.
Please, if you have any ideas or answers they would be greatly appreciated!
-Josh
As Chuck has suggested, I have created a databank which holds the .wav data, and I just reference the single data store with each buffer. This has improved the sound limit up to 20 - however this has not fixed the problem as a whole, likely because I have not implemented this properly.
Implementation:
class SoundDataBank
{
/**
* Holds a single byte array for each sound
*/
Dictionary<eSFX, Byte[]> bank;
string curdir => Directory.GetParent(Directory.GetCurrentDirectory()).Parent.FullName;
public SoundDataBank()
{
bank = new Dictionary<eSFX, byte[]>();
bank.Add(eSFX.planet, NativeFile.ReadAllBytes(curdir + #"\Assets\Planet.wav"));
bank.Add(eSFX.base1, NativeFile.ReadAllBytes(curdir + #"\Assets\Base.wav"));
}
public Byte[] GetSoundData(eSFX sfx)
{
byte[] output = bank[sfx];
return output;
}
}
In SoundEngine we create a SoundBank object (initialised in SoundEngine constructor):
SoundDataBank soundBank;
public VoiceChannel AddAndPlaySFXFromStore(eSFX sfx, double vol)
{
/**
* sourcevoice will be automatically added to MasteringVoice and engine in the constructor.
*/
byte[] buffer = soundBank.GetSoundData(sfx);
MemoryStream memoryStream = new MemoryStream(buffer);
SoundStream soundStream = new SoundStream(memoryStream);
SourceVoice source = new SourceVoice(engine, soundStream.Format);
AudioBuffer audioBuffer = new AudioBuffer()
{
Stream = soundStream.ToDataStream(),
AudioBytes = (int)soundStream.Length,
Flags = SharpDX.XAudio2.BufferFlags.EndOfStream
};
//Make voice wrapper
VoiceChannel voice = new VoiceChannel(source, engine, MakeOutputMatrix());
//Volume
source.SetVolume((float)vol);
//Play sound
source.SubmitSourceBuffer(audioBuffer, soundStream.DecodedPacketsInfo);
source.Start();
sourceVoices.Add(voice);
return voice;
}
Following this implementation now lets me play up to 20 sound effects - but NOT because we are playing from the soundbank. Infact, even running the old method for sound effects now gets up to 20 sfx instances.
This has improved up to 20 because we have done NativeFile.ReadAllBytes(curdir + #"\Assets\Base.wav") in the constructor for the SoundBank.
I suspect NativeFile is holding a store of loaded file data, so you regardless of whether you run the original SoundEngine.AddAndPlaySFX() or SoundEngine.AddAndPlaySFXFromStore(), they are both running from memory?
Either way, this has quadrupled the limit from before, so this has been incredibly useful - but requires further work.
I have been trying to find a way to write to a MIDI file using the C# MIDI Toolkit. However, I am constantly running into problems with time synchronization. The resulting MIDI file is always off beat. More precisely, it has correct rhythm in relation to itself, but when imported into a sequencer, it does not seem to contain any tempo information (which is understandable, since I never specify it from within my program). There is no documentation on how to do this.
I am using the following code to insert notes into a track.
public const int NOTE_LENGTH = 32;
private static void InsertNote(Track t, int pitch, int velocity, int position, int duration, int channel)
{
ChannelMessageBuilder builder = new ChannelMessageBuilder();
builder.Command = ChannelCommand.NoteOn;
builder.Data1 = pitch;
builder.Data2 = velocity;
builder.MidiChannel = channel;
builder.Build();
t.Insert(position * NOTE_LENGTH, builder.Result);
builder.Command = ChannelCommand.NoteOff;
builder.Build();
t.Insert((position + duration) * NOTE_LENGTH, builder.Result);
}
I am sure the notes themselves are okay, since the resulting output is audible, but has no tempo information. How do I enter tempo information into the Sequence object that contains my tracks?
Stumbled upon an answer by brute-force trying: the NOTE_LENGTH should be evenly devisable by 3.
I'm making some changes for Open Hardware Monitor. I will add the network adapter download and upload speed. But when I calculate the download speed I get a wrong calculation.
I can't use a timer to calculate the correct download speed because of the auto update in OHM.
In the source here you can see how I calculate the download speed (in Mb/s).
In the construct of the class i do:
IPv4InterfaceStatistics interfaceStats = netInterfaces.GetIPv4Statistics();
bytesSent = interfaceStats.BytesSent;
bytesReceived = interfaceStats.BytesReceived;
stopWatch = new Stopwatch();
stopWatch.Start();
When the update method is called (in some random times) I do this:
IPv4InterfaceStatistics interfaceStats = netInterfaces.GetIPv4Statistics();
stopWatch.Stop();
long time = stopWatch.ElapsedMilliseconds;
if (time != 0)
{
long bytes = interfaceStats.BytesSent;
long bytesCalc = ((bytes - bytesSent)*8);
usedDownloadSpeed.Value = ((bytesCalc / time) * 1000)/1024;
bytesSent = bytes;
}
Hope someone can see my issue?
Added screenshot
There where a few conversion issues with my previous code.
Now I have this source and it works.
Tnx all for answering.
interfaceStats = netInterfaces.GetIPv4Statistics();
//Calculate download speed
downloadSpeed.Value = Convert.ToInt32(interfaceStats.BytesReceived - bytesPreviousReceived) / 1024F;
bytesPreviousReceived = interfaceStats.BytesReceived;
The following changes should help...
speed = netInterfaces.Speed / 1048576L;
If I recall correctly, the Speed property is a long and when you divide it by an int, you end up with a truncated result. Which brings us to a similar set of changes in your other calculation...
usedDownloadSpeed.Value = ((bytesCalc / time) * 1000L)/1024L;
... assuming that usedDownloadSpeed.Value is also a long to make sure you're not getting any truncated values with implicit conversion of your results or calculations. If you want to be doubly sure you have the casting correctly, use Convert.ToInt64().