I'm implementing a little FaceRecognition program using Emgu as a wrapper for the OpenCV libraries. It seems to work fine, but I need a function that returns all the distances between the image sample and the faces in the database (the implemented FaceRecognizer.Predict method only returns the smallest distance and its label).
So I built Emgu from Git, in order to adapt functions in the unmanaged code (cvextern.dll) to my needs.
Here's the original function in face_c.cpp:
void cveFaceRecognizerPredict(cv::face::FaceRecognizer* recognizer, cv::_InputArray* image, int* label, double* dist)
{
    int l = -1;
    double d = -1;
    recognizer->predict(*image, l, d);
    *label = l;
    *dist = d;
}
It stores the minimum distance and the corresponding label in l and d via predict.
Here is the method I wrote, following the summary in OpenCV's face.hpp:
void cveFaceRecognizerPredictCollector(cv::face::FaceRecognizer* recognizer, cv::_InputArray* image, std::vector<int>* labels, std::vector<double>* distances)
{
    std::map<int, double> result_map = std::map<int, double>();
    cv::Ptr<cv::face::StandardCollector> collector = cv::face::StandardCollector::create();
    recognizer->predict(*image, collector);
    result_map = collector->getResultsMap();
    for (std::map<int, double>::iterator it = result_map.begin(); it != result_map.end(); ++it) {
        distances->push_back(it->second);
        labels->push_back(it->first);
    }
}
And the caller in C#:
using (Emgu.CV.Util.VectorOfInt labels = new Emgu.CV.Util.VectorOfInt())
using (Emgu.CV.Util.VectorOfDouble distances = new Emgu.CV.Util.VectorOfDouble())
using (InputArray iaImage = image.GetInputArray())
{
    FaceInvoke.cveFaceRecognizerPredictCollector(_ptr, iaImage, labels, distances);
}
[DllImport(CvInvoke.ExternLibrary, CallingConvention = CvInvoke.CvCallingConvention)]
internal extern static void cveFaceRecognizerPredictCollector(IntPtr recognizer, IntPtr image, IntPtr labels, IntPtr distances);
The application works in real time, so the C# function is called continuously. I have only two faces and one label (the same person) stored in my database, so the first call correctly returns the only possible label and stores it in labels. As the application keeps running, however, the returned labels and the size of the labels vector keep growing, filled with unregistered labels that I cannot trace back to anything. It seems to me like the collector on the C++ side is not properly scoped, so that every time the function is called it keeps accumulating data without releasing the previous results, overwriting them. But that's only my guess; I'm not very good with C++.
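One way to test that guess from the C# side is a small diagnostic sketch (same caller as above; with two training images sharing one label, the vectors should never contain more than two entries per call if the unmanaged collector really starts empty each time):
// using System.Diagnostics;
using (Emgu.CV.Util.VectorOfInt labels = new Emgu.CV.Util.VectorOfInt())
using (Emgu.CV.Util.VectorOfDouble distances = new Emgu.CV.Util.VectorOfDouble())
using (InputArray iaImage = image.GetInputArray())
{
    FaceInvoke.cveFaceRecognizerPredictCollector(_ptr, iaImage, labels, distances);

    // Dump what came back on this call only.
    int[] labelArray = labels.ToArray();
    double[] distanceArray = distances.ToArray();
    Debug.WriteLine("call returned " + labelArray.Length + " results");
    for (int i = 0; i < labelArray.Length; i++)
    {
        Debug.WriteLine("  label " + labelArray[i] + " -> distance " + distanceArray[i]);
    }
}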
What else could possibly be wrong?
Hope you can help.
I have an OpenCV-based DLL that connects to a camera. I then pass a cv::Mat object's pixel data into a C# application and display the image as a Bitmap in a PictureBox.
This works, but the image occasionally 'glitches', showing flashes of lines, static and pops, every few seconds.
Is there a way I can check if the bitmap is valid before displaying it?
When I show the image in the DLL, using cv::imshow, it looks fine.
The code I have is:
In the C++ DLL:
__declspec(dllexport) uchar* getArucoFrame(void)
{
    cv::Mat OriginalImg = returnLeftFrame(); // calls the frame from where the camera thread stores it
    cv::Mat tmp;
    cv::cvtColor(OriginalImg, tmp, CV_BGRA2BGR);
    // if I cv::imshow the Mat here, it looks good
    return tmp.data;
}
On the C# side:
// on a button
threadImageShow = new Thread(imageShow);
threadImageShow.Start();

// show image frame in box
private void imageShow()
{
    while (true)
    {
        IntPtr ptr = getArucoFrame();
        if (pictureBoxFrame.Image != null)
        {
            pictureBoxFrame.Image.Dispose();
        }
        Bitmap a = new Bitmap(640, 360, 3 * 640, PixelFormat.Format24bppRgb, ptr);
        pictureBoxFrame.Image = a;
        Thread.Sleep(20);
    }
}
//the dll call
[DllImport("Vector.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern IntPtr getArucoFrame();
As the image looks good in the DLL and glitchy in the PictureBox, I am having trouble debugging this. Any help much appreciated, thank you.
The problem you have here is that you pass a pointer to the data of the temporary image cv::Mat tmp into C#, but that data is freed when getArucoFrame() returns, so you are left with a dangling pointer. It may appear to work, but the memory sometimes gets overwritten by new data before (or while) the Bitmap reads it. The easiest, though not the most optimal, fix would be declaring the Mat static (static cv::Mat tmp;) so its data is only freed when the DLL is unloaded.
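A more robust variant on the C# side (a minimal sketch only: it assumes the native frame stays valid for the duration of the copy and that rows are tightly packed 24-bit BGR; CopyFrame is a hypothetical helper) is to copy the pixels into memory owned by the Bitmap itself, so the managed image no longer depends on memory owned by the C++ side:
// using System.Drawing; using System.Drawing.Imaging; using System.Runtime.InteropServices;
private Bitmap CopyFrame(IntPtr nativePixels, int width, int height)
{
    Bitmap bmp = new Bitmap(width, height, PixelFormat.Format24bppRgb);
    Rectangle rect = new Rectangle(0, 0, width, height);
    BitmapData data = bmp.LockBits(rect, ImageLockMode.WriteOnly, bmp.PixelFormat);
    try
    {
        int srcStride = 3 * width;        // tightly packed BGR rows (assumption)
        byte[] row = new byte[srcStride];
        for (int y = 0; y < height; y++)
        {
            // native row -> managed buffer -> bitmap row
            Marshal.Copy(IntPtr.Add(nativePixels, y * srcStride), row, 0, srcStride);
            Marshal.Copy(row, 0, IntPtr.Add(data.Scan0, y * data.Stride), srcStride);
        }
    }
    finally
    {
        bmp.UnlockBits(data);
    }
    return bmp;
}
The loop in imageShow would then become pictureBoxFrame.Image = CopyFrame(getArucoFrame(), 640, 360); (after disposing the previous image), and the native buffer can be reused or freed as soon as the call returns.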
I'm currently working on a deep reinforcement learning implementation. To see how the training progresses, I created the UI seen below. The textbox and both charts are updated each time at the end of a while loop. This loop runs inside a thread, which simulates a slot machine and trains a neural network. The performance profiler indicates that 87% of the CPU usage is consumed by the main thread (running the UI), and the rest is left for the simulation thread.
Does anybody know of a good approach to dramatically shrink down the cost of the UI?
private delegate void AppendChartCallback(Chart chart, double x, double y);

private void AppendChart(Chart chart, double x, double y)
{
    if (chart.InvokeRequired)
    {
        AppendChartCallback d = new AppendChartCallback(AppendChart);
        Invoke(d, new object[] { chart, x, y });
    }
    else
    {
        chart.Series[0].Points.AddXY(x, y);
        if (chart.Series[0].Points.Count % 20 == 0)
        {
            chart.Refresh();
        }
    }
}
Edit: I have suspended the charts' automatic updates and now call Refresh explicitly once a certain amount of data has been added (based on modulo).
I would not plot individual (x, y) points; you can bind to an array of values instead. There is an example here: How To Create A Line Chart From Array Of Values?
Add the points to a list.
Have a timer invalidate the view every 16.66 ms.
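A rough sketch of that idea (assumptions: a WinForms Chart field named chart; QueuePoint and FlushPendingPoints are hypothetical names; the simulation thread only queues points and never touches the chart directly):
// using System.Collections.Generic; using System.Windows.Forms;
// using System.Windows.Forms.DataVisualization.Charting;
private readonly List<DataPoint> pending = new List<DataPoint>();
private readonly object pendingLock = new object();
private Timer refreshTimer;

private void StartChartTimer()
{
    refreshTimer = new Timer();
    refreshTimer.Interval = 17; // roughly 60 flushes per second (~16.66 ms)
    refreshTimer.Tick += (s, e) => FlushPendingPoints();
    refreshTimer.Start();
}

// Called from the simulation thread: cheap, no UI work here.
public void QueuePoint(double x, double y)
{
    lock (pendingLock)
    {
        pending.Add(new DataPoint(x, y));
    }
}

// Runs on the UI thread via the timer: adds all queued points in one batch
// and requests a single repaint instead of one per sample.
private void FlushPendingPoints()
{
    DataPoint[] batch;
    lock (pendingLock)
    {
        if (pending.Count == 0) return;
        batch = pending.ToArray();
        pending.Clear();
    }
    foreach (DataPoint p in batch)
    {
        chart.Series[0].Points.Add(p);
    }
    chart.Invalidate();
}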
I have a compute shader and the C# script that goes with it, used to modify an array of vertices on the y axis; simple enough to be clear.
But despite the fact that it runs fine, the shader seems to forget the first vertex of my shape (except when that shape is a closed volume?).
Here is the C# class:
Mesh m;
//public bool stopProcess = false; //useless in this version of the example
MeshCollider coll;
public ComputeShader csFile; //the compute shader file, added the Unity way
Vector3[] arrayToProcess; //an array of vectors I'll use to store data
ComputeBuffer cbf; //the buffer CPU->GPU (an early version with exactly
                   //the same result had only this one)
ComputeBuffer cbfOut; //the buffer GPU->CPU
int vertexLength;

void Awake() { //assigning my stuff
    coll = gameObject.GetComponent<MeshCollider>();
    m = GetComponent<MeshFilter>().sharedMesh;
    vertexLength = m.vertices.Length;
    arrayToProcess = m.vertices; //setting the first version of the vertex array (copy of mesh)
}

void Start () {
    cbf = new ComputeBuffer(vertexLength,32); //buffer in
    cbfOut = new ComputeBuffer(vertexLength,32); //buffer out
    csFile.SetBuffer(0,"Board",cbf);
    csFile.SetBuffer(0,"BoardOut",cbfOut);
}

void Update () {
    csFile.SetFloat("time",Time.time);
    cbf.SetData(m.vertices);
    csFile.Dispatch(0,vertexLength,vertexLength,1); //dispatching (I think my mistake is here)
    cbfOut.GetData(arrayToProcess); //getting back my processed vertices
    m.vertices = arrayToProcess; //assigning them to the mesh
    //coll.sharedMesh = m; //collider stuff, useless in this demo
}
And my compute shader script:
#pragma kernel CSMain

RWStructuredBuffer<float3> Board : register(s[0]);
RWStructuredBuffer<float3> BoardOut : register(s[1]);

float time;

[numthreads(1,1,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    float valx = (sin((time*4)+Board[id.x].x));
    float valz = (cos((time*2)+Board[id.x].z));
    Board[id.x].y = (valx + valz)/5;
    BoardOut[id.x] = Board[id.x];
}
At the beginning I was reading from and writing to the same buffer, but since I already had this issue I tried using separate buffers, with no success. I still have the same problem.
Maybe I misunderstood the way compute shaders are supposed to be used (and I know I could use a vertex shader but I just want to try compute shaders for further improvements.)
To complete what I said, I suppose it is related to the way vertices are indexed in the Mesh.vertices array.
I tried a LOT of different block/thread configurations but nothing seems to solve the issue. Combinations tried:
Block Thread
60,60,1 1,1,1
1,1,1 60,60,3
10,10,3 3,1,1
and some others I do not remember. I think the best configuration should be something with a good balance, like:
Block : VertexCount,1,1 Thread : 3,1,1
About the closed volume: I'm not sure about that, because with a cube (8 vertices) everything seems to move accordingly, but with a shape that has an odd number of vertices, the first one (or the last, I did not check that yet) seems not to be processed.
I tried it with many different shapes, but subdivided planes are the most obvious: one corner is always left unmoved.
EDIT :
After further study I found out that it is simply the compute shader that does not compute the last (not the first, I checked) vertices of the mesh. It seems related to the buffer type; I still don't get why RWStructuredBuffer should be an issue or how badly I'm using it. Is it reserved for streams? I can't understand the MSDN doc on this one.
EDIT : After resolution
The C# script :
using UnityEngine;
using System.Collections;
public class TreeObject : MonoBehaviour {

    Mesh m;
    public bool stopProcess = false;
    MeshCollider coll;
    public ComputeShader csFile;
    Vector3[] arrayToProcess;
    ComputeBuffer cbf;
    ComputeBuffer cbfOut;
    int vertexLength;

    // Use this for initialization
    void Awake() {
        coll = gameObject.GetComponent<MeshCollider>();
        m = GetComponent<MeshFilter>().mesh;
        vertexLength = m.vertices.Length+3; //I add 3 because apparently
                                            //vertexnumber is odd
        //arrayToProcess = new Vector3[vertexLength];
        arrayToProcess = m.vertices;
    }

    void Start () {
        cbf = new ComputeBuffer(vertexLength,12);
        cbfOut = new ComputeBuffer(vertexLength,12);
        csFile.SetBuffer(0,"Board",cbf);
        csFile.SetBuffer(0,"BoardOut",cbfOut);
    }

    // Update is called once per frame
    void Update () {
        csFile.SetFloat("time",Time.time);
        cbf.SetData(m.vertices);
        csFile.Dispatch(0,vertexLength,1,1);
        cbfOut.GetData(arrayToProcess);
        m.vertices = arrayToProcess;
        coll.sharedMesh = m;
    }
}
I had already rolled back to
Blocks: VCount,1,1
before your answer, because it was logical that with VCount*VCount the vertices were being processed "squared" times more than needed.
To complete: you were absolutely right, the stride was obviously causing issues. Could you complete your answer with a link to documentation about the stride parameter? (From anywhere, because the Unity docs are VOID and MSDN did not help me understand why it should be 12 and not 32, as I thought 32 was the size of a float3.)
So, doc needed please.
In the meantime I'll try to provide a flexible enough (generic?) version of this to make it stronger, and start adding some nice array processing functions to my shader...
I'm familiar with compute shaders but have never touched Unity; having looked over the documentation for compute shaders in Unity, a couple of things stand out.
The cbf and cbfOut ComputeBuffers are created with a stride of 32 (bytes?). Both your StructuredBuffers contain float3s which have a stride of 12 bytes, not 32. Where has 32 come from?
When you dispatch your compute shader you're requesting a two-dimensional dispatch (vertexLength, vertexLength, 1), but you're operating on a 1D array of float3s. You will end up with a race condition where many different threads think they're responsible for updating each element of the array. Although awful for performance, if you want a thread group size of [numthreads(1,1,1)] then you should dispatch (vertexLength, 1, 1) waves/wavefronts when calling Dispatch (i.e., Dispatch(60, 1, 1) with numthreads(1,1,1)).
For better performance, the number of threads in your thread group / wave should be a multiple of 64 for best efficiency on AMD hardware. You then need only dispatch ceil(numVertices/64) wavefronts and insert some logic into the shader to ensure id.x is not out of bounds for any given thread.
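Putting those two points together, the Start/Update part of the script could look roughly like this (a sketch only: the vertexCount uniform and the [numthreads(64,1,1)] group size are assumptions, and the shader would need a matching bounds check such as if (id.x >= vertexCount) return;):
void Start () {
    int stride = sizeof(float) * 3;              // 12 bytes per float3
    cbf = new ComputeBuffer(vertexLength, stride);
    cbfOut = new ComputeBuffer(vertexLength, stride);
    csFile.SetBuffer(0, "Board", cbf);
    csFile.SetBuffer(0, "BoardOut", cbfOut);
    csFile.SetInt("vertexCount", vertexLength);  // assumed uniform for the bounds check
}

void Update () {
    csFile.SetFloat("time", Time.time);
    cbf.SetData(m.vertices);
    int groups = (vertexLength + 63) / 64;       // one thread per vertex, 64 threads per group
    csFile.Dispatch(0, groups, 1, 1);
    cbfOut.GetData(arrayToProcess);
    m.vertices = arrayToProcess;
}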
EDIT:
The documentation for the ComputeBuffer constructor is here: Unity ComputeBuffer Documentation
While it doesn't explicitly say "stride" is in bytes, it's the only reasonable assumption.
I have a working implementation of NAudio's WASAPI loopback recording and the FFT of the data.
Most of the data I get is just as it should be but every once in a while (10 sec to minutes intervals) it shows amplitude on almost all frequencies.
Basically, the picture rolls from right to left with time, with frequencies on a logarithmic scale and the lowest frequencies at the bottom. The lines are the errors. As far as I can tell those are not supposed to be there.
I get the audio buffer and send the samples to an aggregator (which applies a Hamming window) that runs NAudio's FFT. I have checked the data (the FFT result) before I modify it in any way (the picture is not of the raw FFT output but decibel scaled), confirming that the FFT result itself contains those lines. I could also point out that the picture is drawn using LockBits, so I thought I had something wrong with the logic there, but that's why I checked the FFT output data, which shows the same problem.
Well, I could be wrong and the problem might be somewhere I said it isn't, but it really seems to originate from the FFT OR the buffer data (the data itself or the aggregation of samples). Somehow I doubt the buffer itself is corrupted like this.
If anyone has any idea what could cause this I would greatly appreciate it!
UPDATE
So I decided to draw the whole FFT result range rather than half of it. It showed something strange. I'm not sure about the FFT, but I thought the Fourier transform of a real signal should give a result that is mirrored around the middle. That certainly is not the case here.
The picture is in linear scale so the exact middle of the picture is the middle point of the FFT result. Bottom is the first and top is the last.
I was playing a 10 kHz sine wave, which gives the two horizontal lines there, but the top part is beyond me. It also seems like the lines are mirrored around the bottom quarter of the picture, which seems strange to me as well.
UPDATE 2
So I increased the FFT size from 4096 to 8192 and tried again. This is the output while I was messing with the sine frequency.
It would seem the result is mirrored twice: once in the middle and then again in the top and bottom halves. The huge lines are now gone, and it would seem the lines only appear in the bottom half now.
After some further testing with different FFT lengths, it seems the lines are completely random in that regard.
UPDATE 3
I have done some testing with many things. The latest thing I added was overlapping of samples, so that I reuse the last half of the sample array at the beginning of the next FFT. With the Hamming and Hann windows it gives me massive intensities (much like in the second picture I posted), but not with Blackman-Harris. Disabling overlapping removes the biggest errors with every window function. The smaller errors, like in the top picture, still remain even with the BH window. I still have no idea why those lines appear.
My current form allows control over which window function to use (of the three previously mentioned), overlapping (on/off) and multiple different drawing options. This lets me compare the effect of each setting as it is changed.
I shall investigate further (I am quite sure I have made a mistake at some point), but good suggestions are more than welcome!
The problem was in the way I handled the data arrays. Working like a charm now.
Code (removed excess and might have added mistakes):
// Other inputs are also usable. Just look through the NAudio library.
private IWaveIn waveIn;
private static int fftLength = 8192; // NAudio fft wants powers of two!
// There might be a sample aggregator in NAudio somewhere but I made a variation for my needs
private SampleAggregator sampleAggregator = new SampleAggregator(fftLength);
public Main()
{
    sampleAggregator.FftCalculated += new EventHandler<FftEventArgs>(FftCalculated);
    sampleAggregator.PerformFFT = true;

    // Here you decide what you want to use as the waveIn.
    // There are many options in NAudio and you can use other streams/files.
    // Note that the code varies for each different source.
    waveIn = new WasapiLoopbackCapture();
    waveIn.DataAvailable += OnDataAvailable;
    waveIn.StartRecording();
}

void OnDataAvailable(object sender, WaveInEventArgs e)
{
    if (this.InvokeRequired)
    {
        this.BeginInvoke(new EventHandler<WaveInEventArgs>(OnDataAvailable), sender, e);
    }
    else
    {
        byte[] buffer = e.Buffer;
        int bytesRecorded = e.BytesRecorded;
        int bufferIncrement = waveIn.WaveFormat.BlockAlign;

        for (int index = 0; index < bytesRecorded; index += bufferIncrement)
        {
            float sample32 = BitConverter.ToSingle(buffer, index);
            sampleAggregator.Add(sample32);
        }
    }
}

void FftCalculated(object sender, FftEventArgs e)
{
    // Do something with e.result!
}
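As an illustration of what that handler could do (just a sketch; it assumes only the first half of the result, up to the Nyquist bin, is of interest and converts the magnitudes to decibels):
void FftCalculated(object sender, FftEventArgs e)
{
    // For real input the second half of the FFT mirrors the first,
    // so only Length / 2 bins carry unique information.
    int bins = e.Result.Length / 2;
    double[] dB = new double[bins];

    for (int i = 0; i < bins; i++)
    {
        double magnitude = Math.Sqrt(e.Result[i].X * e.Result[i].X +
                                     e.Result[i].Y * e.Result[i].Y);
        dB[i] = 20 * Math.Log10(magnitude + 1e-12); // small epsilon avoids log(0)
    }

    // ...hand dB[] to whatever does the drawing.
}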
And the Sample Aggregator class:
using NAudio.Dsp; // The Complex and FFT are here!
class SampleAggregator
{
    // FFT
    public event EventHandler<FftEventArgs> FftCalculated;
    public bool PerformFFT { get; set; }

    // This Complex is NAudio's own!
    private Complex[] fftBuffer;
    private FftEventArgs fftArgs;
    private int fftPos;
    private int fftLength;
    private int m;

    public SampleAggregator(int fftLength)
    {
        if (!IsPowerOfTwo(fftLength))
        {
            throw new ArgumentException("FFT Length must be a power of two");
        }
        this.m = (int)Math.Log(fftLength, 2.0);
        this.fftLength = fftLength;
        this.fftBuffer = new Complex[fftLength];
        this.fftArgs = new FftEventArgs(fftBuffer);
    }

    bool IsPowerOfTwo(int x)
    {
        return (x & (x - 1)) == 0;
    }

    public void Add(float value)
    {
        if (PerformFFT && FftCalculated != null)
        {
            // Remember the window function! There are many others as well.
            fftBuffer[fftPos].X = (float)(value * FastFourierTransform.HammingWindow(fftPos, fftLength));
            fftBuffer[fftPos].Y = 0; // This is always zero with audio.
            fftPos++;

            if (fftPos >= fftLength)
            {
                fftPos = 0;
                FastFourierTransform.FFT(true, m, fftBuffer);
                FftCalculated(this, fftArgs);
            }
        }
    }
}

public class FftEventArgs : EventArgs
{
    [DebuggerStepThrough]
    public FftEventArgs(Complex[] result)
    {
        this.Result = result;
    }

    public Complex[] Result { get; private set; }
}
And that is it I think. I might have missed something though.
Hope this helps!
I am trying to program an edge detection method, and I have used the Emgu CV Image class. Since I need gray values I have declared it as
Image<Gray,float> MyImage = new Image<Gray,float>;
I select an image and assign its pixel values into MyImage as
public void selectImage()
{
    OpenFileDialog opp = new OpenFileDialog();
    if (opp.ShowDialog() == DialogResult.OK)
    {
        MyImage = new Image<Gray,float>(opp.FileName);
        InputArray = new Image<Gray, float>(opp.FileName);
        Convert.ToString(MyImage);
        pictureBox1.Image = MyImage.ToBitmap();
    }
}
When I click on edge detection button it calls the main recursive function
private void detect_edges_Click(object sender, EventArgs e)
{
    hueckel_operator(1, 1);
}
This operator repeats itself at 5-pixel intervals. In other words, I apply it along the x axis by incrementing the x parameter by 5, and at the end of each row I increment the y axis by 5, and so on.
In hueckel_operator, the function a(), which in turn evaluates a very heavy formula, is called 8 times. Here is the a() function:
public double a(int j, int counter6, int counter7)
{
    for (int II = 0; II <= j; II++)
    {
        for (KK = 1; KK < 70; KK++)
        {
            x_value = input_i_x(KK); // this function brings the x coordinate
            y_value = input_i_y(KK); // this function brings the y coordinate
            result += HueckelDisk(x_value, y_value, j) * MyImage[x_value + counter6, y_value + counter7].Intensity;
            //MyImage.Dispose();
        }
    }
    return result;
}
But the problem is that at approximately coordinate (75, 5) it throws a stack overflow exception. I debugged it with the performance analyzer and MyImage seems to eat all the memory. You probably want to see the recursive function, but since it is too big I cannot put it here, and I am sure the recursive function (hueckel_operator()) cannot reach its terminating condition, since I found out how many times it has been called. What I want is to find out whether there is a more efficient way to calculate result.
My other question: the object MyImage is indexed in function a() 69*j times; does that mean it allocates memory 69*j times whenever a() is called?
During my desperate attempts I have declared and defined almost every variable as global in order to reduce memory usage, because I thought that otherwise, whenever hueckel_operator() and a() are called, the local variables would allocate extra memory on the stack over and over. Is that a good or necessary approach?
I use 4 deeply nested and heavy functions and I don't use any classes. Could that be the main problem? To be honest, I don't see anything here worth converting into a class.
I know I have asked too many questions, but I am really desperate right now. I have been reading articles for some weeks and I guess I need a kick start. Any help would be appreciated.
A stack overflow exception doesn't really have much to do with memory usage - it's from stack usage. In your case, a recursive call.
Recursion can only go so deep until the stack is exhausted. Once that happens, you get your stack overflow.
If you are not in an unbounded recursion but you need to go deeper, you can specify the stack size when you create a thread, and run your recursive function on that:
var stackSize = 10000000;
var thread = new Thread(new ThreadStart(StartDetection), stackSize);
However, the default size is 1 MB, which is quite a bit for most tasks. You may want to verify that your recursion is in fact not unbounded, or that you can't reduce or remove it.
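Put together, the button handler from the question could look roughly like this (just a sketch; StartDetection is a hypothetical wrapper around the existing hueckel_operator call):
private void detect_edges_Click(object sender, EventArgs e)
{
    const int stackSize = 10000000; // ~10 MB instead of the default 1 MB
    Thread thread = new Thread(new ThreadStart(StartDetection), stackSize);
    thread.Start(); // keeps the UI responsive while detection runs
}

private void StartDetection()
{
    hueckel_operator(1, 1);
}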