I'm writing a program which uses OpenCV's neural network module along with C# and the OpenCvSharp library. It must recognise the user's face, so in order to train the network, I need a set of samples. The problem is how to convert a sample image into an array suitable for training. What I've got is a 200x200 Bitmap image, and a network with 40000 input neurons, 200 hidden neurons and one output:
CvMat layerSizes = Cv.CreateMat(3, 1, MatrixType.S32C1);
layerSizes[0, 0] = 40000;
layerSizes[1, 0] = 200;
layerSizes[2, 0] = 1;
Network = new CvANN_MLP(layerSizes, MLPActivationFunc.SigmoidSym, 0.6, 1);
So then I'm trying to convert the Bitmap image into a CvMat array:
private void getTrainingMat(int cell_count, CvMat trainMAt, CvMat responses)
{
    CvMat res = Cv.CreateMat(cell_count, 10, MatrixType.F32C1); // 10 is a number of samples
    responses = Cv.CreateMat(10, 1, MatrixType.F32C1); // array of supposed outputs
    int counter = 0;
    foreach (Bitmap b in trainSet)
    {
        IplImage img = BitmapConverter.ToIplImage(b);
        Mat imgMat = new Mat(img);
        for (int i = 0; i < imgMat.Height; i++)
        {
            for (int j = 0; j < imgMat.Width; j++)
            {
                int val = imgMat.Get<int>(i, j);
                res[counter, 0] = imgMat.Get<int>(i, j);
            }
            responses[i, 0] = 1;
        }
        trainMAt = res;
    }
}
And then, when trying to train it, I've got this exception:
input training data should be a floating-point matrix with the number of rows equal to the number of training samples and the number of columns equal to the size of 0-th (input) layer
Code for training:
trainMAt = Cv.CreateMat(inp_layer_size, 10, MatrixType.F32C1);
responses = Cv.CreateMat(inp_layer_size, 1, MatrixType.F32C1);
getTrainingMat(inp_layer_size, trainMAt, responses);
Network.Train(trainMAt, responses, new CvMat(), null, Parameters);
I'm new to OpenCV and I think I did something wrong in the conversion because of a lack of understanding of the CvMat structure. Where is my error, and is there any other way of transforming the bitmap?
With the number of rows equal to the number of training samples
That's 10 samples.
and the number of columns equal to the size of 0-th (input) layer
That's inp_layer_size.
trainMAt = Cv.CreateMat(10, inp_layer_size, MatrixType.F32C1);
responses = Cv.CreateMat(10, 1, MatrixType.F32C1); // 10 labels for 10 samples
I primarily do C++, so forgive me if I'm misunderstanding, but your pixel loop will need adapting as well.
Your inner loop looks broken: you assign to val but never use it, and you never increment your counter.
In addition, assigning trainMAt = res; on every iteration of your outer loop doesn't seem like a very good idea.
I'm certain you'll get it to operate correctly; just keep in mind that the goal is to flatten each image into a single row, so you end up with 10 rows and inp_layer_size columns.
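To illustrate, here is a minimal sketch of how the filling loop could look (my own untested sketch, assuming the caller has already created trainMat as 10 x inp_layer_size and responses as 10 x 1, as above, and that trainSet holds 10 grayscale 8-bit bitmaps):

private void getTrainingMat(CvMat trainMat, CvMat responses)
{
    int sampleIdx = 0;
    foreach (Bitmap b in trainSet)
    {
        IplImage img = BitmapConverter.ToIplImage(b);
        Mat imgMat = new Mat(img);
        int col = 0;
        // flatten the 200x200 image into one row of 40000 columns
        for (int i = 0; i < imgMat.Height; i++)
            for (int j = 0; j < imgMat.Width; j++)
                trainMat[sampleIdx, col++] = imgMat.Get<byte>(i, j);
        responses[sampleIdx, 0] = 1; // expected output for this sample
        sampleIdx++;
    }
}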
I want to use NAudio to extract the peak values of WAV audio files like the audiowaveform library provides, which is an array of short values like:
{
    "version": 2,
    "channels": 2,
    "sample_rate": 48000,
    "samples_per_pixel": 512,
    "bits": 8,
    "length": 3,
    "data": [-65,63,-66,64,-40,41,-39,45,-55,43,-55,44]
}
https://github.com/bbc/audiowaveform/blob/master/doc/DataFormat.md
I saw that NAudio has a built-in WaveformRenderer that outputs PNG images, but I just need the raw peak data. Built-in NAudio classes like MaxPeakProvider (which is initialized with waveReader.ToSampleProvider()) etc. work with floats; I need them as short values. Is it possible?
Edit after #Blindy's response
I have this piece of code converting peak data to shorts, but it produces slightly wrong values when compared with the output of audiowaveform. What might be the problem? I guess there is a data type conversion or rounding error?
var waveReader = new WaveFileReader(openFileDialog1.FileName);
var wave = new WaveChannel32(waveReader);
var peakProvider = new MaxPeakProvider();

int bytesPerSample = (waveReader.WaveFormat.BitsPerSample / 8);
var samples = waveReader.Length / bytesPerSample;
int pixelsPerPeak = 1;
int spacerPixels = 0;
var samplesPerPixel = 32;
var stepSize = pixelsPerPeak + spacerPixels;
peakProvider.Init(waveReader.ToSampleProvider(), samplesPerPixel * stepSize);

List<short> resultValuesAsShort = new List<short>();
List<float> resultValuesAsFloat = new List<float>();
PeakInfo currentPeak = null;

for (int i = 0; i < samples / samplesPerPixel; i++)
{
    currentPeak = peakProvider.GetNextPeak();
    resultValuesAsShort.Add((short)(currentPeak.Min * short.MaxValue));
    resultValuesAsShort.Add((short)(currentPeak.Max * short.MaxValue));
    //resultValuesAsFloat.Add(currentPeak.Min);
    //resultValuesAsFloat.Add(currentPeak.Max);
}
Here is a comparison of the results:
Edit 2: Examining further, I noticed that it generates quite different results for the later values than the audiowaveform library does, and I have no idea why:
MaxPeakProvider [...] works with floats, I need them as short values
NAudio in general works with floats; it's just how it was designed. To get the 2-byte short representation, multiply each float by short.MaxValue (the floats are in the [-1..1] range).
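For reference, a minimal sketch of that conversion (the clamping and Math.Round are my additions, not part of NAudio). Note that short.MinValue is -32768 while short.MaxValue is 32767, so a plain multiply-and-truncate can differ by one or two from tools that scale or round differently, which may explain small discrepancies like the ones above:

// Hedged sketch: convert a [-1..1] float sample to a 16-bit short.
static short FloatToShort(float sample)
{
    // clamp so that a sample of exactly 1.0 (or slightly above) doesn't overflow
    float clamped = Math.Max(-1f, Math.Min(1f, sample));
    return (short)Math.Round(clamped * short.MaxValue);
}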
I'm trying to create a script to match the colours between two images using EmguCV.
I've managed to find code that does exactly what I want to do, here; however, it's written in C++, which I'm not very familiar with.
I'm sure a lot of these things are basic C++ -> C# issues even if you're not familiar with EmguCV / OpenCV...
So far I'm stumped on the following (see code below).
'mask(p)' - mask is of type Mat, and this produces an error in C#: 'a method name is expected'. I presume that the code is trying to index the mask, but I'm not sure how to do this. There are quite a few of these instances in the code.
'chns[i]' - probably similar to the above, chns is again of type Mat, and this produces the error "Cannot apply indexing with [] to an expression of type Mat".
With 'mask(p)' above, and the other various instances, once that issue is corrected I suspect there will be another issue in comparing the Mat with an integer - perhaps this is looking at the columns or the rows, I'm not sure. (I'm referring to, for example, 'if (mask(p) > 0)'.)
'CvInvoke.Split(src, chns)' creates an error: 'Cannot convert from System.Collections.Generic.List<Emgu.CV.Mat> to Emgu.CV.IOutputArray'. I assume I need to define chns and chns1 as an IOutputArray, although I'm not sure how to do this - declaring an IOutputArray using new OutputArray requires an IntPtr reference (possibly 'new Mat()'?) and a parent - not quite sure what's required here. Please see below.
I've run the original C++ code through a C++ to C# converter to get rid of the obvious issues, and have made as many changes as I can to convert OpenCV calls to EmguCV, and I'm left with the below. Any help in deciphering the remaining parts would be most gratefully received.
Further assumptions that I've applied:
In referencing 'Mat' in the method names, I don't think C# requires you to specify the depth as you do in C++, so I've removed the <uchar> and <double> references and instead updated the DepthTypes when the Mats are declared, e.g. 'new Mat(1, 256, DepthType.Cv64F, 1);' (in the case of double)
Updated variable types, e.g. 'uchar' -> 'byte'
Other conversions have been annotated, apart from obvious OpenCV -> EmguCV conversions that are clearly correct.
Code I've got to so far:
public static class EmguCVColourMatchingHelper
{
    // Compute histogram and CDF for an image with mask
    // C++: void do1ChnHist(const Mat_<uchar> &img, const Mat_<uchar> &mask, Mat_<double> &h, Mat_<double> &cdf)
    public static void do1ChnHist(Mat img, Mat mask, Mat h, Mat cdf)
    {
        // C++: for (size_t p = 0; p < img.total(); p++)
        for (var p = 0; p < (Int32)img.Total; p++)
        {
            if (mask(p) > 0) // ERROR (Issue 1): 'Mat mask - Method name expected' - happens with all Mat types followed by ().
            {
                byte c = img(p); // ERROR (Issue 1)
                h(c) += 1.0; // ERROR (Issue 1)
            }
        }

        CvInvoke.Normalize(h, h, 1, 0, NormType.MinMax);

        cdf(0) = h(0); // ERROR (Issue 1)
        for (int j = 1; j < 256; j++)
        {
            cdf(j) = cdf(j - 1) + h(j); // ERROR (Issue 1)
        }

        CvInvoke.Normalize(cdf, cdf, 1, 0, NormType.MinMax);
    }

    public static void histMatchRGB(Mat src, Mat src_mask, Mat dst, Mat dst_mask)
    {
        double histmatch_epsilon = 0.000001;

        // C++: vector<Mat_<uchar>> chns, chns1;
        // List<Mat> chns = new List<Mat>(); - this is the main way to convert vector<Mat> chns, chns1 - however it's not compatible with CvInvoke.Split below
        // I think I need to declare an IOutputArray below, but not exactly sure how to do this.
        IOutputArray chns = new OutputArray(new Mat(), something??); // Issue 4.
        CvInvoke.Split(src, chns);
        CvInvoke.Split(dst, chns1);

        for (int i = 0; i < 3; i++)
        {
            // C++: Mat_<double> src_hist = Mat_<double>::zeros(1, 256); etc...
            // NOTE: here I've assumed 1 channel (last '1' reference in new statements below), as I think we're iterating through RGB
            Mat src_hist = new Mat(1, 256, DepthType.Cv64F, 1);
            Mat dst_hist = new Mat(1, 256, DepthType.Cv64F, 1);
            Mat src_cdf = new Mat(1, 256, DepthType.Cv64F, 1);
            Mat dst_cdf = new Mat(1, 256, DepthType.Cv64F, 1);

            do1ChnHist(chns[i], src_mask, src_hist, src_cdf); // ERROR (Issue 2): Cannot apply indexing with [] to an expression of type 'Mat'
            do1ChnHist(chns1[i], dst_mask, dst_hist, dst_cdf); // ERROR (Issue 2)

            byte last = 0;
            Mat lut = new Mat(1, 256, DepthType.Cv8U, 1);

            for (int j = 0; j < src_cdf.Cols; j++)
            {
                double F1j = src_cdf(j); // ERROR (Issue 1)
                for (byte k = last; k < dst_cdf.Cols; k++)
                {
                    double F2k = dst_cdf(k); // ERROR (Issue 1)
                    if (Math.Abs(F2k - F1j) < histmatch_epsilon || F2k > F1j)
                    {
                        lut(j) = k; // ERROR (Issue 1)
                        last = k;
                        break;
                    }
                }
            }

            CvInvoke.LUT(chns[i], lut, chns[i]); // ERROR (Issue 2)
        }

        Mat res = new Mat();
        CvInvoke.Merge(chns, res);
        res.CopyTo(src);
    }

    internal static void Main(string[] args)
    {
        Mat src = CvInvoke.Imread("e:/test/iO6S1m.png");
        Mat dst = CvInvoke.Imread("e:/test/kfku3m.png");
        Mat mask = new Mat(src.Size, DepthType.Cv8U, 255);

        histMatchRGB(dst, mask, src, mask);
    }
}
Thanks for any help!
mask(p)
The compiler interprets this as a method call, but mask is an object, so this does not work. I would assume you want to extract the element at that position. Since the Mat class does not seem to contain an indexer, you might have to use GetData or GetDataPointer to either convert the matrix to an array, or use an unsafe pointer for access.
chns[i]
I would assume the intent is to extract a single channel. For this there seems to be the Split method.
Overall, you need to have some idea of what the code is doing, and read the documentation to find equivalent ways to do things.
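As a rough illustration (my own sketch, assuming the src and mask Mats from your code; check it against your EmguCV version): Emgu.CV.Util.VectorOfMat implements IOutputArray, so it can receive the channels from CvInvoke.Split and be indexed per channel, and Marshal.Copy with Mat.DataPointer is one way to get the byte-level access that mask(p) has in C++:

using System.Runtime.InteropServices;
using Emgu.CV;
using Emgu.CV.Util;

// Issue 4: VectorOfMat is an IOutputArray, unlike List<Mat>
var chns = new VectorOfMat();
CvInvoke.Split(src, chns); // chns[0], chns[1], chns[2] are the individual planes

// Issue 1: copy the Mat's raw bytes out once, then index the array.
// Assumes 'mask' is a continuous, single-channel, 8-bit Mat.
byte[] maskBytes = new byte[mask.Rows * mask.Cols];
Marshal.Copy(mask.DataPointer, maskBytes, 0, maskBytes.Length);

for (int p = 0; p < maskBytes.Length; p++)
{
    if (maskBytes[p] > 0)
    {
        // equivalent of the C++ 'if (mask(p) > 0)' branch
    }
}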
Whilst using Vision OCR in C# Xamarin, I found that the API returns text that is supposed to be in one region in different regions. This causes unexpected behaviour and incorrect processing of the data.
To solve this, I'm required to extract the Y coordinates from the bounding boxes of the lines and save the accompanying data.
Add both to a list.
Cross-reference each list entry with all the others. When two Y coordinates fall within a deviation of 10, they need to be combined.
I added a transcript below of the structure of the code. I tried using tuples, dictionaries, structs, etc., but couldn't find a solution for cross-referencing the values with any of them.
Does anyone have a pointer in the right direction?
UPDATE: I'm making good progress using a recursive binary search combined with a tuple comparer. I'll post the code if/when it works.
class Playground
{
    public void meh()
    {
        // these are the bounding boxes of different regions when the returned value from the OCR api is parsed
        // int[] arr = new int[] { Left X, Top Y, Width, Height };
        int[] arrA = new int[] { 178, 1141, 393, 91 };   // item 3 xenos
        int[] arrB = new int[] { 171, 1296, 216, 53 };   // totaal 3 items
        int[] arrC = new int[] { 1183, 1134, 105, 51 };  // item 3 prijs
        int[] arrD = new int[] { 1192, 1287, 107, 52 };  // totaal prijs

        // the strings as they will be made available within the lines, as words
        string strA = "item 3";
        string strB = "totaal:";
        string strC = "2,99";
        string strD = "8,97";

        // make a list to hold our fake line data
        List<int[]> ourLines = new List<int[]>();
        ourLines.Add(arrA); ourLines.Add(arrB); ourLines.Add(arrC); ourLines.Add(arrD);

        // the following structure is observed
        for (int region = 0; region < 3; region++)
        {
            // 3 regions; for each region, process lines
            foreach (int[] lineData in ourLines)
            {
                // get the Y coordinate from the bounding box, which is lineData[1];
                // keep the int in memory and link* the words in the corresponding line to it;
                // put the int and its words in an array outside the region loop;
                // repeat this a couple of hundred times
                for (int words = 0; words < 180; words++)
                {
                    // do stuff with words
                }
            }
        }

        // here I need a list with the Y coordinates (for example 1141, 1296, 1134, 1287):
        // cross-reference all Y coordinates with each other;
        // when they fall within a deviation of 10 of another,
        // build a string with the combined text,
        // then search the text for words resembling 'total' and check whether there is an appropriate monetary value.
        // the above would link the values of arrays A + C and B + D,
        // giving the corresponding results 'item 3 2,99' and 'totaal 8,97'.
        // currently the arrays A+B are returned in region one and C+D in region two. This also varies from image to image.
        // necessary because the Vision OCR api sometimes decides that lists of product name + price belong in 2 different regions;
        // the same goes for totals, and this causes incorrect responses when reading the data (like thinking the number of products == the price,
        // so if you bought 3 items the OCR will think the total price is 3$ instead of 8.97, when each item is 2.99).
        // note: values are monetary and culture independent, so they can be in xx.xx or xx,xx format.
        // * with link I mean either make a tuple/list/struct/keyvaluepair/dictionary/or preferably something more appropriate.
        // this code will be executed on Android and iOS devices, so something lightweight is preferred.
    }
}
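For what it's worth, here is a minimal sketch of the grouping step (my own illustration using the fake line data above; the names are mine): sort the entries by Y, then chain neighbours whose Y values differ by at most 10 into the same group, and join each group's text.

using System;
using System.Collections.Generic;
using System.Linq;

static class YGrouping
{
    public static void GroupByY()
    {
        var entries = new List<(int Y, string Text)>
        {
            (1141, "item 3"), (1296, "totaal:"), (1134, "2,99"), (1287, "8,97")
        };
        const int tolerance = 10;
        var groups = new List<List<(int Y, string Text)>>();

        foreach (var e in entries.OrderBy(x => x.Y))
        {
            var last = groups.Count > 0 ? groups[groups.Count - 1] : null;
            // join the previous group when this Y is within tolerance of its last member
            if (last != null && e.Y - last[last.Count - 1].Y <= tolerance)
                last.Add(e);
            else
                groups.Add(new List<(int Y, string Text)> { e });
        }

        foreach (var g in groups)
            Console.WriteLine(string.Join(" ", g.Select(x => x.Text)));
        // prints "2,99 item 3" and "8,97 totaal:", i.e. A+C and B+D get linked
    }
}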
From a SDK I get images that have the pixel format BGR packed, i.e. BGRBGRBGR. For another application, I need to convert this format to RGB planar RRRGGGBBB.
I am using C# .NET 4.5 32bit and the data is in byte arrays which have the same size.
Right now I am iterating through the source array and assigning the BGR values to their appropriate places in the target array, but that takes too long (180 ms for a 1.3-megapixel image). The processor the code runs on has access to MMX, SSE, SSE2, SSE3, and SSSE3.
Is there a way to speed up the conversion?
edit: Here is the conversion I am using:
// the array with the BGRBGRBGR pixel data
byte[] source;
// the array with the RRRGGGBBB pixel data
byte[] result;
// the number of pixels in one channel, width*height
int imageSize;

for (int i = 0; i < source.Length; i += 3)
{
    result[i / 3] = source[i + 2];              // R
    result[i / 3 + imageSize] = source[i + 1];  // G
    result[i / 3 + imageSize * 2] = source[i];  // B
}
edit: I tried splitting the access to the source array into three loops, one for each channel, but it didn't really help. So I'm open to suggestions.
for (int i = 0; i < source.Length; i += 3)
{
    result[i / 3] = source[i + 2]; // R
}
for (int i = 0; i < source.Length; i += 3)
{
    result[i / 3 + imageSize] = source[i + 1]; // G
}
for (int i = 0; i < source.Length; i += 3)
{
    result[i / 3 + imageSize * 2] = source[i]; // B
}
Bump, because the question is still unanswered. Any advice is much appreciated!
You could try to use SSSE3's PSHUFB (Packed Shuffle Bytes) instruction. Make sure you are using aligned memory reads/writes. You will have to do something tricky to deal with the last dangling B value in each XMMWORD-sized block. It might be tough to get right, but it should be a huge speedup. You could also look for library code. I'm guessing you will need to make a C or C++ DLL and use P/Invoke, but maybe there is a way to use SSE instructions from C# that I don't know about.
edit - this question is for a slightly different problem, ARGB to BGR, but the techniques used are similar to what you need.
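In the meantime, a managed stopgap (my own sketch, not SSE): an unsafe-pointer version of the original loop removes the array bounds checks and the repeated i/3 divisions, which is often a worthwhile speedup on its own. Compile with /unsafe:

// Hedged sketch: same BGR-packed -> RGB-planar conversion with raw pointers.
static unsafe void BgrToPlanarRgb(byte[] source, byte[] result, int imageSize)
{
    fixed (byte* src = source)
    fixed (byte* dst = result)
    {
        byte* s = src;
        byte* r = dst;                  // R plane
        byte* g = dst + imageSize;      // G plane
        byte* b = dst + imageSize * 2;  // B plane
        for (int i = 0; i < imageSize; i++)
        {
            *b++ = *s++; // B
            *g++ = *s++; // G
            *r++ = *s++; // R
        }
    }
}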
The Basler company has an SDK for their cameras, called Basler Pylon, which works on Windows and Linux.
This SDK has APIs for C++, C#, and more.
It has an image conversion class, PixelDataConverter, which seems to be what you need.
I am attempting to create a classifier/predictor using SURF and a Naive Bayesian. I am pretty much following the technique from "Visual Categorization with Bags of Keypoints" by Dance, Csurka... I am using SURF instead of SIFT.
My results are pretty horrendous and I am not sure where my error lies. I am using 20 car samples (ham) and 20 motorcycle samples (spam) from the Caltech set. I suspect it is in the way I am creating my vocabulary. What I can see is that the EMGU/OpenCV kMeans2 clustering is returning different results given the same SURF descriptor input. That makes me suspicious. Here is my code so far.
public Matrix<float> Extract<TColor, TDepth>(Image<TColor, TDepth> image)
    where TColor : struct, Emgu.CV.IColor
    where TDepth : new()
{
    ImageFeature[] modelDescriptors;

    using (var imgGray = image.Convert<Gray, byte>())
    {
        var modelKeyPoints = surfCPU.DetectKeyPoints(imgGray, null);
        //the surf descriptor is a size 64 vector describing the intensity pattern surrounding
        //the corresponding modelKeyPoint
        modelDescriptors = surfCPU.ComputeDescriptors(imgGray, null, modelKeyPoints);
    }

    var samples = new Matrix<float>(modelDescriptors.Length, DESCRIPTOR_COUNT); //SURF descriptors have 64 samples
    for (int k = 0; k < modelDescriptors.Length; k++)
    {
        for (int i = 0; i < modelDescriptors[k].Descriptor.Length; i++)
        {
            samples.Data[k, i] = modelDescriptors[k].Descriptor[i];
        }
    }

    //group descriptors into clusters using K-means to form the feature vectors
    //create "vocabulary" based on square-error partitioning K-means
    var centers = new Matrix<float>(CLUSTER_COUNT, samples.Cols, 1);
    var term = new MCvTermCriteria();
    var labelVector = new Matrix<int>(modelDescriptors.Length, 1);
    var cluster = CvInvoke.cvKMeans2(samples, CLUSTER_COUNT, labelVector, term, 3, IntPtr.Zero, 0, centers, IntPtr.Zero);

    //this is the quantized feature vector as described in Dance, Csurka Bag of Keypoints (2004)
    var keyPoints = new Matrix<float>(1, CLUSTER_COUNT);

    //quantize the vector into a feature vector
    //making a histogram of the result counts
    for (int i = 0; i < labelVector.Rows; i++)
    {
        var value = labelVector.Data[i, 0];
        keyPoints.Data[0, value]++;
    }

    //normalize the histogram since it will have different amounts of points
    keyPoints = keyPoints / keyPoints.Norm;
    return keyPoints;
}
The output gets fed into NormalBayesClassifier. This is how I train it.
Parallel.For(0, hamCount, i =>
{
    using (var img = new Image<Gray, byte>(_hams[i].FullName))
    {
        var features = _extractor.Extract(img);
        features.CopyTo(trainingData.GetRow(i));
        trainingClass.Data[i, 0] = 1;
    }
});

Parallel.For(0, spamCount, j =>
{
    using (var img = new Image<Gray, byte>(_spams[j].FullName))
    {
        var features = _extractor.Extract(img);
        features.CopyTo(trainingData.GetRow(j + hamCount)); // spam rows start after the ham rows
        trainingClass.Data[j + hamCount, 0] = 0;
    }
});

using (var classifier = new NormalBayesClassifier())
{
    if (classifier.Train(trainingData, trainingClass, null, null, false))
    {
        classifier.Save(_statModelFilePath);
    }
}
When I call Predict using the NormalBayesClassifier, it returns 1 (match) for all of the training samples... ham and spam.
Any help would be greatly appreciated.
Edit: One other note is that I have tried CLUSTER_COUNT values from 5 to 500, all with the same result.
The problem was more conceptual than technical. I did not understand that the K-means clustering should build the vocabulary from the entire data set. The way to do it correctly is to give the CvInvoke.cvKMeans2 call a training matrix containing all of the features for every image. I was building the vocabulary each time based on a single image.
My final solution involved pulling the SURF code into its own method and running that on each ham and spam image. I then used the massive result set to build a training matrix and gave that to the CvInvoke.cvKMeans2 method. It took quite a long time to finish the training; I have about 3000 images in total.
My results were better. The prediction rate was 100% accurate on the training data. My problem now is that I am likely suffering from overfitting, because the prediction rate is still poor for non-training data. I will play around with the Hessian threshold in the SURF algorithm as well as the cluster count to see if I can reduce the overfitting.
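For anyone following along, here is a rough sketch of the vocabulary-building step described above (my own illustration: ExtractSurfDescriptors stands in for the SURF code pulled into its own method, and the cvKMeans2 call mirrors the signature used in the question; adjust for your EMGU version):

// Hedged sketch: build ONE shared vocabulary from all images' descriptors.
var perImageDescriptors = new List<Matrix<float>>();
foreach (var file in _hams.Concat(_spams))
{
    using (var img = new Image<Gray, byte>(file.FullName))
        perImageDescriptors.Add(ExtractSurfDescriptors(img)); // N x 64 per image
}

// stack every image's descriptors into a single training matrix
int totalRows = perImageDescriptors.Sum(m => m.Rows);
var samples = new Matrix<float>(totalRows, DESCRIPTOR_COUNT);
int row = 0;
foreach (var m in perImageDescriptors)
{
    for (int r = 0; r < m.Rows; r++)
        for (int c = 0; c < m.Cols; c++)
            samples.Data[row + r, c] = m.Data[r, c];
    row += m.Rows;
}

// run K-means once over the combined matrix to get the vocabulary centers
var centers = new Matrix<float>(CLUSTER_COUNT, samples.Cols, 1);
var labels = new Matrix<int>(totalRows, 1);
CvInvoke.cvKMeans2(samples, CLUSTER_COUNT, labels, new MCvTermCriteria(100, 0.01),
    3, IntPtr.Zero, 0, centers, IntPtr.Zero);

// afterwards, each image's feature vector is the histogram of its own
// descriptors assigned to the nearest of these shared centers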